Re: [PATCH] py3: make ui.write() call fout.flush() per line if fout is a tty or pager

Manuel Jacob Sun, 07 Jun 2020 23:22:19 -0700

On 2020-06-07 13:55, Yuya Nishihara wrote:

# HG changeset patch
# User Yuya Nishihara <y...@tcha.org>
# Date 1591518045 -32400
#      Sun Jun 07 17:20:45 2020 +0900
# Node ID ee3cd52b4b338e05b0a22cb87580d78f1ad7763d
# Parent  03ba7de6a8b9ab339a6dafab969e890228b4c2f1
py3: make ui.write() call fout.flush() per line if fout is a tty orpager
Since sys.stdout.buffer of Python 3 can't be configured as aline-buffered
stream, we need a workaround:

 a. make callers do flush() where appropriate
 b. make ui insert flush() automatically
 c. use unbuffered stream
With my testing, inserting flush() automatically showed a better resultthanusing unbuffered io, so this patch implements (b). Since ui._isatty()call
isn't cheap, the result is cached. I also made ui.write() not scan the
entire string because it's unlikely that a string including '\n'doesn't
end with '\n'.

Maybe we can backport a subset of this patch to stable to mitigate the
ui.status() issue.

All in all, I agree that inserting a flush() per line automatically isthe best approach for now, as it harmonizes the behavior towards what weso far considered to be "the correct behavior" (line-buffered if thestdout is a TTY, which works on Python 2 on UNIX).

So far, the implicit contract of ui.write() included that the stream isflushed if the string includes a newline. If we backport the patch tostable, it should definitively not include the optimization because itbreaks that contract. In default, we can have the optimization after weensured that all callers continue to work with the new contract. Maybeit’s better to completely move the optimization (and the discussion) toa follow-up patch.

Benchmark result:

  % HGRCPATH=/dev/null python3 ./hg --cwd ~/mirrors/mozilla-central \
  --time --pager no status --all

  a. base
  time: real 10.950 secs (user 8.760+0.000 sys 1.270+0.000)
  time: real 9.450 secs (user 8.530+0.000 sys 0.910+0.000)
  time: real 10.180 secs (user 9.290+0.000 sys 0.880+0.000)

  b. this patch
  time: real 11.890 secs (user 10.740+0.000 sys 1.140+0.000)
  time: real 10.430 secs (user 9.410+0.000 sys 1.020+0.000)
  time: real 10.820 secs (user 9.500+0.000 sys 1.300+0.000)

  c. unbuffered io
  time: real 12.000 secs (user 10.350+0.000 sys 1.640+0.000)
  time: real 11.710 secs (user 10.170+0.000 sys 1.520+0.000)
  time: real 11.750 secs (user 9.900+0.000 sys 1.550+0.000)

  d. base python2 (line-buffered libc stdio)
  time: real 8.170 secs (user 6.990+0.000 sys 1.110+0.000)
  time: real 7.880 secs (user 6.830+0.000 sys 1.040+0.000)
  time: real 7.990 secs (user 6.920+0.000 sys 1.060+0.000)


On my machine:

base:
time: real 10.310 secs (user 9.070+0.000 sys 1.200+0.000)
time: real 10.290 secs (user 9.130+0.000 sys 1.040+0.000)
time: real 10.260 secs (user 9.320+0.000 sys 0.920+0.000)

this patch:
time: real 12.850 secs (user 11.300+0.000 sys 1.500+0.000)
time: real 12.960 secs (user 11.300+0.000 sys 1.580+0.000)
time: real 12.510 secs (user 10.900+0.000 sys 1.570+0.000)

unbuffered (my patch):
time: real 13.980 secs (user 11.710+0.000 sys 2.180+0.000)
time: real 14.570 secs (user 12.300+0.000 sys 2.230+0.000)
time: real 13.870 secs (user 11.610+0.000 sys 2.110+0.000)

base on Python 2 (line-buffered libc stdio):
time: real 10.950 secs (user 9.240+0.000 sys 1.670+0.000)
time: real 11.140 secs (user 9.490+0.000 sys 1.630+0.000)
time: real 11.290 secs (user 9.640+0.000 sys 1.620+0.000)

Python 2 is slower on my machine, as it was compiled with lessaggressive build flags. On Python 3, the results have the same order ason your machine, but the differences between them are larger. Overall, Ithink that the performance overhead is justified by the behavior fix.

Number of write() syscalls of the commands above:

% sudo bpftrace -e 'tracepoint:syscalls:sys_exit_write /comm =="python" \|| comm == "python3"/ { @dist[pid] = hist(args->ret); @count[pid] =count() }'


  a. base
  @count[252984]: 20810
  @dist[252984]:
  [32, 64)               1 |
         |
  [64, 128)              0 |
         |
  [128, 256)             0 |
         |
  [256, 512)             0 |
         |
  [512, 1K)          20346
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [1K, 2K)             462 |@
         |
  [2K, 4K)               0 |
         |
  [4K, 8K)               0 |
         |
  [8K, 16K)              1 |
         |

  b. this patch
  @count[252895]: 230220
  @dist[252895]:
  [16, 32)               9 |
         |
  [32, 64)           42297 |@@@@@@@@@@@@
         |
  [64, 128)         170530
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [128, 256)         17384 |@@@@@
         |

  c. unbuffered io
  @count[252934]: 460441
  @dist[252934]:
  [8, 16)           230220
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [16, 32)            1108 |
         |
  [32, 64)           76540 |@@@@@@@@@@@@@@@@@
         |
  [64, 128)         143892 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
         |
  [128, 256)          8679 |@
         |
  [256, 512)             0 |
         |
  [512, 1K)              0 |
         |
  [1K, 2K)               0 |
         |
  [2K, 4K)               0 |
         |
  [4K, 8K)               0 |
         |
  [8K, 16K)              1 |
         |
  [16K, 32K)             0 |
         |
  [32K, 64K)             1 |
         |

  d. base python2 (line-buffered libc stdio)
  @count[253265]: 230220
  @dist[253265]:
  [16, 32)               9 |
         |
  [32, 64)           42297 |@@@@@@@@@@@@
         |
  [64, 128)         170530
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [128, 256)         17384 |@@@@@
         |

You can see the number of write() syscalls is doubled for unbufferedio. Theremay be a command giving worse result than "hg status", but I don'tknow.


Benchmark result with non-tty device:

  % HGRCPATH=/dev/null python3 ./hg --cwd ~/mirrors/mozilla-central \
  --time --pager no status --all > /dev/null

  a. base
  time: real 3.300 secs (user 2.830+0.000 sys 0.460+0.000)
  time: real 3.300 secs (user 2.790+0.000 sys 0.500+0.000)
  time: real 3.340 secs (user 2.860+0.000 sys 0.480+0.000)

  b. this patch
  time: real 3.460 secs (user 2.900+0.000 sys 0.550+0.000)
  time: real 3.330 secs (user 2.850+0.000 sys 0.490+0.000)
  time: real 3.320 secs (user 2.880+0.000 sys 0.440+0.000)

So there's no significant performance degradation when stdout isn't atty.


diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -563,6 +563,8 @@ class ui(object):
             )
             self.tracebackflag = self.configbool(b'ui', b'traceback')

self.logblockedtimes = self.configbool(b'ui',b'logblockedtimes')

+            # clear cache which depends on ui.nontty
+            self.__dict__.pop('_fout_needs_flush_per_line', None)

         if section in (None, b'trusted'):
             # update trust information
@@ -1026,6 +1028,7 @@ class ui(object):
     def fout(self, f):
         self._fout = f
         self._fmsgout, self._fmsgerr = _selectmsgdests(self)
+        self.__dict__.pop('_fout_needs_flush_per_line', None)

     @property
     def ferr(self):
@@ -1153,6 +1156,9 @@ class ui(object):
                     label = opts.get('label', b'')
                     msg = self.label(msg, label)
                 dest.write(msg)
+            # bytes stream can't be line-buffered on Python 3.

+ if self._fout_needs_flush_per_line andmsg.endswith(b'\n'):

+                dest.flush()
         except IOError as err:
             raise error.StdioError(err)
         finally:
@@ -1202,6 +1208,13 @@ class ui(object):
             # including stdout.

if dest is self._ferr and not getattr(self._ferr,'closed', False):

                 dest.flush()
+            # bytes stream can't be line-buffered on Python 3.
+            elif (
+                dest is self._fout
+                and self._fout_needs_flush_per_line
+                and msg.endswith(b'\n')
+            ):
+                dest.flush()
         except IOError as err:
             if dest is self._ferr and err.errno in (
                 errno.EPIPE,
@@ -1242,6 +1255,12 @@ class ui(object):
                 util.timer() - starttime
             ) * 1000

+    @util.propertycache
+    def _fout_needs_flush_per_line(self):

+ """True if fout should be line-buffered and needs explicitflush() to

+        behave as such"""
+        return pycompat.ispy3 and (self.pageractive or
self._isatty(self._fout))

This implements a behavior that might differ from that on Python 2. OnPython 2 on non-Windows systems, procutil opens stdout in line-bufferedmode if stdout is a TTY, no matter whether ui.nontty is set or not. I’mnot saying that this is the desirable behavior, but I’d like to haveconsistent behavior across Python versions.

The fact that we have to check `self.pageractive` shows that it is riskyto compute `self._fout_needs_flush_per_line` lazily, as TTY-ness of a FDcan change over time but for the purposes of checking whether the outputshould line-buffered, we have to check the TTY-ness at specific times.The case when `self.pageractive` is True might be the only case wherethe TTY-ness of the FD is off, but I think we should avoid that we haveto exhaustively enumerate all these cases.

In procutil, we already have a place where we know whether stdout shouldbe line-buffered and whether the line-buffering is not provided by thestream object (although the current code is buggy). We should avoidduplicating that logic. In general, we can assume that places thatcreate stream objects know if line-buffering should be used but is notsupported (e.g. open() nowadays prints a warning).

The question is how to communicate the information that the streamshould be flushed per line into the ui object. One way would be to set a`_needs_flush_per_line` attribute on the stream.

An alternative implementation approach would be to push the flushing outof ui by creating a wrapper around binary streams in procutil thatimplements the line buffering. It has the advantage that other users ofprocutil.stdout would also automatically flush correctly. This will havea small overhead, but it will likely end up in the noise and it doesn’tfundamentally change performance characteristics like unbuffered does.The code for the non-TTY case could stay untouched with this approach.

I can implement and benchmark these solutions, but I’m not sure I’llhave the time later today.

+
     def _isatty(self, fh):
         if self.configbool(b'ui', b'nontty'):
             return False
@@ -1260,6 +1279,8 @@ class ui(object):
             return self._fin, self._fout
         fin, fout = procutil.protectstdio(self._fin, self._fout)

self._finoutredirected = (fin, fout) != (self._fin,self._fout)

+        # clear cache which depends on isatty(fout)
+        self.__dict__.pop('_fout_needs_flush_per_line', None)
         return fin, fout

     def restorefinout(self, fin, fout):
@@ -1270,6 +1291,8 @@ class ui(object):

# protectfinout() won't create more than one duplicatedstreams,

         # so we can just turn the redirection flag off.
         self._finoutredirected = False
+        # clear cache which depends on isatty(fout)
+        self.__dict__.pop('_fout_needs_flush_per_line', None)

     @contextlib.contextmanager
     def protectedfinout(self):
@@ -1344,6 +1367,8 @@ class ui(object):
             # auto-detection of things being formatted.

self.setconfig(b'ui', b'formatted', wasformatted,b'pager')

             self.setconfig(b'ui', b'interactive', False, b'pager')
+            # clear cache which depends on pageractive
+            self.__dict__.pop('_fout_needs_flush_per_line', None)

# If pagermode differs from color.mode, reconfigure colornow that

             # pageractive is set.
diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -57,8 +57,7 @@ if isatty(stdout):
         # Windows doesn't support line buffering
         stdout = os.fdopen(stdout.fileno(), 'wb', 0)
     elif not pycompat.ispy3:

- # on Python 3, stdout (sys.stdout.buffer) is already linebuffered and

-        # buffering=1 is not handled in binary mode

+ # on Python 3, there's no way to make bytes streamline-buffered

         stdout = os.fdopen(stdout.fileno(), 'wb', 1)

 if pycompat.iswindows:

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH] py3: make ui.write() call fout.flush() per line if fout is a tty or pager

Reply via email to