On 2020-06-07 13:55, Yuya Nishihara wrote:
# HG changeset patch
# User Yuya Nishihara <y...@tcha.org>
# Date 1591518045 -32400
# Sun Jun 07 17:20:45 2020 +0900
# Node ID ee3cd52b4b338e05b0a22cb87580d78f1ad7763d
# Parent 03ba7de6a8b9ab339a6dafab969e890228b4c2f1
py3: make ui.write() call fout.flush() per line if fout is a tty or
pager
Since sys.stdout.buffer of Python 3 can't be configured as a
line-buffered
stream, we need a workaround:
a. make callers do flush() where appropriate
b. make ui insert flush() automatically
c. use unbuffered stream
With my testing, inserting flush() automatically showed a better result
than
using unbuffered io, so this patch implements (b). Since ui._isatty()
call
isn't cheap, the result is cached. I also made ui.write() not scan the
entire string because it's unlikely that a string including '\n'
doesn't
end with '\n'.
Maybe we can backport a subset of this patch to stable to mitigate the
ui.status() issue.
All in all, I agree that inserting a flush() per line automatically is
the best approach for now, as it harmonizes the behavior towards what we
so far considered to be "the correct behavior" (line-buffered if the
stdout is a TTY, which works on Python 2 on UNIX).
So far, the implicit contract of ui.write() included that the stream is
flushed if the string includes a newline. If we backport the patch to
stable, it should definitively not include the optimization because it
breaks that contract. In default, we can have the optimization after we
ensured that all callers continue to work with the new contract. Maybe
it’s better to completely move the optimization (and the discussion) to
a follow-up patch.
Benchmark result:
% HGRCPATH=/dev/null python3 ./hg --cwd ~/mirrors/mozilla-central \
--time --pager no status --all
a. base
time: real 10.950 secs (user 8.760+0.000 sys 1.270+0.000)
time: real 9.450 secs (user 8.530+0.000 sys 0.910+0.000)
time: real 10.180 secs (user 9.290+0.000 sys 0.880+0.000)
b. this patch
time: real 11.890 secs (user 10.740+0.000 sys 1.140+0.000)
time: real 10.430 secs (user 9.410+0.000 sys 1.020+0.000)
time: real 10.820 secs (user 9.500+0.000 sys 1.300+0.000)
c. unbuffered io
time: real 12.000 secs (user 10.350+0.000 sys 1.640+0.000)
time: real 11.710 secs (user 10.170+0.000 sys 1.520+0.000)
time: real 11.750 secs (user 9.900+0.000 sys 1.550+0.000)
d. base python2 (line-buffered libc stdio)
time: real 8.170 secs (user 6.990+0.000 sys 1.110+0.000)
time: real 7.880 secs (user 6.830+0.000 sys 1.040+0.000)
time: real 7.990 secs (user 6.920+0.000 sys 1.060+0.000)
On my machine:
base:
time: real 10.310 secs (user 9.070+0.000 sys 1.200+0.000)
time: real 10.290 secs (user 9.130+0.000 sys 1.040+0.000)
time: real 10.260 secs (user 9.320+0.000 sys 0.920+0.000)
this patch:
time: real 12.850 secs (user 11.300+0.000 sys 1.500+0.000)
time: real 12.960 secs (user 11.300+0.000 sys 1.580+0.000)
time: real 12.510 secs (user 10.900+0.000 sys 1.570+0.000)
unbuffered (my patch):
time: real 13.980 secs (user 11.710+0.000 sys 2.180+0.000)
time: real 14.570 secs (user 12.300+0.000 sys 2.230+0.000)
time: real 13.870 secs (user 11.610+0.000 sys 2.110+0.000)
base on Python 2 (line-buffered libc stdio):
time: real 10.950 secs (user 9.240+0.000 sys 1.670+0.000)
time: real 11.140 secs (user 9.490+0.000 sys 1.630+0.000)
time: real 11.290 secs (user 9.640+0.000 sys 1.620+0.000)
Python 2 is slower on my machine, as it was compiled with less
aggressive build flags. On Python 3, the results have the same order as
on your machine, but the differences between them are larger. Overall, I
think that the performance overhead is justified by the behavior fix.
Number of write() syscalls of the commands above:
% sudo bpftrace -e 'tracepoint:syscalls:sys_exit_write /comm ==
"python" \
|| comm == "python3"/ { @dist[pid] = hist(args->ret); @count[pid] =
count() }'
a. base
@count[252984]: 20810
@dist[252984]:
[32, 64) 1 |
|
[64, 128) 0 |
|
[128, 256) 0 |
|
[256, 512) 0 |
|
[512, 1K) 20346
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K) 462 |@
|
[2K, 4K) 0 |
|
[4K, 8K) 0 |
|
[8K, 16K) 1 |
|
b. this patch
@count[252895]: 230220
@dist[252895]:
[16, 32) 9 |
|
[32, 64) 42297 |@@@@@@@@@@@@
|
[64, 128) 170530
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[128, 256) 17384 |@@@@@
|
c. unbuffered io
@count[252934]: 460441
@dist[252934]:
[8, 16) 230220
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32) 1108 |
|
[32, 64) 76540 |@@@@@@@@@@@@@@@@@
|
[64, 128) 143892 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
|
[128, 256) 8679 |@
|
[256, 512) 0 |
|
[512, 1K) 0 |
|
[1K, 2K) 0 |
|
[2K, 4K) 0 |
|
[4K, 8K) 0 |
|
[8K, 16K) 1 |
|
[16K, 32K) 0 |
|
[32K, 64K) 1 |
|
d. base python2 (line-buffered libc stdio)
@count[253265]: 230220
@dist[253265]:
[16, 32) 9 |
|
[32, 64) 42297 |@@@@@@@@@@@@
|
[64, 128) 170530
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[128, 256) 17384 |@@@@@
|
You can see the number of write() syscalls is doubled for unbuffered
io. There
may be a command giving worse result than "hg status", but I don't
know.
Benchmark result with non-tty device:
% HGRCPATH=/dev/null python3 ./hg --cwd ~/mirrors/mozilla-central \
--time --pager no status --all > /dev/null
a. base
time: real 3.300 secs (user 2.830+0.000 sys 0.460+0.000)
time: real 3.300 secs (user 2.790+0.000 sys 0.500+0.000)
time: real 3.340 secs (user 2.860+0.000 sys 0.480+0.000)
b. this patch
time: real 3.460 secs (user 2.900+0.000 sys 0.550+0.000)
time: real 3.330 secs (user 2.850+0.000 sys 0.490+0.000)
time: real 3.320 secs (user 2.880+0.000 sys 0.440+0.000)
So there's no significant performance degradation when stdout isn't a
tty.
diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -563,6 +563,8 @@ class ui(object):
)
self.tracebackflag = self.configbool(b'ui', b'traceback')
self.logblockedtimes = self.configbool(b'ui',
b'logblockedtimes')
+ # clear cache which depends on ui.nontty
+ self.__dict__.pop('_fout_needs_flush_per_line', None)
if section in (None, b'trusted'):
# update trust information
@@ -1026,6 +1028,7 @@ class ui(object):
def fout(self, f):
self._fout = f
self._fmsgout, self._fmsgerr = _selectmsgdests(self)
+ self.__dict__.pop('_fout_needs_flush_per_line', None)
@property
def ferr(self):
@@ -1153,6 +1156,9 @@ class ui(object):
label = opts.get('label', b'')
msg = self.label(msg, label)
dest.write(msg)
+ # bytes stream can't be line-buffered on Python 3.
+ if self._fout_needs_flush_per_line and
msg.endswith(b'\n'):
+ dest.flush()
except IOError as err:
raise error.StdioError(err)
finally:
@@ -1202,6 +1208,13 @@ class ui(object):
# including stdout.
if dest is self._ferr and not getattr(self._ferr,
'closed', False):
dest.flush()
+ # bytes stream can't be line-buffered on Python 3.
+ elif (
+ dest is self._fout
+ and self._fout_needs_flush_per_line
+ and msg.endswith(b'\n')
+ ):
+ dest.flush()
except IOError as err:
if dest is self._ferr and err.errno in (
errno.EPIPE,
@@ -1242,6 +1255,12 @@ class ui(object):
util.timer() - starttime
) * 1000
+ @util.propertycache
+ def _fout_needs_flush_per_line(self):
+ """True if fout should be line-buffered and needs explicit
flush() to
+ behave as such"""
+ return pycompat.ispy3 and (self.pageractive or
self._isatty(self._fout))
This implements a behavior that might differ from that on Python 2. On
Python 2 on non-Windows systems, procutil opens stdout in line-buffered
mode if stdout is a TTY, no matter whether ui.nontty is set or not. I’m
not saying that this is the desirable behavior, but I’d like to have
consistent behavior across Python versions.
The fact that we have to check `self.pageractive` shows that it is risky
to compute `self._fout_needs_flush_per_line` lazily, as TTY-ness of a FD
can change over time but for the purposes of checking whether the output
should line-buffered, we have to check the TTY-ness at specific times.
The case when `self.pageractive` is True might be the only case where
the TTY-ness of the FD is off, but I think we should avoid that we have
to exhaustively enumerate all these cases.
In procutil, we already have a place where we know whether stdout should
be line-buffered and whether the line-buffering is not provided by the
stream object (although the current code is buggy). We should avoid
duplicating that logic. In general, we can assume that places that
create stream objects know if line-buffering should be used but is not
supported (e.g. open() nowadays prints a warning).
The question is how to communicate the information that the stream
should be flushed per line into the ui object. One way would be to set a
`_needs_flush_per_line` attribute on the stream.
An alternative implementation approach would be to push the flushing out
of ui by creating a wrapper around binary streams in procutil that
implements the line buffering. It has the advantage that other users of
procutil.stdout would also automatically flush correctly. This will have
a small overhead, but it will likely end up in the noise and it doesn’t
fundamentally change performance characteristics like unbuffered does.
The code for the non-TTY case could stay untouched with this approach.
I can implement and benchmark these solutions, but I’m not sure I’ll
have the time later today.
+
def _isatty(self, fh):
if self.configbool(b'ui', b'nontty'):
return False
@@ -1260,6 +1279,8 @@ class ui(object):
return self._fin, self._fout
fin, fout = procutil.protectstdio(self._fin, self._fout)
self._finoutredirected = (fin, fout) != (self._fin,
self._fout)
+ # clear cache which depends on isatty(fout)
+ self.__dict__.pop('_fout_needs_flush_per_line', None)
return fin, fout
def restorefinout(self, fin, fout):
@@ -1270,6 +1291,8 @@ class ui(object):
# protectfinout() won't create more than one duplicated
streams,
# so we can just turn the redirection flag off.
self._finoutredirected = False
+ # clear cache which depends on isatty(fout)
+ self.__dict__.pop('_fout_needs_flush_per_line', None)
@contextlib.contextmanager
def protectedfinout(self):
@@ -1344,6 +1367,8 @@ class ui(object):
# auto-detection of things being formatted.
self.setconfig(b'ui', b'formatted', wasformatted,
b'pager')
self.setconfig(b'ui', b'interactive', False, b'pager')
+ # clear cache which depends on pageractive
+ self.__dict__.pop('_fout_needs_flush_per_line', None)
# If pagermode differs from color.mode, reconfigure color
now that
# pageractive is set.
diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -57,8 +57,7 @@ if isatty(stdout):
# Windows doesn't support line buffering
stdout = os.fdopen(stdout.fileno(), 'wb', 0)
elif not pycompat.ispy3:
- # on Python 3, stdout (sys.stdout.buffer) is already line
buffered and
- # buffering=1 is not handled in binary mode
+ # on Python 3, there's no way to make bytes stream
line-buffered
stdout = os.fdopen(stdout.fileno(), 'wb', 1)
if pycompat.iswindows:
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel