New issue 2624: Weird performance on pypy3 when reading from a text-mode file
https://bitbucket.org/pypy/pypy/issues/2624/weird-performance-on-pypy3-when-reading
Nathaniel Smith:
This little benchmark tries to estimate the speed of `file.read` by comparing
`seek(0); read(10)` versus `seek(0)` alone. If the file's opened in binary mode, then
things make sense and PyPy is fast – CPython does ~400 ns/(seek+read) and
PyPy3 does ~70 ns/(seek+read). OTOH if the file's opened in text mode, then
CPython does ~5000 ns/(seek+read) (which seems a bit silly but not
implausible), and PyPy3 requires ~18,000 ns/(seek+read), which seems to suggest
something has gone wrong.
Even weirder, I found that PyPy3's speed was stable for any individual file,
but if I switched to a different file then sometimes the speed would change
dramatically. Like `/etc/passwd` gives me ~18,000 ns/(seek+read), but
`/etc/fstab` gives me ~6,700 ns/(seek+read), consistently. All the files I
tried are plain ASCII, but maybe there's something weird about the pattern of
newlines or something.
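To narrow down what differs between a "fast" file and a "slow" one, it might help to summarize the properties of each file that could plausibly matter to the text layer. The following is only a sketch: `describe_file` and its `prefix_len` parameter are hypothetical names introduced here, and the choice of properties (size, newline counts, position of the first newline relative to the 10 characters being read) is a guess at what could be relevant, not a known cause.

```python
import os


def describe_file(path, prefix_len=10):
    """Summarize properties that might affect text-mode reads (a guess)."""
    with open(path, "rb") as f:
        data = f.read()
    first_newline = data.find(b"\n")  # -1 if the file has no newline
    return {
        "size": len(data),
        "is_ascii": all(b < 128 for b in data),
        "newlines": data.count(b"\n"),
        "crlf": data.count(b"\r\n"),
        "first_newline": first_newline,
        # Does the prefix being read contain a line break?
        "newline_in_prefix": 0 <= first_newline < prefix_len,
    }


# Compare the two files mentioned above, skipping any that are missing.
for path in ("/etc/passwd", "/etc/fstab"):
    if os.path.exists(path):
        print(path, describe_file(path))
```

If two files with very different timings come out looking the same here, the newline-pattern guess is probably wrong and the cause lies elsewhere.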
Possibly this is expected because Python 3's IO stack is just too complicated
or something, but I found it surprising that such a small simple loop would be
slower than CPython.
```python
import time

# Binary-mode variant (uncomment these two lines and comment out the next two):
#COUNT = 1000000
#f = open("/etc/passwd", "rb")
COUNT = 100000
f = open("/etc/passwd", "rt")

while True:
    start = time.monotonic()
    # Time seek+read together.
    for _ in range(COUNT):
        f.seek(0)
        f.read(10)
    between = time.monotonic()
    # Time seek alone, so it can be subtracted out.
    for _ in range(COUNT):
        f.seek(0)
    end = time.monotonic()
    # Convert per-iteration times to microseconds.
    both = (between - start) / COUNT * 1e6
    seek = (end - between) / COUNT * 1e6
    read = both - seek
    print("{:.2f} µs/(seek+read), {:.2f} µs/seek, estimate ~{:.2f} µs/read"
          .format(both, seek, read))
```
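To make the binary-vs-text and file-vs-file comparisons easier to reproduce without editing the script, the same measurement could be wrapped in a function. This is a minimal sketch of that idea, not a different benchmark: `bench` and `main` are hypothetical names, and the file list is just the two paths mentioned above.

```python
import os
import time


def bench(path, mode, count=100000, nbytes=10):
    """Return (µs per seek+read, µs per seek) for one file and open mode."""
    with open(path, mode) as f:
        start = time.monotonic()
        for _ in range(count):
            f.seek(0)
            f.read(nbytes)
        between = time.monotonic()
        for _ in range(count):
            f.seek(0)
        end = time.monotonic()
    both = (between - start) / count * 1e6
    seek = (end - between) / count * 1e6
    return both, seek


def main(paths=("/etc/passwd", "/etc/fstab"), modes=("rb", "rt")):
    for path in paths:
        if not os.path.exists(path):
            continue  # skip files that are missing on this machine
        for mode in modes:
            both, seek = bench(path, mode)
            print("{} ({}): {:.2f} µs/(seek+read), {:.2f} µs/seek, "
                  "estimate ~{:.2f} µs/read"
                  .format(path, mode, both, seek, both - seek))


if __name__ == "__main__":
    main()
```

Unlike the loop above, this does a single pass per file and mode, so on PyPy the first numbers may include JIT warmup; calling `main()` a few times in a row should show whether they settle.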
_______________________________________________
pypy-issue mailing list
https://mail.python.org/mailman/listinfo/pypy-issue