Michael Fox added the comment:

I looked into it a little, and it looks like pyliblzma is a pure C extension, whereas the new lzma library wraps liblzma but the rest is Python. In particular, this happens for every line:

    if size < 0:
        end = self._buffer.find(b"\n", self._buffer_offset) + 1
        if end > 0:
            line = self._buffer[self._buffer_offset : end]
            self._buffer_offset = end
            self._pos += len(line)
            return line

And while that doesn't look like a lot of overhead, it's definitely something. So, unless someone thinks that a pure C extension is the right technical direction, lzma in 3.4 is probably as fast as it's ever going to be.

I will just use the workaround of piping in unxz regardless.

On Sat, May 18, 2013 at 2:12 PM, Michael Fox <415...@gmail.com> wrote:
> 3.4 is much better but still 4x slower than 2.7
>
> m@air:~/q/topaz/parse_datalog$ time python2.7 lzmaperf.py
> 102368
>
> real 0m0.053s
> user 0m0.052s
> sys 0m0.000s
> m@air:~/q/topaz/parse_datalog$ time ~/tmp/cpython-23836f17e4a2/bin/python3.4 lzmaperf.py
> 102368
>
> real 0m0.229s
> user 0m0.212s
> sys 0m0.012s
>
> The bottleneck has moved here:
> 102369 0.151 0.000 0.226 0.000 lzma.py:333(readline)
>
> I don't know if this is a strictly fair comparison. The lzma module
> and pyliblzma may not be of the same quality. I've just come across a
> real bug in pyliblzma. It doesn't apply to this test, but who knows
> what shortcuts it's taking.
>
> Finally, here's a baseline:
>
> m@air:~/q/topaz/parse_datalog$ time xzcat bigfile.xz | wc -l
> 102368
>
> real 0m0.034s
> user 0m0.024s
> sys 0m0.016s
>
> On Sat, May 18, 2013 at 12:46 PM, Nadeem Vawda <rep...@bugs.python.org> wrote:
>>
>> Nadeem Vawda added the comment:
>>
>> Have you tried running the benchmark against the default (3.4) branch?
>> There was some significant optimization work done in issue 16034, but
>> the changes were not backported to 3.3.
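
For reference, the workaround of piping the file through an external decompressor could look roughly like the sketch below. This is not code from the thread: the function name `count_lines_via_xz` and its `xz_cmd` parameter are hypothetical, and it assumes an `xz` binary is available on PATH (`xz --decompress --stdout` does the same job as `unxz -c` or `xzcat`). The idea is that the pipe is an ordinary buffered binary file, so in CPython the line splitting is done by the io module rather than by the Python-level `readline` shown above.

```python
import subprocess


def count_lines_via_xz(path, xz_cmd="xz"):
    """Count lines in an .xz file by streaming it through an external xz process.

    Hypothetical helper for illustration; assumes `xz_cmd` is on PATH.
    """
    # Decompress to stdout; "--" guards against paths that start with "-".
    cmd = [xz_cmd, "--decompress", "--stdout", "--", path]
    # bufsize=-1 requests a buffered pipe, so proc.stdout is a buffered
    # binary file object and iterating over it yields one line at a time.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=-1) as proc:
        count = sum(1 for _line in proc.stdout)
    # Leaving the "with" block closes the pipe and waits for the child.
    if proc.returncode != 0:
        raise RuntimeError("%s exited with status %d" % (xz_cmd, proc.returncode))
    return count
```

Calling `count_lines_via_xz("bigfile.xz")` should give the same count as `xzcat bigfile.xz | wc -l`, provided the file ends with a newline.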
--
Michael

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com