Steve Newcomb added the comment:

Oops. The correct URL is sftp://coolheads.com/files/py-re-perform-276v2712/

On 09/01/2016 04:52 PM, Steve Newcomb wrote:
> On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
>> Raymond Hettinger added the comment:
>>
>> It would be helpful if you ... make a small set of regular
>> expressions that demonstrate the performance regression.
>>
> Done.  Attachments:
>
> test.py : Code that exercises re.sub() and outputs a profile report.
>
> test_output_2.7.6.txt : Output of test.py under Python 2.7.6.
>
> test_output_2.7.12.txt : Output of test.py under Python 2.7.12.
>
> p17.188.htm -- test data: public information from the U.S. Internal
> Revenue Service.
>
> Equivalent hardware was used in both cases.
>
> The outputs show that 2.7.12's re.sub() takes 1.2 times as long as
> 2.7.6's.  It's a significant difference, but...
>
> ...it was not the dramatic degradation I expected to find in this
> exercise.  Therefore I attempted to tease what I was looking for out
> of the profile stats I already uploaded to this site, made from actual
> production runs.  My attempts are all found in an hg repository that
> can be downloaded from
> sftp://s...@coolheads.com//files/py-re-perform-276-2712 using password
> bysIe20H .
>
> I do not feel the latter work took me where I wanted to go, and I
> think the reason is that, at least for purposes of our application,
> Python 2.7.12 has been so extensively refactored since Python 2.7.6.
> So it's an apples-to-oranges comparison, apparently.  Still, the
> performance difference for re.sub() is quite dramatic, and re.sub()
> is the only comparable function whose performance dramatically
> worsened: in our application, 2.7.12's re.sub() takes 3.04 times as
> long as 2.7.6's.
>
> The good news, of course, is that by and large the performance of the
> other *comparable* functions largely improved, often dramatically.
> But at least in our application, it doesn't come close to making up
> for the degradation in re.sub().
>
> My by-the-gut bottom line: somebody who really knows the re module
> should take a deep look at re.sub().  Why would re.sub(), unlike all
> others, take so much longer to run, while *every* other function in
> the re module gets (often much) faster?  It feels like there's a bug
> somewhere in re.sub().
>
> Steve Newcomb
>

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
_______________________________________
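
For anyone who wants to try the kind of measurement described above (a script that exercises re.sub() and prints a profile report), here is a minimal sketch. It is not the test.py attached to the issue: the patterns, the sample text, and the function names run_substitutions and profile_substitutions are all made up for illustration, and it is written for Python 3 for convenience rather than for the 2.7.x releases being compared.

```python
import cProfile
import io
import pstats
import re

# Hypothetical patterns and sample text (assumptions for illustration only);
# the real test.py and p17.188.htm from the issue are not reproduced here.
SUBSTITUTIONS = [
    (r"<[^>]+>", " "),           # replace HTML-like tags with a space
    (r"\s+", " "),               # collapse runs of whitespace
    (r"\b(\d{4})\b", r"[\1]"),   # put brackets around four-digit numbers
]
SAMPLE = "<p>Publication 17   for tax year 2015</p>\n" * 2000


def run_substitutions(text, repeat=5):
    """Apply every substitution to `text`, `repeat` times; return the last result."""
    result = text
    for _ in range(repeat):
        result = text
        for pattern, replacement in SUBSTITUTIONS:
            # Module-level re.sub() is used on purpose, since that is the
            # function the report is about.
            result = re.sub(pattern, replacement, result)
    return result


def profile_substitutions(text, repeat=5):
    """Profile run_substitutions and return (result, profile report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run_substitutions, text, repeat)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_substitutions(SAMPLE)
    print(report)
```

Running the same script under two interpreters on equivalent hardware and comparing the cumulative time reported for re.sub() gives a rough ratio of the kind quoted above. A synthetic input like this one may not reproduce the slowdown seen in a production workload.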