On Apr 7, 9:51 pm, Steven D'Aprano <ste...@remove.this.cybersource.com.au> wrote:
> This is one of the reasons we're so often suspicious of re solutions:
>
> >>> s = '# 1 Short offline Completed without error 00%'
> >>> tre = Timer("re.split(' {2,}', s)",
> ...             "import re; from __main__ import s")
> >>> tsplit = Timer("[x for x in s.split(' ') if x.strip()]",
> ...             "from __main__ import s")
>
> >>> re.split(' {2,}', s) == [x for x in s.split(' ') if x.strip()]
> True
>
> >>> min(tre.repeat(repeat=5))
> 6.1224789619445801
> >>> min(tsplit.repeat(repeat=5))
> 1.8338048458099365

I will confess that, in my zeal to defend re, I gave a simple one-liner rather than the more optimized version:

>>> from timeit import Timer
>>> s = '# 1 Short offline Completed without error 00%'
>>> tre = Timer("splitter(s)",
...     "import re; from __main__ import s; splitter = re.compile(' {2,}').split")
>>> tsplit = Timer("[x for x in s.split(' ') if x.strip()]",
...     "from __main__ import s")
>>> min(tre.repeat(repeat=5))
1.893190860748291
>>> min(tsplit.repeat(repeat=5))
2.0661051273345947

You're right that if you have an 800K-byte string, re doesn't perform as well as split, but the delta is only a few percent:

>>> s *= 10000
>>> min(tre.repeat(repeat=5, number=1000))
15.331652164459229
>>> min(tsplit.repeat(repeat=5, number=1000))
14.596404075622559

Regards,
Pat
--
http://mail.python.org/mailman/listinfo/python-list
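For anyone who wants to reproduce the comparison, here is a self-contained sketch of the same benchmark. The sample string and iteration count here are illustrative stand-ins (not the thread's exact data), and absolute timings will of course vary by machine and Python version:

```python
import re
import timeit

# Illustrative stand-in: tokens separated only by runs of 2+ spaces,
# so both splitting strategies produce the same list.
s = 'alpha   bravo  charlie    delta   00%'

# Binding the compiled pattern's .split method up front avoids the
# pattern-cache lookup that a bare re.split(' {2,}', s) pays on each call.
splitter = re.compile(' {2,}').split

# Sanity check: both strategies agree on this input.
assert splitter(s) == [x for x in s.split(' ') if x.strip()]

t_re = timeit.timeit(lambda: splitter(s), number=100_000)
t_split = timeit.timeit(lambda: [x for x in s.split(' ') if x.strip()],
                        number=100_000)
print(f'compiled re split:  {t_re:.3f}s')
print(f'str.split + filter: {t_split:.3f}s')
```

Note that the two strategies only coincide when no field itself contains a single space: the compiled splitter keeps a multi-word field such as 'Completed without error' intact as one element, while the str.split comprehension breaks it into separate words.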