On Wed, 07 Apr 2010 18:03:47 -0700, Patrick Maupin wrote:

> BTW, although I find it annoying when people say "don't do that" when
> "that" is a perfectly good thing to do, and although I also find it
> annoying when people tell you what not to do without telling you what
> *to* do,

Grant did give a perfectly good solution.

> and although I find the regex solution to this problem to be
> quite clean, the equivalent non-regex solution is not terrible, so I
> will present it as well, for your viewing pleasure:
>
> >>> [x for x in '# 1 Short offline Completed without error 00%'.split('  ') if x.strip()]
> ['# 1', 'Short offline', ' Completed without error', ' 00%']

This is one of the reasons we're so often suspicious of re solutions:

>>> import re
>>> from timeit import Timer
>>> s = '# 1 Short offline Completed without error 00%'
>>> tre = Timer("re.split(' {2,}', s)",
...     "import re; from __main__ import s")
>>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
...     "from __main__ import s")
>>>
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
True
>>>
>>>
>>> min(tre.repeat(repeat=5))
6.1224789619445801
>>> min(tsplit.repeat(repeat=5))
1.8338048458099365

Even when they are correct and not unreadable line-noise, regexes tend to
be slow. And the gap widens as the input grows:

>>> s *= 1000
>>> min(tre.repeat(repeat=5, number=1000))
2.3496899604797363
>>> min(tsplit.repeat(repeat=5, number=1000))
0.41538596153259277
>>>
>>> s *= 10
>>> min(tre.repeat(repeat=5, number=1000))
23.739185094833374
>>> min(tsplit.repeat(repeat=5, number=1000))
4.6444299221038818

And this isn't even one of the pathological O(N**2) or O(2**N) regexes.

Don't get me wrong -- regexes are a useful tool. But if your first instinct
is to write a regex, you're doing it wrong.

[quote]
A related problem is Perl's over-reliance on regular expressions that is
exaggerated by advocating regex-based solution in almost all O'Reilly
books. The latter until recently were the most authoritative source of
published information about Perl.

While simple regular expression is a beautiful thing and can simplify
operations with string considerably, overcomplexity in regular expressions
is extremly dangerous: it cannot serve a basis for serious, professional
programming, it is fraught with pitfalls, a big semantic mess as a result
of outgrowing its primary purpose. Diagnostic for errors in regular
expressions is even weaker then for the language itself and here many
things are just go unnoticed.
[end quote]

http://www.softpanorama.org/Scripting/Perlbook/Ch01/place_of_perl_among_other_lang.shtml

Even Larry Wall has criticised Perl's regex culture:

http://dev.perl.org/perl6/doc/design/apo/A05.html

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
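A minimal sketch for re-running the comparison on a current Python. The sample line is made up (the spacing of the original line was lost), and the function names split_regex, split_plain and split_plain_stripped are hypothetical. It also shows a detail visible in Patrick's output: splitting on exactly two spaces leaves a leading space on a field wherever the preceding run of spaces has odd length, so the fields need a strip() before they match what re.split(' {2,}', ...) returns.

```python
import re
from timeit import timeit

# Hypothetical sample line: the real spacing was lost, so it is built
# explicitly here, with odd-length (7-space) runs before the last two fields.
SAMPLE = ('# 1' + ' ' * 2 + 'Short offline' + ' ' * 7
          + 'Completed without error' + ' ' * 7 + '00%')

# Compiled once up front; re.split() with a string pattern would also work,
# because the re module caches compiled patterns internally.
MULTI_SPACE = re.compile(' {2,}')


def split_regex(line):
    """Split on runs of two or more spaces using a regex."""
    return MULTI_SPACE.split(line)


def split_plain(line):
    """Split on exactly two spaces and drop blank pieces (fields not stripped)."""
    return [x for x in line.split('  ') if x.strip()]


def split_plain_stripped(line):
    """Like split_plain, but also strips stray single spaces from each field."""
    return [x.strip() for x in line.split('  ') if x.strip()]


if __name__ == '__main__':
    print(split_regex(SAMPLE))
    print(split_plain(SAMPLE))           # odd runs leave a leading space
    print(split_plain_stripped(SAMPLE))  # same fields as split_regex
    for func in (split_regex, split_plain, split_plain_stripped):
        seconds = timeit(lambda: func(SAMPLE), number=100_000)
        print(f'{func.__name__}: {seconds:.3f} s')
```

Absolute timings will differ from the 2010 figures above, and the relative ordering may also vary between Python versions, so it is worth measuring on the interpreter you actually use.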