On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
>
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool. They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>     Some people, when confronted with a problem, think "I know, I'll use
>     arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 375000000 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the
others was machine-generated code; the other was a "look what you
can do with regexes" example, not serious code.

Here is an example of "abusing" Python:
  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html
I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations
and making regexes even *more* powerful. (He turned them into
full-blown parsers.) Nowhere does he advocate not using regexes, or
avoiding them where possible, as is the mantra on this list.

Here is Larry's "recreation" that you are touting:
  http://design.perl6.org/S05.html
Please explain to us how you think this "fix" addresses the
complaints you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right
after pointing out that some claim Perl is hard to read due largely
to regex syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this
list's anti-regex attitude beyond some superficial verbiage taken
out of context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python.
There are (at time of writing) 190 of them.
  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python
So if a security vulnerability is reason not to use regexes, we
should all be *running* from Python.

I'm sure you'll point out that most have been fixed. But you failed
to point out that the same is true of regex engines. From your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited
your purpose, you could have tried the "Evil Regexes" noted in that
source and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input
is untrusted then sanitize it, or mitigate the threat by imposing
a timeout, etc. Not exactly a problem or solution unique to regexes.

And common sense should tell you that since there are a lot of
"try a regex" web sites, this is not a problem without a solution.
And it is *certainly* not a reason to avoid regexes in the *far* more
common case where the pattern and input *are* trusted because you are
in control of them.

Finally, preemptively, I'll repeat that I acknowledge regexes are not
the optimum solution in every case where they could be used. But
they are very useful once one passes the border of the trivial, and
they are nowhere near as bad as routinely portrayed here.

----
[*] Yes, I made that number up.