Re: Regular expressions

Christian Gollwitzer Thu, 05 Nov 2015 00:21:58 -0800

On 05.11.15 at 06:59, ru...@yahoo.com wrote:

>> Can you call yourself a well-rounded programmer without at least a basic
>> understanding of some regex library? Well, probably not. But that's part of
>> the problem with regexes. They have, to some degree, driven out potentially
>> better -- or at least differently bad -- pattern matching solutions, such
>> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
>> even alternative idioms, like Hypercard's "chunking" idioms.
>
> Hmm, very good point.  I wonder why all those "potentially better"
> solutions have not been more widely adopted?  A conspiracy by a
> secret regex cabal?

I'm mostly on the pro side of the regex discussion, but this IS a valid point. Regexes are not always a good way to express a pattern, even if the pattern is regular. The point is that you can't easily build them up piece by piece. Say you want a regex like "first an international phone number, then a name, then a second phone number": you will have to *repeat* the pattern for the phone number twice. In more complex cases this can become a nightmare, like the monster mentioned earlier for validating an email address.
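A minimal Python sketch of the repetition problem, assuming made-up sub-patterns for the phone number and the name (the names PHONE, NAME, RECORD and parse_record are illustrative, as is the whitespace between fields). The regex syntax has no way to define a sub-pattern once and refer to it by name, so the reuse has to happen in the host language by pasting strings together; the compiled regex still contains the phone pattern twice.

```python
import re

# Hypothetical sub-patterns, written once as ordinary Python strings.
PHONE = r"\+[0-9]+(?:-[0-9]+)*"
NAME = r"[A-Za-z]+"

# The reuse is done by string interpolation, not by the regex language:
# the final pattern text contains PHONE twice.
RECORD = re.compile(
    rf"(?P<first>{PHONE})\s+(?P<name>{NAME})\s+(?P<second>{PHONE})"
)

def parse_record(text):
    """Return the three fields as a dict, or None if text does not match."""
    m = RECORD.fullmatch(text)
    return m.groupdict() if m else None
```

Calling parse_record("+49-30-1234 Alice +1-555-0100") returns the two numbers and the name. Note that the two groups need different names, because Python's re module does not allow the same group name twice in one pattern.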


A better alternative, then, is PEG for example. You can easily write

pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <-  [[:alpha:]]+

or something similar using a PEG parser. It has almost the same quantifiers as a regex, is much more readable, runs in linear time over all inputs (when implemented as a packrat parser), and can parse languages of approximately the same complexity as the Knuth-style parsers (LR(k) etc.), but without ambiguity. I'm really astonished that PEG parsing is not better supported in the world of computing; instead, most people choose to stick to the lexer+parser combination.
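To show how the grammar above composes from named pieces, here is a small hand-rolled sketch in Python. It is not a real PEG library: the helpers lit, char_if, seq and many are hypothetical names invented for this example, it does no memoization (so it is not a packrat parser and makes no linear-time promise), and it only reports whether the input matches. Like the grammar as written, it expects no whitespace between the fields.

```python
# Each parser is a function (text, pos) -> new position, or None on failure.

def lit(s):
    """Match the literal string s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def char_if(pred):
    """Match one character for which pred(char) is true."""
    def parse(text, pos):
        return pos + 1 if pos < len(text) and pred(text[pos]) else None
    return parse

def seq(*parsers):
    """Match all parsers one after another (PEG sequence)."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def many(p, at_least=0):
    """Greedy repetition: '*' with at_least=0, '+' with at_least=1."""
    def parse(text, pos):
        count = 0
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                break
            pos, count = nxt, count + 1
        return pos if count >= at_least else None
    return parse

digit = char_if(lambda c: c in "0123456789")
alpha = char_if(str.isalpha)

# phone_number is defined once and used twice, as in the grammar.
phone_number = seq(lit("+"), many(digit, 1), many(seq(lit("-"), many(digit, 1))))
name = many(alpha, 1)
pattern = seq(phone_number, name, phone_number)

def matches(text):
    """True if the whole text is phone_number name phone_number."""
    return pattern(text, 0) == len(text)
```

For example, matches("+49-30-1234Alice+1-555-0100") is true, while matches("+49Alice") is false because the second phone number is missing.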

Finally, an anecdote from my "early" life of computing. In 1990, when I was 12 years old, I participated in an annual computer science competition for high school students. I was learning how to program without formal training, and solved one problem where a grammar was depicted as a flowchart and the task was to write a parser for it, to check the validity of input strings. The grammar is depicted here (problem 1):


http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf

As a 12-year-old, not knowing anything about pattern recognition, but thinking I was the king, as is usual for boys at that age, I sat down and manually constructed a recursive descent parser in a BASIC-like language. It had 1000 lines and took me a few weeks to get correct. Finally the solution was accepted as working, but my participation was rejected because the solution lacked documentation. 16 years later I used the problem for a course on string processing (that's what the PDF is for), and asked the students to solve it using regexes. My own solution consists of 67 characters, and it took me 5 minutes to write it down.

Admittedly, this problem is constructed, but solving similar tasks with regexes is still something that I need to do on a daily basis, when I get data from other scientists in odd formats and need to preprocess it. I know people who use a spreadsheet and copy/paste millions of data points manually because they lack the knowledge to use such tools.


        Christian

--
https://mail.python.org/mailman/listinfo/python-list
