Am 05.11.15 um 06:59 schrieb ru...@yahoo.com:
Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.

Hmm, very good point.  I wonder why all those "potentially better"
solutions have not been more widely adopted?  A conspiracy by a
secret regex cabal?

I'm mostly on the pro side of the regex discussion, but this IS a valid point. Regexes are not always a good way to express a pattern, even if the pattern is regular. The problem is that you can't easily build them up piece by piece. Say you want a pattern like "first an international phone number, then a name, then a second phone number" - you will have to *repeat* the phone-number pattern twice. In more complex cases this becomes a nightmare, like the email-validation monster that was mentioned before.
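In Python at least, the repetition can be worked around by composing the full regex from named string fragments, so the phone-number pattern is written only once. A minimal sketch (the sub-patterns here are illustrative, not a serious phone-number validator):

```python
import re

# Build the sub-patterns once, as plain strings...
phone = r"\+[0-9]+(?:-[0-9]+)*"   # international phone number, e.g. +49-123-456
name  = r"[A-Za-z]+"              # a very simplistic "name"

# ...then compose the full pattern without repeating the phone regex by hand.
pattern = re.compile(rf"({phone})\s+({name})\s+({phone})")

m = pattern.fullmatch("+49-123-456 Alice +1-555-0100")
print(m.groups())  # ('+49-123-456', 'Alice', '+1-555-0100')
```

This helps, but it only papers over the problem at the string level - the regex engine still sees one flat pattern, and there is no grammar-like structure to reuse or debug.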

A better alternative, then, is PEG for example. You can easily write

pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <-  [[:alpha:]]+

or something similar using a PEG parser. PEGs have almost the same quantifiers as regexes, are much more readable, run in linear time on all inputs (with packrat memoization), and can parse languages of roughly the same complexity as the Knuth-style parsers (LR(k) etc.), but without ambiguity. I'm really astonished that PEG parsing is not better supported in the world of computing; instead most people choose to stick to the lexer+parser combination.
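The PEG above is simple enough to sketch as a hand-rolled recursive-descent matcher in plain Python, which also shows why such grammars compose so naturally: each rule is just a function you can reuse. (Function names follow the grammar; the whitespace handling between tokens is my addition for readability and not part of the PEG above.)

```python
# Each rule takes (string, position) and returns the new position on
# success or None on failure, mirroring PEG's ordered-choice semantics.

def digits(s, i):
    # [0-9]+
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return j if j > i else None

def phone_number(s, i):
    # '+' [0-9]+ ( '-' [0-9]+ )*
    if i >= len(s) or s[i] != '+':
        return None
    i = digits(s, i + 1)
    if i is None:
        return None
    while i < len(s) and s[i] == '-':
        j = digits(s, i + 1)
        if j is None:
            break
        i = j
    return i

def name(s, i):
    # [[:alpha:]]+
    j = i
    while j < len(s) and s[j].isalpha():
        j += 1
    return j if j > i else None

def ws(s, i):
    # optional spaces between tokens (an assumption, not in the grammar)
    while i < len(s) and s[i] == ' ':
        i += 1
    return i

def pattern(s):
    # phone_number name phone_number, anchored to the whole input
    i = phone_number(s, 0)
    if i is None:
        return False
    i = name(s, ws(s, i))
    if i is None:
        return False
    i = phone_number(s, ws(s, i))
    return i is not None and i == len(s)

print(pattern("+49-123 Alice +1-555"))  # True
print(pattern("Alice +1-555"))          # False
```

A real PEG library adds memoization (for the linear-time guarantee) and generates such functions from the grammar text, but the structure of the generated code is essentially this.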

Finally, an anecdote from my "early" computing life. In 1990, when I was 12 years old, I participated in an annual computer science competition for high school students. I was learning how to program without formal training, and solved one problem where a grammar was depicted as a flowchart and the task was to write a parser for it, to check the validity of input strings. The grammar is depicted here (problem 1):

http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf

As a 12-year-old, not knowing anything about pattern recognition but thinking I was the king, as is usual for boys at that age, I sat down and manually constructed a recursive descent parser in a BASIC-like language. It was 1000 lines long and took me a few weeks to get correct. In the end the solution was accepted as working, but my participation was rejected because the solution lacked documentation. 16 years later I used the problem for a course on string processing (that's what the PDF is for), and asked the students to solve it using regexes. My own solution consists of 67 characters, and it took me 5 minutes to write it down.

Admittedly, this problem is constructed, but solving similar tasks with regexes is still something I need to do on a daily basis, when I get data from other scientists in odd formats and need to preprocess it. I know people who use a spreadsheet and copy/paste millions of data points manually because they lack the knowledge of such tools.

        Christian

--
https://mail.python.org/mailman/listinfo/python-list
