Am 05.11.15 um 06:59 schrieb ru...@yahoo.com:
Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.
Hmm, very good point. I wonder why all those "potentially better"
solutions have not been more widely adopted? A conspiracy by a
secret regex cabal?
I'm mostly on the pro-side of the regex discussion, but this IS a valid
point. regexes are not always a good way to express a pattern, even if
the pattern is regular. The point is, that you can't build them up
easily piece-by-piece. Say, you want a regex like "first an
international phone number, then a name, then a second phone number" -
you will have to *repeat* the pattern for phone number twice. In more
complex cases this can become a nightmare, like the monster that was
mentioned before to validate an email.
A better alternative, then, is PEG for example. You can easily write
pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <- [[:alpha:]]+
or something similar using a PEG parser. It has almost the same
quantifiers as a Regex, is much more readable, runs in linear time over
all inputs and can parse languages with the approximately the same
complexity as the Knuth style parsers (LR(k) etc.), but without
ambiguity. I'm really astonished that PEG parsing is not better
supported in the world of computing, instead most people choose to stick
to the lexer+scanner combination
Finally, an anecdote from my "early" life of computing. In 1990, when I
was 12 years old, I participated in an annual competition of computer
science for high school students. I was learning how to program without
formal training, and solved one problem where a grammar was depicted as
a flowchart and the task was to write parser for it, to check the
validity of input strings. The grammar is depicted here (problem 1):
http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf
As a 12 year old, not knowing anything about pattern recognition, but
thinking I was the king, as is usual for boys in that age, I sat down
and manually constructed a recursive descent parser in a BASIC like
language. It had 1000 lines and took me a few weeks to get it correct.
Finally the solution was accepted as working, but my participation was
rejected because the solutions lacked documentation. 16 years later I
used the problem for a course on string processing (that's what the PDF
is for), and asked the students to solve it using regexes. My own
solution consists of 67 characters, and it took me5 minutes to write it
down.
Admittedly, this problem is constructed, but solving similar tasks by
regexes is still something that I need to do on a daily basis, when I
get data from other scientists in odd formats and I need to preprocess
them. I know people who use a spreadsheet and copy/paste millions of
datapoints manually becasue they lack the knowledge of using such tools.
Christian
--
https://mail.python.org/mailman/listinfo/python-list