Dimitri Fontaine <dimi...@2ndquadrant.fr> writes: > Tom Lane <t...@sss.pgh.pa.us> writes: >> Yeah ... if you *don't* know the difference between a DFA and an NFA, >> you're likely to find yourself in over your head. Having said that,
> So, here's a paper I found very nice to get started into this subject: > http://swtch.com/~rsc/regexp/regexp1.html Yeah, I just found that this afternoon myself; it's a great intro. If you follow the whole sequence of papers (there are 4) you'll find out that this guy built a new regexp engine for Google, and these papers are basically introducing/defending its design. It turns out they've released it under a BSD-ish license, so for about half a minute I was thinking there might be a new contender for something we could adopt. But there turn out to be at least two killer reasons why we won't: * it's in C++ not C * it doesn't support backrefs, as well as a few other features that maybe aren't as interesting but still would represent compatibility gotchas if they went away. Too bad. But the papers are well worth reading. One thing I took away from them is that it's possible to do capturing parens, though not backrefs, without back-tracking. Spencer's code treats both of those features as "messy" (ie, slow, because they force use of the NFA-style backtracking search code). So there might be reason to reimplement the parens-but-no-backrefs case using some ideas from these papers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers