Re: [HACKERS] Future of our regular expression code

Tom Lane Sat, 18 Feb 2012 15:56:23 -0800

Dimitri Fontaine <[email protected]> writes:
> Tom Lane <[email protected]> writes:
>> Yeah ... if you *don't* know the difference between a DFA and an NFA,
>> you're likely to find yourself in over your head.  Having said that,


> So, here's a paper I found very nice to get started into this subject:
>   http://swtch.com/~rsc/regexp/regexp1.html

Yeah, I just found that this afternoon myself; it's a great intro.

If you follow the whole sequence of papers (there are 4) you'll find out
that this guy built a new regexp engine for Google, and these papers are
basically introducing/defending its design.  It turns out they've
released it under a BSD-ish license, so for about half a minute I was
thinking there might be a new contender for something we could adopt.
But there turn out to be at least two killer reasons why we won't:
* it's in C++ not C
* it doesn't support backrefs, as well as a few other features that
  maybe aren't as interesting but still would represent compatibility
  gotchas if they went away.
Too bad.  But the papers are well worth reading.  One thing I took away
from them is that it's possible to do capturing parens, though not
backrefs, without back-tracking.  Spencer's code treats both of those
features as "messy" (ie, slow, because they force use of the NFA-style
backtracking search code).  So there might be reason to reimplement
the parens-but-no-backrefs case using some ideas from these papers.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Future of our regular expression code

Reply via email to