Lars T. Kyllingstad wrote:
On Thu, 17 Jun 2010 21:44:03 -0700, Andrei Alexandrescu wrote:

There are currently two regexen in the standard library. The older one,
std.regexp, is time-tested but only works with UTF8 and has a clunkier
API. The newer one, std.regex, isolates the engine from the matches (and
can therefore reuse and cache engines more easily) and supports all
character widths. But it's less tested and doesn't have that great an
interface, because it pretty much inherits the existing one.

I wish to improve regex handling in Phobos. The most important
improvement is not in the interface - it's in the engine. The current
engine is adequate but nothing to write home about, and for simple
regexen is markedly slower than equivalent hand-written code (e.g.
matching whitespace). One great opportunity would be for D to leverage
its uncanny compile-time evaluation abilities and offer a regex that
parses the pattern during compilation:

foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }

Such a static regex could be simpler than a full-blown regex with
captures and backreferences etc., but it would have guaranteed
performance (e.g. it would be an automaton instead of a backtracking
engine) and would be darn fast because it would generate custom code for
each regex pattern.

See related work:

http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

If we get as far as implementing what RE2 can do with compile-time
evaluation, people will definitely notice.

If there's anyone who'd want to tackle such a project (for Phobos or
not), I highly encourage you to do so.


There is the 'scregexp' project on dsource:

    http://www.dsource.org/projects/scregexp/

It's D1/Tango, but maybe it could be adapted to D2/Phobos? It would at least serve as a starting point for anyone wanting to try their hand at doing this.

scregexp includes the following requirement within the license:

"Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution."

That would need to be changed before inclusion in Phobos. It looks like
there are three people in the copyright notice: Walter Bright, Marton
Papp, and yidabu. Does anyone know Marton's email address?


Andrei
