On Thu, 17 Jun 2010 21:44:03 -0700, Andrei Alexandrescu wrote:

> There are currently two regexen in the standard library. The older one,
> std.regexp, is time-tested but only works with UTF8 and has a clunkier
> API. The newer one, std.regex, is newer and isolates the engine from the
> matches (and therefore can reuse and cache engines easier), and supports
> all character widths. But it's less tested and doesn't have that great
> of an interface because it pretty much inherits the existing one.
>
> I wish to improve regex handling in Phobos. The most important
> improvement is not in the interface - it's in the engine. The current
> engine is adequate but nothing to write home about, and for simple
> regexen is markedly slower than equivalent hand-written code (e.g.
> matching whitespace). One great opportunity would be for D to leverage
> its uncanny compile-time evaluation abilities and offer a regex that
> parses the pattern during compilation:
>
>     foreach (s; splitter(line, sregex!",[ \t\r]*")) { ... }
>
> Such a static regex could be simpler than a full-blown regex with
> captures and backreferences etc., but it would have guaranteed
> performance (e.g. it would be an automaton instead of a backtracking
> engine) and would be darn fast because it would generate custom code for
> each regex pattern.
>
> See related work:
>
> http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html
>
> If we get as far as implementing what RE2 can do with compile-time
> evaluation, people will definitely notice.
>
> If there's anyone who'd want to tackle such a project (for Phobos or
> not), I highly encourage you to do so.
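To make the quoted idea a bit more concrete, here is a rough D2 sketch of a pattern that is parsed during compilation. It is only an illustration under heavy assumptions: it handles nothing but patterns of the form "[chars]*", and the names parseCharClassStar and StaticClassStar are made up for this example. They are not part of Phobos, and this is not the proposed sregex or anything from scregexp.

```d
import std.stdio : writeln;

/// Hypothetical helper (not in Phobos): turns a pattern of the form
/// "[chars]*" into a 256-entry lookup table. Only literal characters
/// between the brackets are supported; no ranges, no negation.
bool[256] parseCharClassStar(string pattern)
{
    assert(pattern.length >= 3
        && pattern[0] == '['
        && pattern[$ - 2] == ']'
        && pattern[$ - 1] == '*',
        "expected a pattern of the form [chars]*");

    bool[256] table; // every entry starts out false
    foreach (i; 1 .. pattern.length - 2)
        table[pattern[i]] = true;
    return table;
}

/// Hypothetical static matcher: the pattern is a template argument,
/// so each distinct pattern gets its own table and its own code.
struct StaticClassStar(string pattern)
{
    // A static initializer has to be a compile-time constant, so the
    // parser above runs through CTFE. A malformed pattern trips the
    // assert and becomes a compile error.
    static immutable bool[256] table = parseCharClassStar(pattern);

    /// Length of the longest prefix of s made up of class characters.
    /// One table lookup per character, no backtracking.
    static size_t matchPrefix(const(char)[] s)
    {
        size_t n = 0;
        while (n < s.length && table[s[n]])
            ++n;
        return n;
    }
}

void main()
{
    alias Ws = StaticClassStar!"[ \t\r]*";

    // The table already exists at compile time.
    static assert(Ws.table[' '] && Ws.table['\t'] && !Ws.table['a']);

    // Skips the leading space, tab and space.
    writeln(Ws.matchPrefix(" \t hello"));
}
```

A real implementation would of course have to compile the whole pattern into an automaton rather than a single character class, but the mechanism (pattern as template argument, parsing via CTFE, per-pattern generated code) would be along these lines.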
There is the 'scregexp' project on dsource:

http://www.dsource.org/projects/scregexp/

It's D1/Tango, but maybe it could be adapted to D2/Phobos? It would at
least serve as a starting point for anyone wanting to try their hand at
doing this.

-Lars