Re: FReD is ready for review

Dmitry Olshansky Tue, 13 Sep 2011 15:25:29 -0700

On 14.09.2011 1:55, Vladimir Panteleev wrote:

On Wed, 14 Sep 2011 00:49:47 +0300, Dmitry Olshansky
<dmitry.o...@gmail.com> wrote:

However when it comes to some serious business like matching charsets
and doing ambiguous control flow it starts losing to more superior
design of the default engine.


Ah, okay. I thought compile-time regex was the same as the default
engine, but without the runtime parsing and bytecode interpretation(?).

Let me explain a bit. The default as chosen is fairly advanced NFAengine which features:- never backtracks and consequently decodes each UTF8/16/32 characteronly once;- as side effect doesn't support fully combinatorial backreferences(why would someone use 'em anyway);

        - guaranteed O(N*M) time on input with M being "size" of pattern;

This also opens up matching directly on buffered streams... but back tothe point.

There is also a slower traditional backtracking engine, it still fasterthen many other engines out there but it ebbs out on some patterns like.*.*.*Z in text with no 'Z', precisely because of combinatorial natureof backtracking.

Then C-T is built on top of backtracking, just because it was mucheasier this way :)It's indeed replacing bytecode interpretations with hardcoded code whichis exceptionally fast, but the prime time consumer in real patterns isusually this memoize/return. C-T regex is still a bit experimentalthough and going to be improved over time.

Last but not least there is also completely dumb (as in dead simple andfast) kickstart engine that does all of heavy lifting iff pattern startswith something more or less constant (e.g. it does 50% of job in regex-dna).

And you can just parse at compile time and use default run-time enginefor matching:

enum r = regex("whatever");


--
Dmitry Olshansky

Re: FReD is ready for review

Reply via email to