Re: Degenerate Regex Case

2015-04-26 Thread Guillaume via Digitalmars-d-learn
On Saturday, 25 April 2015 at 09:30:55 UTC, Dmitry Olshansky 
wrote:


A quick investigation shows that it gets stuck at the end of 
pattern compilation stage.


The problem is that as a last pass D's regex goes to optimize 
the pattern to construct simple bit-scanning engine as 
approximation for prefix of original pattern. And that process 
is a lot like Thompson NFA ... _BUT_ the trick of merging 
equivalent threads wasn't applied there.


So in short: file a bug, optimizer absolutely should do 
de-duplication of threads.


---
Dmitry Olshansky


Thanks for your help, I'll go file a bug.


Re: Degenerate Regex Case

2015-04-25 Thread Dmitry Olshansky via Digitalmars-d-learn

On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
Hello, I'm trying to make a regex comparison with D, based off 
of this article: https://swtch.com/~rsc/regexp/regexp1.html


I've written my code like so:

import std.stdio, std.regex;

void main(string argv[]) {

string m = argv[1];
	auto p = 
ctRegex!(a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaa);

if (match(m, p)) {
writeln(match);
} else {
writeln(no match);
}

}




And the compiler goes into swap. Doing it at runtime is no 
better. I was under the impression that this particular regex 
was used for showcasing the Thompson NFA which D claims to be 
using.




A quick investigation shows that it gets stuck at the end of 
pattern compilation stage.


The problem is that as a last pass D's regex goes to optimize the 
pattern to construct simple bit-scanning engine as approximation 
for prefix of original pattern. And that process is a lot like 
Thompson NFA ... _BUT_ the trick of merging equivalent threads 
wasn't applied there.


So in short: file a bug, optimizer absolutely should do 
de-duplication of threads.



The golang code version of this runs fine, which makes me think 
that maybe D isn't using the correct regex engine for this 
particular regex. Or perhaps I'm using this wrong?


It uses 2 kinds of engines, run-time one is Thompson NFA. 
Compile-time is (for now) still backtracking.


---
Dmitry Olshansky


Re: Degenerate Regex Case

2015-04-24 Thread TheFlyingFiddle via Digitalmars-d-learn

On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
Hello, I'm trying to make a regex comparison with D, based off 
of this article: https://swtch.com/~rsc/regexp/regexp1.html


I've written my code like so:

import std.stdio, std.regex;

void main(string argv[]) {

string m = argv[1];
	auto p = 
ctRegex!(a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaa);

if (match(m, p)) {
writeln(match);
} else {
writeln(no match);
}

}

And the compiler goes into swap. Doing it at runtime is no 
better. I was under the impression that this particular regex 
was used for showcasing the Thompson NFA which D claims to be 
using.


The regex 
a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaa 
can be simplified to a{30,60} (if i counted correctly).


The regex a{30,60} works fine.

[Speculation]
I don't have a good understanding of how D's regex engine work 
but I am guessing that it does not do any simplification of the 
regex input causing it to generate larger engines for each 
additional ? symbol. Thus needing more memory. Eventually as in 
this case the compiler runs out of memory.