On Mon, 28 Jan 2013 19:40:17 +0100, Joerg Sonnenberger <jo...@britannica.bec.de> wrote:

On Mon, Jan 28, 2013 at 07:26:32PM +0100, j. v. d. hoff wrote:
On Mon, 28 Jan 2013 18:22:42 +0100, Joerg Sonnenberger
<jo...@britannica.bec.de> wrote:

>On Mon, Jan 28, 2013 at 06:09:57PM +0100, j. van den hoff wrote:
>>On Mon, 28 Jan 2013 17:34:44 +0100, Joerg Sonnenberger
>><jo...@britannica.bec.de> wrote:
>>
>>>On Mon, Jan 28, 2013 at 08:50:56AM -0500, Richard Hipp wrote:
>>>>The regular expression matching in
>>>>www.fossil-scm.org/fossil/artifact/c8fb75a1615f is also
>>>>lightweight and it
>>>>supports | and it is usually as fast or faster than grep in my tests
>>>>(though there are some cases for which grep is faster).  The
>>regexp.c in
>>>>fossil uses a NFA which gives worst case performance of O(NM)
>>where N is
>>>>the size of the input text to be matched and M is the size of
>>>>the regular
>>>>expression.  Perl regular expressions and lua-regexp.c take
>>exponential
>>>>time for some (admittedly obscure) regular expressions.  On the
>>>>other hand,
>>>>Perl regular expressions are more complete, and both Lua and
>>>>Perl allow you
>>>>to do substitutions, which the regexp.c file in Fossil does not.
>>>
>>>The Lua implementation starts to perform very badly as soon as you have
>>>wild cards at the beginning of the pattern. Those aren't even
>>obscure...
>>
>>just a guess: you did not "anchor" the pattern, i.e. you used
>>something like ".*foo" instead of "^.*foo"?
>
>Anchoring doesn't help if you want to explicitly match wild card chars
>at or near the beginning. With a NFA or DFA based implementation, it

I'm too slow: could you give an example where anchoring would come
in the way of what you really want to match?

You don't understand me. Anchoring helps, if you can use it to avoid

right. as I said: I'm too slow.

initial wild cards or limit the length of backtracking. It doesn't help
to avoid the exponential edge cases with .*foo patterns though.

I understand/know that ".*foo" will take (too) much time. my point/question is/was: "^.*foo" seemingly matches everything which is matched by ".*foo". (and if matching is greedy (is that the term?), than ".*foo" would also return exactly the same matching string as "^.*foo", making the two actually equivalent patterns, if I understand correctly).
but in any case if the question is simply
"match or no match?" (we are not talking about capturing subexpressions and doing substitutions but "grepping" functionality for fossil, right?), the anchored pattern could always serve as a dropin replacement of ".*foo", AFAICS (even if the reverse where not true).

this would not prevent,
that people run into the exponential run time problem when using the "naive" pattern instead the anchored one, but this could be explained by a FAQ entry making
the problem practically irrelevant. or do I still miss the relevant point?

j.


Joerg
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


--
Using Opera's revolutionary email client: http://www.opera.com/mail/
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

Reply via email to