Re: [pcre-dev] JIT regression

ph10 Tue, 28 May 2019 04:09:51 -0700

On Mon, 27 May 2019, Zoltán Herczeg wrote:

> That is a strategic difference. You cannot know the input from the
> pattern, and your input has no a-d characters. The interpreter only
> searches for 'a', while JIT searches for two characters, 'a' and 'd',
> which are a fixed distance apart. The latter is more complicated, but
> works better for random input. You can see the difference here:


The interpreter searches for 'a' using memchr(); if it finds 'a', it then
does a second search for 'd' (again using memchr()) before running the
match. If you set no_start_optimize to disable these optimizations,
there is a huge penalty.
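
A minimal C sketch of the two-stage prefilter described above, for the fixed pattern /abcd/ on 8-bit subjects. The names find_abcd() and try_match_at() are hypothetical, and try_match_at() is only a memcmp() standing in for the real matcher; this is not PCRE2's code. One detail is an assumption of the sketch: once a 'd' has been found, it is reused until the candidate start moves past it, so the search for 'd' is not repeated for every 'a'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real matcher: returns 1 if the fixed
   pattern "abcd" occurs at subject[i], 0 otherwise. */
static int try_match_at(const char *subject, size_t len, size_t i)
{
    return len - i >= 4 && memcmp(subject + i, "abcd", 4) == 0;
}

/* Two-stage prefilter for /abcd/: memchr() finds each candidate 'a';
   memchr() for the required 'd' lets us give up as soon as no 'd' can
   follow. Returns the match offset, or -1 if there is no match. */
static long find_abcd(const char *subject, size_t len)
{
    const char *end = subject + len;
    const char *p = subject;
    const char *req = NULL;              /* most recently found 'd' */

    while (p < end) {
        p = memchr(p, 'a', (size_t)(end - p));
        if (p == NULL)
            return -1;                   /* no first character left */

        /* The 'd' must be at least 3 bytes after the 'a'. Search again
           only if the previously found 'd' is now too early. */
        if (req == NULL || req < p + 3) {
            if (end - p <= 3)
                return -1;               /* too little subject left */
            req = memchr(p + 3, 'd', (size_t)(end - (p + 3)));
            if (req == NULL)
                return -1;               /* required character absent */
        }

        if (try_match_at(subject, len, (size_t)(p - subject)))
            return (long)(p - subject);
        p++;                             /* try the next 'a' */
    }
    return -1;
}
```

With the subject used in the pcre2test example that follows (2000 copies of "012345678a"), the first memchr() finds 'a' at offset 9, the search for 'd' fails, and find_abcd() returns -1 without attempting a single match. find_abcd("xxabcdyy", 8) returns 2.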

> ./pcre2test -tm
> PCRE2 version 10.34-RC1 2019-04-22
>   re> /abcd/
> data> \[012345678a]{2000}
> Match time 0.1659 milliseconds
> No match
> data>
>   re> /abcd/jit
> data> \[012345678a]{2000}
> Match time 0.0027 milliseconds
> No match
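
The two-character search Zoltán describes can be illustrated with a plain scalar loop. This is only a sketch of the idea, not the code JIT actually generates; find_pair_candidate() and its parameters are hypothetical. A position is a candidate only if 'a' is at offset 0 and 'd' is at offset 3 (the distance between them in /abcd/), so isolated 'a' characters, as in the subject above, are skipped without starting a match.

```c
#include <stddef.h>

/* Return the first offset i where s[i] == c1 and s[i + dist] == c2,
   or -1 if there is none. This only filters candidates; the full
   pattern still has to be matched at the returned offset. */
static long find_pair_candidate(const unsigned char *s, size_t len,
                                unsigned char c1, unsigned char c2,
                                size_t dist)
{
    for (size_t i = 0; i + dist < len; i++) {
        if (s[i] == c1 && s[i + dist] == c2)
            return (long)i;
    }
    return -1;
}
```

Note that a subject such as "axxd" yields a candidate at offset 0 even though it is not "abcd"; the pair test is a filter, and the matcher still runs at each candidate it reports.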

Thanks for posting that example. I've just noticed that the interpreter
could perhaps be improved: the search for 'd' happens only if the
subject is fairly short, because searching very long strings takes
time, or at least it does when memchr() is not used. memchr() cannot be
used for 16-bit and 32-bit strings, and originally it was not used for
8-bit strings either. I will do some experiments to see whether that
restriction can be lifted for 8-bit strings (and whether doing so
improves performance).
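
To make the restriction concrete, here is a hedged C sketch of a length-gated required-character check. REQ_SEARCH_LIMIT and its value are hypothetical (the actual limit is not given in this thread), as are all the function names. The 16-bit variant shows why memchr() is no help there: it compares single bytes, so it cannot look for a 16-bit code unit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL limit, for illustration only; not PCRE2's actual value. */
#define REQ_SEARCH_LIMIT 1000

/* 8-bit: memchr() scans bytes, so even a long subject is cheap to check. */
static int has_req_8(const uint8_t *s, size_t len, uint8_t c)
{
    return memchr(s, c, len) != NULL;
}

/* 16-bit: memchr() cannot search for a 16-bit code unit, so use a loop. */
static int has_req_16(const uint16_t *s, size_t len, uint16_t c)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            return 1;
    return 0;
}

/* Each check returns 0 only when it proves there can be no match.
   Long subjects skip the check (return 1) unless lift_8bit_limit is
   set, which models lifting the restriction for 8-bit strings. */
static int req_check_8(const uint8_t *s, size_t len, uint8_t c,
                       int lift_8bit_limit)
{
    if (!lift_8bit_limit && len >= REQ_SEARCH_LIMIT)
        return 1;
    return has_req_8(s, len, c);
}

static int req_check_16(const uint16_t *s, size_t len, uint16_t c)
{
    if (len >= REQ_SEARCH_LIMIT)
        return 1;
    return has_req_16(s, len, c);
}
```

With the limit in place, a long 8-bit subject that contains no 'd' still goes on to the matcher; with it lifted, a single memchr() rules the subject out.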

Philip

-- 
Philip Hazel
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
