On Mon, 27 May 2019, Zoltán Herczeg wrote:

> that is strategical difference. You don't know the input from the
> pattern, and your input has no a-d characters. The interpreter only
> searches 'a', while jit searches two characters: 'a' and 'd' which
> distance is two. The latter is more complicated, but works better for
> random input. You can see the difference here:

The interpreter searches for 'a' using memchr(); if it finds 'a', it then
does a second search for 'd' (again using memchr()) before running the
match. If you set no_start_optimize to disable these optimizations, there
is a huge penalty.

> ./pcre2test -tm
> PCRE2 version 10.34-RC1 2019-04-22
> re> /abcd/
> data> \[012345678a]{2000}
> Match time 0.1659 milliseconds
> No match
> data>
> re> /abcd/jit
> data> \[012345678a]{2000}
> Match time 0.0027 milliseconds
> No match

Thanks for posting that example. I've just noticed that an improvement
may be possible in the interpreter: the search for 'd' happens only if
the subject is quite short, because searching very long strings takes
time, or at least it does when memchr() is not used. memchr() cannot be
used for 16-bit and 32-bit strings, and originally it was not used for
8-bit strings. I will do some experiments to see if that restriction can
be lifted for 8-bit strings (and whether lifting it improves performance).

Philip

-- 
Philip Hazel
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
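
For readers following the thread, the interpreter's start-of-match check described in this message can be sketched roughly as below. This is only an illustration for 8-bit subjects, not PCRE2 source code: the function name next_candidate and the constant REQ_SEARCH_LIMIT (including its value of 1000) are assumptions made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: subjects with at least this many bytes remaining
   skip the search for the required later character. The value is an
   assumption for illustration, not taken from PCRE2. */
#define REQ_SEARCH_LIMIT 1000

/* Return the next position at which a match attempt is worth running,
   or NULL if no match is possible in [start, end).
   first_cu is the first character of the pattern ('a' for /abcd/),
   req_cu is a character that must occur later ('d' for /abcd/). */
static const char *next_candidate(const char *start, const char *end,
                                  char first_cu, char req_cu)
{
    /* Step 1: find the first character with memchr(). */
    const char *p = memchr(start, (unsigned char)first_cu,
                           (size_t)(end - start));
    if (p == NULL)
        return NULL;

    /* Step 2: only for short subjects, check that the required
       character occurs somewhere after the candidate start. */
    if ((size_t)(end - p) < REQ_SEARCH_LIMIT) {
        const char *rest = p + 1;
        if (memchr(rest, (unsigned char)req_cu,
                   (size_t)(end - rest)) == NULL)
            return NULL;    /* required character absent: give up early */
    }
    return p;               /* the caller runs the real match from here */
}
```

With a long subject that contains 'a' but no 'd', as in the pcre2test run quoted above, step 2 is skipped in this sketch, so a full match attempt would run at every 'a'. That is the cost that lifting the length restriction would be meant to avoid.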