On Mon, 3 Aug 2009, Ronen Hod wrote:

> I encountered serious performance issues so I had to do a quick fix
> for my needs. It is not thread-safe, it has assumptions regarding the
> number of states (which are sufficient for me), and I am not 100% sure
> that I understand all the implications (maybe it can be optimized
> further). My application as a whole runs 4x faster now, and I assume
> that pcre_dfa_exec() is ~10x faster on my data. Attached is the code
> with the changes. Enable/Disable them using: "#define
> CRESCENDO_CHECK_FOR_DUPLICATES".

I have now studied your patch, and I understand how you are getting a
speed-up. Unfortunately, I cannot install the patch in PCRE because of
the problems you mention: it is not thread-safe and it makes assumptions
about the number of states.

I will try to think about other ways of speeding up the duplicate
checking that are thread-safe and do not make any assumptions, though I
have a feeling that this will not be easy. In the meantime, I hope you
have read Jeffrey Friedl's book "Mastering Regular Expressions" and made
sure that the patterns you are using are as optimised as possible.

Thanks for posting your code and bringing this area of performance to my
attention.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
