On Mon, Jan 4, 2021, at 04:49, Tom Lane wrote:
> Over the holiday break I've been fooling with some regex performance
> improvements.
Cool! I've also been fooling with regex performance myself over the years, not in the PostgreSQL code but in general. More specifically, I first DFA-minimize the regex and then generate LLVM IR directly from the graph. Perhaps some of the ideas would be interesting to look at.

Here is a live demo:

https://compiler.org/reason-re-nfa/src/index.html

One idea I came up with myself is the "merge_linear" step: where possible, multiple characters are read in a single operation. I'm not sure whether other regex JIT engines do this, but it makes quite a difference for large regexes containing long strings.

Note: there is no support for capture groups, back-references, etc., but | + * () [] [^] work.

/Joel
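The email describes the merge_linear step only in prose. The following is a rough illustration of one way it could work. It is not Joel's implementation, which emits LLVM IR. This minimal Python sketch makes these assumptions:

- The regex has already been compiled to a minimal DFA.
- The DFA is stored as a hypothetical `edges` dict that maps each state to `{label: next_state}`.
- `merge_linear` and `full_match` are illustrative names only.

The sketch folds a state into its predecessor's edge when that state is neither the start state nor accepting and has exactly one incoming and one outgoing edge. A literal run such as `hel` then becomes one edge label, which the matcher checks with a single comparison.

```python
from collections import defaultdict

# Hypothetical representation: edges[state] maps an edge label to the next
# state. In the input DFA every label is a single character; after
# merge_linear a label may be a longer string.
Edges = dict[int, dict[str, int]]


def merge_linear(edges: Edges, start: int, accepting: set[int]) -> Edges:
    """Collapse chains s --a--> t --b--> u into s --ab--> u.

    t is removed only if it is not the start state, not accepting, and has a
    single incoming and a single outgoing edge, so the accepted language is
    unchanged. Returns a new edge map; the input is left untouched.
    """
    edges = {s: dict(out) for s, out in edges.items()}
    changed = True
    while changed:
        changed = False
        indegree = defaultdict(int)
        for out in edges.values():
            for t in out.values():
                indegree[t] += 1
        for s, out in list(edges.items()):
            for label, t in list(out.items()):
                t_out = edges.get(t, {})
                if (t not in (start, s) and t not in accepting
                        and indegree[t] == 1 and len(t_out) == 1):
                    label2, u = next(iter(t_out.items()))
                    del out[label]
                    # First character is unchanged, so labels leaving s
                    # still start with distinct characters.
                    out[label + label2] = u
                    del edges[t]
                    changed = True
                    break
            if changed:
                break
    return edges


def full_match(edges: Edges, start: int, accepting: set[int], text: str) -> bool:
    """Return True if the automaton accepts the whole of text."""
    # Labels leaving a state begin with distinct characters, so dispatch on
    # text[i] and then verify the rest of the label.
    by_first = {s: {label[0]: (label, t) for label, t in out.items()}
                for s, out in edges.items()}
    state, i = start, 0
    while i < len(text):
        hit = by_first.get(state, {}).get(text[i])
        if hit is None:
            return False
        label, nxt = hit
        # One multi-character comparison instead of len(label) single steps.
        if not text.startswith(label, i):
            return False
        state, i = nxt, i + len(label)
    return state in accepting


# Minimal DFA for hel(lo|p): states 0..5, state 5 accepting.
dfa = {0: {"h": 1}, 1: {"e": 2}, 2: {"l": 3}, 3: {"l": 4, "p": 5}, 4: {"o": 5}}
merged = merge_linear(dfa, 0, {5})
print(merged)  # {0: {'hel': 3}, 3: {'p': 5, 'lo': 5}}
```

In the example, `hel(lo|p)` shrinks from six states to three (0, 3 and the accepting state 5). In Python, `str.startswith` only stands in for a wide comparison. The benefit would appear in generated native code, where a short merged label could plausibly be checked with a single word-sized load and compare, or with `memcmp`.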