Karan Sikka <karanssi...@gmail.com> writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient.  What I'd really like to do is get rid of that cache in
favor of having a way to treat a precompiled regexp as a constant.

I think this is probably possible via inventing a "regexp" datatype,
which we make the declared RHS input type for the ~ operator, and give
it an implicit cast from text so that existing queries don't break.
The compiled regexp tree structure contains pointers so it could never
go to disk, but now that we have the "expanded datum" infrastructure
you could imagine that the on-disk representation is the same as text
but we support adding a compiled tree to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c.  A cache
like that might be the only way to deal with a query using variable
patterns (e.g., pattern argument coming from a table column).  But it
seems like basically the wrong approach for the common case of a
constant pattern.

			regards, tom lane
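
To make the trade-off above concrete, here is a rough Python sketch. It is not PostgreSQL code and does not reflect how regexp.c is written. The class name CompiledRegexCache, the helper functions, and the capacity values are all invented for illustration. It only shows why a small bounded cache does poorly once the number of active patterns exceeds its size, and how compiling once up front avoids the per-row lookup entirely.

```python
import re
from collections import OrderedDict


class CompiledRegexCache:
    """Tiny LRU cache of compiled patterns (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        # Counts misses in this cache. Python's re module keeps its own
        # internal cache too, so this is not a measure of real compile work.
        self.compiles = 0

    def get(self, pattern):
        compiled = self.entries.get(pattern)
        if compiled is not None:
            # Cache hit: mark as most recently used.
            self.entries.move_to_end(pattern)
            return compiled
        # Cache miss: compile, store, and evict the least recently used entry.
        compiled = re.compile(pattern)
        self.compiles += 1
        self.entries[pattern] = compiled
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return compiled


def count_matches_cached(rows, patterns, cache):
    """Per-row approach: every match goes through a cache lookup."""
    hits = 0
    for row in rows:
        for pattern in patterns:
            if cache.get(pattern).search(row):
                hits += 1
    return hits


def count_matches_precompiled(rows, patterns):
    """Constant-pattern approach: compile once, then no lookups per row."""
    compiled = [re.compile(p) for p in patterns]
    hits = 0
    for row in rows:
        for rx in compiled:
            if rx.search(row):
                hits += 1
    return hits


if __name__ == "__main__":
    rows = ["abc", "xyz", "abd"]
    patterns = ["^a", "z$", "b"]
    small = CompiledRegexCache(capacity=2)  # fewer slots than active patterns
    large = CompiledRegexCache(capacity=3)  # one slot per active pattern
    print(count_matches_cached(rows, patterns, small), small.compiles)
    print(count_matches_cached(rows, patterns, large), large.compiles)
    print(count_matches_precompiled(rows, patterns))
```

With three patterns cycling through a two-slot LRU cache, every lookup in the sketch is a miss, which is the "more active regexps than will fit" case. The precompiled variant corresponds loosely to treating the pattern as a constant; the variable-pattern case (pattern coming from a table column) would still need something like the cache.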