Andrew Dunstan wrote:


Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
... It turns out (according to the analysis) that the only time we actually need to use NextChar is when we are matching an "_" in a like/ilike pattern.

I thought we'd determined that advancing bytewise for "%" was also risky,
in two cases:

1. Multibyte character set that is not UTF8 (more specifically, does not
have a guarantee that first bytes and not-first bytes are distinct)

I thought we had disposed of the idea that there was a problem with charsets that don't treat first bytes specially.

And Dennis said:

Tom Lane wrote:
You could imagine trying to do
% a byte at a time (and indeed that's what I'd been thinking it did)
but that gets you out of sync which breaks the _ case.

It is only a problem when you have a pattern like '%_', and we could detect that and advance byte by byte when it's not the case. Right now we check (*p == '\\') || (*p == '_') on every iteration while scanning over characters for '%'; we could do that check once and have separate loops for the two cases.

That's pretty much what the patch does now: it never tries to match a single byte when it sees "_", whether or not it is preceded by "%".

cheers

andrew
