Re: [PATCHES] UTF8MatchText

Andrew Dunstan Mon, 21 May 2007 06:36:20 -0700


[EMAIL PROTECTED] wrote:

Doh, you're right ... but on third thought, what happens with a pattern
containing "%_"?  If % tries to advance bytewise then we'll be trying to
apply NextChar in the middle of a data character, and bad things ensue.


Right, when you have '_' after a '%' you need to make sure the '%'
advances full characters. In my suggestion the test for whether '_' (or
'\') comes after the '%' is done once, and it selects which of the two
loops to use: the one that does byte stepping or the one with NextChar.

It's difficult to know for sure that we have thought about all the corner
cases. I hope the gain is worth the effort.. :-)


Yes, I came to the same conclusion about how to restructure the code.
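
To make the restructuring concrete, here is a minimal sketch of the
two-loop idea for UTF-8 text. It is not the like.c code: like_match,
utf8_len and like are hypothetical names, the matcher returns a plain
0/1 rather than LIKE_TRUE/LIKE_FALSE/LIKE_ABORT, and it assumes both
text and pattern are valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 character at s, clamped to the bytes left.
 * Hypothetical stand-in for NextChar; assumes valid UTF-8 input. */
static size_t utf8_len(const unsigned char *s, size_t avail)
{
    size_t n = 4;

    if (*s < 0x80) n = 1;
    else if ((*s & 0xE0) == 0xC0) n = 2;
    else if ((*s & 0xF0) == 0xE0) n = 3;
    return n < avail ? n : avail;
}

/* Simplified LIKE: '%' = any run, '_' = one character, '\' = escape. */
static int like_match(const unsigned char *t, size_t tlen,
                      const unsigned char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0)
                return 1;               /* trailing % matches the rest */

            /* Tested once, outside the loops: it depends only on p. */
            if (*p == '_' || *p == '\\')
            {
                /* Character stepping: never recurse mid-character. */
                while (tlen > 0)
                {
                    size_t n = utf8_len(t, tlen);

                    if (like_match(t, tlen, p, plen))
                        return 1;
                    t += n; tlen -= n;
                }
            }
            else
            {
                /* Byte stepping: *p is the first byte of a character,
                 * so it can never equal a UTF-8 continuation byte. */
                while (tlen > 0)
                {
                    if (*t == *p && like_match(t, tlen, p, plen))
                        return 1;
                    t++; tlen--;
                }
            }
            return 0;
        }
        if (*p == '_')
        {
            size_t n = utf8_len(t, tlen);

            t += n; tlen -= n; p++; plen--;
            continue;
        }
        if (*p == '\\' && plen > 1) { p++; plen--; }  /* escaped literal */
        if (*p != *t)
            return 0;
        t++; tlen--; p++; plen--;
    }
    while (plen > 0 && *p == '%') { p++; plen--; }
    return tlen == 0 && plen == 0;
}

/* Convenience wrapper for NUL-terminated strings (hypothetical). */
static int like(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    return like_match((const unsigned char *) text, strlen(text),
                      (const unsigned char *) pat, strlen(pat));
}
```

The byte-stepping loop is safe only because the pattern byte after '%'
is a literal; as soon as it is '_' or '\', the sketch falls back to
whole-character steps, which is the "%_" case raised above.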

The current code contains this:

           while (tlen > 0)
           {
               /*
                * Optimization to prevent most recursion: don't recurse
                * unless first pattern char might match this text char.
                */
               if (CHAREQ(t, p) || (*p == '\\') || (*p == '_'))
               {
                   int         matched = MatchText(t, tlen, p, plen);

                   if (matched != LIKE_FALSE)
                       return matched; /* TRUE or ABORT */
               }

               NextChar(t, tlen);
           }

The code appears to date from v 1.23 of like.c, way back in 2001. I'm not
sure I agree with the comment, though. In the first place, the invariant
tests should not be in the loop, I think, and I'll hoist them out as
Dennis suggests. But why are we doing that CHAREQ? If it succeeds we'll
just do it again when we recurse, I think.
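
One way to probe the CHAREQ question is to count calls. The sketch
below is a simplified single-byte matcher, not the like.c code; match,
count_calls, calls and use_precheck are all hypothetical names. The
'_' / '\' test is hoisted out of the loop, and a flag switches the
first-byte comparison on or off, so the number of recursive calls can
be compared for the same input.

```c
#include <assert.h>

static int calls;         /* entries into match(), for illustration only */
static int use_precheck;  /* 1: compare the first byte before recursing */

/* Simplified single-byte LIKE: '%' any run, '_' one byte, '\' escape. */
static int match(const char *t, const char *p)
{
    calls++;
    for (; *p != '\0'; p++)
    {
        if (*p == '%')
        {
            while (*p == '%')
                p++;
            if (*p == '\0')
                return 1;           /* trailing % matches the rest */

            /* Hoisted: this test depends only on the pattern. */
            int wild = (*p == '_' || *p == '\\');

            for (; *t != '\0'; t++)
                if (wild || !use_precheck || *t == *p)
                    if (match(t, p))
                        return 1;
            return 0;
        }
        if (*t == '\0')
            return 0;               /* pattern left over, text used up */
        if (*p == '\\' && p[1] != '\0')
            p++;                    /* escaped: compare next byte literally */
        else if (*p == '_')
        {
            t++;
            continue;
        }
        if (*p != *t)
            return 0;
        t++;
    }
    return *t == '\0';
}

/* Run one match and report how many times match() was entered. */
static int count_calls(const char *t, const char *p, int precheck)
{
    assert(t != 0 && p != 0);
    calls = 0;
    use_precheck = precheck;
    (void) match(t, p);
    return calls;
}
```

With the pre-check on, a successful comparison is indeed repeated
inside the recursive call; what the check buys is skipping the call
entirely at every position where the first byte cannot match. Whether
that trade is worth it in the real code would need measuring.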


cheers

andrew

