On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut <[email protected]> wrote: > > On 15.11.24 05:26, jian he wrote: > > /* > > * Now build a substring of the text and try to match it against > > * the subpattern. t is the start of the text, t1 is one past the > > * last byte. We start with a zero-length string. > > */ > > t1 = t > > t1len = tlen; > > for (;;) > > { > > int cmp; > > CHECK_FOR_INTERRUPTS(); > > cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale); > > > > select '.foo.' LIKE '_oo' COLLATE ign_punct; > > pg_strncoll's iteration of the first 4 argument values. > > oo 2 foo. 0 > > oo 2 foo. 1 > > oo 2 foo. 2 > > oo 2 foo. 3 > > oo 2 foo. 4 > > > > seems there is a shortcut/optimization. > > if subpat don't have wildcard(percent sign, underscore) > > then we can have less pg_strncoll calls? > > How would you do that? You need to try all combinations to find one > that matches. >
we can optimize when trailing (last character) is not wildcards.
SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
within the for loop
for(;;)
{
int cmp;
CHECK_FOR_INTERRUPTS();
....
}
pg_strncoll comparison will become
Ha12foo foo
a12foo foo
12foo foo
2foo foo
foo foo
it's safe because in MatchText we have:
else if (*p == '%')
{
while (tlen > 0)
{
if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
{
int matched = MatchText(t, tlen, p, plen, locale);
if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */
}
NextChar(t, tlen);
}
}
please check attached.
> > minimum case to trigger error within GenericMatchText
> > since no related tests.
> > create table t1(a text collate case_insensitive, b text collate "C");
> > insert into t1 values ('a','a');
> > select a like b from t1;
>
> This results in
>
> ERROR: 42P22: could not determine which collation to use for LIKE
> HINT: Use the COLLATE clause to set the collation explicitly.
>
> which is the expected behavior.
>
sorry, didn't mention it clearly, i mean we can add it to the regress test.
v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot
Description: Binary data
