Re: [PATCH] fix GIN index search sometimes losing results

Pavel Borisov Sun, 17 May 2020 12:54:24 -0700

Hi, all!
Below is my variant how to patch Gin-Gist weights issue:
1. First of all I propose to shift from previously Gin's own TS_execute
variant and leave only two: TS_execute with bool result and bool type
callback and ternary TS_execute_recurse with ternary callback. I suppose
all legacy consistent callers can still use bool via provided wrapper.
2. I integrated logic for indexes which do not support weights and
positions inside (which gives MAYBE in certain cases on negation) inside
previous TS_execute_recurse function called with additional flag for this
class of indexes.
3. Check function for GIST and GIN now gives ternary result and is called
with ternary type callback. I think in future nothing prevents smoothly
shifting callback functions, check functions and even TS_execute result to
ternary.


So I also send my variant patch for review and discussion.

Regards,
Pavel Borisov

вс, 17 мая 2020 г. в 03:14, Tom Lane <t...@sss.pgh.pa.us>:

> I wrote:
> > I think the root of the problem is that if we have a query using
> > weights, and we are testing tsvector data that lacks positions/weights,
> > we can never say there's definitely a match.  I don't see any decently
> > clean way to fix this without redefining the TSExecuteCallback API
> > to return a tri-state YES/NO/MAYBE result, because really we need to
> > decide that it's MAYBE at the level of processing the QI_VAL node,
> > not later on.  I'd tried to avoid that in e81e5741a, but maybe we
> > should just bite that bullet, and not worry about whether there's
> > any third-party code providing its own TSExecuteCallback routine.
> > codesearch.debian.net suggests that there are no external callers
> > of TS_execute, so maybe we can get away with that.
>
> 0001 attached is a proposed patch that does it that way.  Given the
> API break involved, it's not quite clear what to do with this.
> ISTM we have three options:
>
> 1. Ignore the API issue and back-patch.  Given the apparent lack of
> external callers of TS_execute, maybe we can get away with that;
> but I wonder if we'd get pushback from distros that have automatic
> ABI-break detectors in place.
>
> 2. Assume we can't backpatch, but it's still OK to slip this into
> v13.  (This option clearly has a limited shelf life, but I think
> we could get away with it until late beta.)
>
> 3. Assume we'd better hold this till v14.
>
> I find #3 unduly conservative, seeing that this is clearly a bug
> fix, but on the other hand #1 is a bit scary.  Aside from the API
> issue, it's not impossible that this has introduced some corner
> case behavioral changes that we'd consider to be new bugs rather
> than bug fixes.
>
> Anyway, some notes for reviewers:
>
> * The core idea of the patch is to make the TS_execute callbacks
> have ternary results and to insist they return TS_MAYBE in any
> case where the correct result is uncertain.
>
> * That fixes the bug at hand, and it also allows getting rid of
> some kluges at higher levels.  The GIN code no longer needs its
> own TS_execute_ternary implementation, and the GIST code no longer
> needs to suppose that it can't trust NOT results.
>
> * I put some effort into not leaking memory within tsvector_op.c's
> checkclass_str and checkcondition_str.  (The final output array
> can still get leaked, I believe.  Fixing that seems like material
> for a different patch, and it might not be worth any trouble.)
>
> * The new test cases in tstypes.sql are to verify that we didn't
> change behavior of the basic tsvector @@ tsquery code.  There wasn't
> any coverage of these cases before, and the logic for checkclass_str
> without position info had to be tweaked to preserve this behavior.
>
> * The new cases in tsearch verify that the GIN and GIST code gives
> the same results as the basic operator.
>
> Now, as for the 0002 patch attached: after 0001, the only TS_execute()
> callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
> which I'd already complained is probably a bug, and the first of
> the two calls in tsrank.c's Cover().  It seems difficult to me to
> argue that it's not a bug for Cover() to process NOT in one call
> but not the other --- moreover, if there was any argument for that
> once upon a time, it probably falls to the ground now that (a) we
> have a less buggy implementation of NOT and (b) the presence of
> phrase queries significantly raises the importance of not taking
> short-cuts.  Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
> flag and has TS_execute compute NOT expressions accurately all the
> time.
>
> As it stands, 0002 changes no regression test results, which I'm
> afraid speaks more to our crummy test coverage than anything else;
> tests that exercise those two functions with NOT-using queries
> would easily show that there is a difference.
>
> Even if we decide to back-patch 0001, I would not suggest
> back-patching 0002, as it's more nearly a definitional change
> than a bug fix.  But I think it's a good idea anyway.
>
> I'll stick this in the queue for the July commitfest, in case
> anybody wants to review it.
>
>                         regards, tom lane
>
>

gin-gist-weight-patch-v3.diff
Description: Binary data

Re: [PATCH] fix GIN index search sometimes losing results

Reply via email to