Hi, all! Below is my variant how to patch Gin-Gist weights issue: 1. First of all I propose to shift from previously Gin's own TS_execute variant and leave only two: TS_execute with bool result and bool type callback and ternary TS_execute_recurse with ternary callback. I suppose all legacy consistent callers can still use bool via provided wrapper. 2. I integrated logic for indexes which do not support weights and positions inside (which gives MAYBE in certain cases on negation) inside previous TS_execute_recurse function called with additional flag for this class of indexes. 3. Check function for GIST and GIN now gives ternary result and is called with ternary type callback. I think in future nothing prevents smoothly shifting callback functions, check functions and even TS_execute result to ternary.
So I also send my variant patch for review and discussion. Regards, Pavel Borisov вс, 17 мая 2020 г. в 03:14, Tom Lane <t...@sss.pgh.pa.us>: > I wrote: > > I think the root of the problem is that if we have a query using > > weights, and we are testing tsvector data that lacks positions/weights, > > we can never say there's definitely a match. I don't see any decently > > clean way to fix this without redefining the TSExecuteCallback API > > to return a tri-state YES/NO/MAYBE result, because really we need to > > decide that it's MAYBE at the level of processing the QI_VAL node, > > not later on. I'd tried to avoid that in e81e5741a, but maybe we > > should just bite that bullet, and not worry about whether there's > > any third-party code providing its own TSExecuteCallback routine. > > codesearch.debian.net suggests that there are no external callers > > of TS_execute, so maybe we can get away with that. > > 0001 attached is a proposed patch that does it that way. Given the > API break involved, it's not quite clear what to do with this. > ISTM we have three options: > > 1. Ignore the API issue and back-patch. Given the apparent lack of > external callers of TS_execute, maybe we can get away with that; > but I wonder if we'd get pushback from distros that have automatic > ABI-break detectors in place. > > 2. Assume we can't backpatch, but it's still OK to slip this into > v13. (This option clearly has a limited shelf life, but I think > we could get away with it until late beta.) > > 3. Assume we'd better hold this till v14. > > I find #3 unduly conservative, seeing that this is clearly a bug > fix, but on the other hand #1 is a bit scary. Aside from the API > issue, it's not impossible that this has introduced some corner > case behavioral changes that we'd consider to be new bugs rather > than bug fixes. > > Anyway, some notes for reviewers: > > * The core idea of the patch is to make the TS_execute callbacks > have ternary results and to insist they return TS_MAYBE in any > case where the correct result is uncertain. > > * That fixes the bug at hand, and it also allows getting rid of > some kluges at higher levels. The GIN code no longer needs its > own TS_execute_ternary implementation, and the GIST code no longer > needs to suppose that it can't trust NOT results. > > * I put some effort into not leaking memory within tsvector_op.c's > checkclass_str and checkcondition_str. (The final output array > can still get leaked, I believe. Fixing that seems like material > for a different patch, and it might not be worth any trouble.) > > * The new test cases in tstypes.sql are to verify that we didn't > change behavior of the basic tsvector @@ tsquery code. There wasn't > any coverage of these cases before, and the logic for checkclass_str > without position info had to be tweaked to preserve this behavior. > > * The new cases in tsearch verify that the GIN and GIST code gives > the same results as the basic operator. > > Now, as for the 0002 patch attached: after 0001, the only TS_execute() > callers that are not specifying TS_EXEC_CALC_NOT are hlCover(), > which I'd already complained is probably a bug, and the first of > the two calls in tsrank.c's Cover(). It seems difficult to me to > argue that it's not a bug for Cover() to process NOT in one call > but not the other --- moreover, if there was any argument for that > once upon a time, it probably falls to the ground now that (a) we > have a less buggy implementation of NOT and (b) the presence of > phrase queries significantly raises the importance of not taking > short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT > flag and has TS_execute compute NOT expressions accurately all the > time. > > As it stands, 0002 changes no regression test results, which I'm > afraid speaks more to our crummy test coverage than anything else; > tests that exercise those two functions with NOT-using queries > would easily show that there is a difference. > > Even if we decide to back-patch 0001, I would not suggest > back-patching 0002, as it's more nearly a definitional change > than a bug fix. But I think it's a good idea anyway. > > I'll stick this in the queue for the July commitfest, in case > anybody wants to review it. > > regards, tom lane > >
gin-gist-weight-patch-v3.diff
Description: Binary data