On Wed, Mar 12, 2014 at 8:02 PM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote:
> On 02/26/2014 11:25 PM, Alexander Korotkov wrote: > >> On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov <aekorot...@gmail.com >> >wrote: >> >> On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas < >>> hlinnakan...@vmware.com> wrote: >>> >>> On 02/09/2014 12:11 PM, Alexander Korotkov wrote: >>>> >>>> I've rebased catalog changes with last master. Patch is attached. I've >>>>> rerun my test suite with both last master ('committed') and attached >>>>> patch ('ternary-consistent'). >>>>> >>>>> >>>> Thanks! >>>> >>>> >>>> method | sum >>>> >>>>> ------------------------+------------------ >>>>> committed | 143491.715000001 >>>>> fast-scan-11 | 126916.111999999 >>>>> fast-scan-light | 137321.211 >>>>> fast-scan-light-heikki | 138168.028000001 >>>>> master | 446976.288 >>>>> ternary-consistent | 125923.514 >>>>> >>>>> I explain regression in last master by change of MAX_MAYBE_ENTRIES >>>>> from 8 >>>>> to 4. >>>>> >>>>> >>>> Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make >>>> sure we get similar behavior in Tomas' tests that used 6 search terms. >>>> But >>>> I always felt that it was too large for real queries, once we have the >>>> catalog changes, that's why I lowered to 4 when committing. If an >>>> opclass >>>> benefits greatly from fast scan, it should provide the ternary >>>> consistent >>>> function, and not rely on the shim implementation. >>>> >>>> >>>> I'm not sure about decision to reserve separate procedure number for >>>> >>>>> ternary consistent. Probably, it would be better to add ginConfig >>>>> method. >>>>> It would be useful for post 9.4 improvements. >>>>> >>>>> >>>> Hmm, it might be useful for an opclass to provide both, a boolean and >>>> ternary consistent function, if the boolean version is significantly >>>> more >>>> efficient when all the arguments are TRUE/FALSE. OTOH, you could also >>>> do a >>>> quick check through the array to see if there are any MAYBE arguments, >>>> within the consistent function. But I'm inclined to keep the >>>> possibility to >>>> provide both versions. As long as we support the boolean version at all, >>>> there's not much difference in terms of the amount of code to support >>>> having them both for the same opclass. >>>> >>>> A ginConfig could be useful for many other things, but I don't think >>>> it's >>>> worth adding it now. >>>> >>>> >>>> What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? >>>> We discussed that earlier, but didn't reach any conclusion. That needs >>>> to >>>> be clarified in the docs. One possibility is to document that they're >>>> equivalent. Another is to forbid one of them. Yet another is to assign a >>>> different meaning to each. >>>> >>>> I've been thinking that it might be useful to define them so that a >>>> MAYBE >>>> result from the tri-consistent function means that it cannot decide if >>>> you >>>> have a match or not, because some of the inputs were MAYBE. And >>>> TRUE+recheck means that even if all the MAYBE inputs were passed as >>>> TRUE or >>>> FALSE, the result would be the same, TRUE+recheck. The practical >>>> difference >>>> would be that if the tri-consistent function returns TRUE+recheck, >>>> ginget.c >>>> wouldn't need to bother fetching the other entries, it could just return >>>> the entry with recheck=true immediately. While with MAYBE result, it >>>> would >>>> fetch the other entries and call tri-consistent again. ginget.c doesn't >>>> currently use the tri-consistent function that way - it always fetches >>>> all >>>> the entries for a potential match before calling tri-consistent, but it >>>> could. I had it do that in some of the patch versions, but Tomas' >>>> testing >>>> showed that it was a big loss on some queries, because the consistent >>>> function was called much more often. Still, something like that might be >>>> sensible in the future, so it might be good to distinguish those cases >>>> in >>>> the API now. Note that ginarrayproc is already using the return values >>>> like >>>> that: in GinContainedStrategy, it always returns TRUE+recheck >>>> regardless of >>>> the inputs, but in other cases it uses GIN_MAYBE. >>>> >>> >>> >>> Next revision of patch is attached. >>> >>> In this version opclass should provide at least one consistent function: >>> binary or ternary. It's expected to achieve best performance when opclass >>> provide both of them. However, tests shows opposite :( I've to recheck it >>> carefully. >>> >>> >> However, it's not! >> This revision of patch completes my test case in 123330 ms. This is >> slightly faster than previous revision. >> > > Great. Committed, I finally got around to it. > > Some minor changes: I reworded the docs and also updated the table of > support functions in xindex.sgml. I refactored the query in opr_sanity.sql, > to make it easier to read, and to check more carefully that the required > support functions are present. I also added a runtime check to avoid > crashing if neither is present. > > A few things we ought to still discuss: > > * I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, > rather than GIN_TRUE. The equivalent boolean version returns 'true' without > recheck. Is that a typo, or was there some reason for the discrepancy? > Actually, there is not difference in current implementation, But I implemented it so that trueTriConsistentFn can correctly work with shimBoolConsistentFn. In this case it should return GIN_MAYBE in case when it have no GIN_MAYBE in the input (as analogue of setting recheck flag). So, it could return GIN_TRUE only if it checked that input has GIN_MAYBE. However, checking would be just wasting of cycles. So I end up with just GIN_MAYBE :-) > * Come to think of it, I'm not too happy with the name GinLogicValue. It's > too vague. It ought to include "ternary" or "tri-value" or something like > that. How about renaming it to "GinTernaryValue" or just "GinTernary"? Any > better suggestion for the name? > Probably we could call this just "TernaryValue" hoping that one day it would be useful outside of gin? * This patch added a triConsistent function for array and tsvector > opclasses. Were you planning to submit a patch to do that for the rest of > the opclasses, like pg_trgm? (it's getting awfully late for that...) Yes. I can try to get it into 9.4 if it's possible. ------ With best regards, Alexander Korotkov.