This is a good approach indeed, Lucene does this too. Le dim. 1 oct. 2023, 19:33, Walter Underwood <wun...@wunderwood.org> a écrit :
> At Infoseek, the engine checked the terms in frequency order, with the > most rare term first. If the conjunction reached zero matches at any point, > it stopped checking. > > This might be a related but more general approach. > > That was almost 30 years ago, so any patents are long-expired. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > On Oct 1, 2023, at 10:12 AM, Adrien Grand <jpou...@gmail.com> wrote: > > I agree that it would save work in that case, but this query should be > very fast anyway. > > On the other hand, if term1, term2 and term3 have 10M matches each, the > conjunction will need to check if the current candidate match is > NO_MORE_DOCS millions of times even though this would only happen once. > > In general it's better to have less overhead for expensive queries and > more overhead for cheap queries than the other way around. > > Le dim. 1 oct. 2023, 17:35, YouPeng Yang <yypvsxf19870...@gmail.com> a > écrit : > >> Hi Adrien >> suppose that conjunction query like (term1 AND term2 AND term3) ,if >> the term1 does not exist ,and then the loop execution may cause >> unnecessary overhead.(sorry I have not yet find out whether there is any >> filter work before the doNext().. >> >> Best Regard >> >> Adrien Grand <jpou...@gmail.com> 于2023年10月1日周日 22:30写道: >> >>> Hello, >>> >>> This change would be correct, but it would only save work when the >>> conjunction is exhausted, and add overhead otherwise? >>> >>> Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870...@gmail.com> >>> a écrit : >>> >>>> Hi >>>> I am reading the code of class ConjunctionDISI .and about the method >>>> nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should >>>> there be that it can return immediately when the input doc==NO_MORE_DOCS? >>>> >>>> >>>> private int doNext(int doc) throws IOException { >>>> advanceHead: >>>> for (; ; ) { >>>> assert doc == lead1.docID(); >>>> //assumpt doc==NO_MORE_DOCS ,it return >>>> if(doc==NO_MORE_DOCS){ >>>> return NO_MORE_DOCS; >>>> } >>>> // find agreement between the two iterators with the lower costs >>>> // we special case them because they do not need the >>>> // 'other.docID() < doc' check that the 'others' iterators need >>>> final int next2 = lead2.advance(doc); >>>> if (next2 != doc) { >>>> doc = lead1.advance(next2); >>>> if(doc==NO_MORE_DOCS){ >>>> return NO_MORE_DOCS; >>>> } >>>> if (next2 != doc) { >>>> continue; >>>> } >>>> } >>>> ..left omited... >>>> } >>>> >>> >