Is there any reason why PointInSetQuery returns NO_MORE_DOCS instead of null, when returning null would allow early termination?

hacker win7
hackersw...@gmail.com

> On Sep 29, 2020, at 11:59, hacker win7 <hackersw...@gmail.com> wrote:
>
> Thanks Adrien Grand
>
> We store long numbers in our points. In our search service, most search
> requests look like this:
>
> id-match AND range-match AND string-match AND ... (the clause count is high)
>
> The id is a long number, and we find that for most searches the id-match
> query matches no points at all, yet the subsequent sub-queries are still
> evaluated. This is extra cost for searches that ought to be terminated
> early in the id-match sub-query.
>
> As a result, most of our search requests spend extra time before responding.
>
> We also find that for the string-match clause, TermWeight.scorer() calls
> getTermsEnum() before returning a scorer. This calls
> termsEnum.seekExact(term.bytes()) to check whether the value exists. If it
> does not, getTermsEnum() returns null and TermWeight.scorer() returns null
> in turn.
>
> This confuses me: for the string-match TermQuery, scorer() returns null if
> the value does not exist. However, for the id-match PointInSetQuery,
> scorer() does not return null if the value does not exist because, as you
> can see, DocIdSetBuilder.build() builds an array of length 1 that contains
> only NO_MORE_DOCS.
>
> To look into this, I checked the earliest version of DocIdSetBuilder, from
> LUCENE-5938. There, build() can return null, as the comment says: “This
> method may return <tt>null</tt> if no documents were added to this”
>
> I'm not sure why this changed.
>
> hacker win7
> hackersw...@gmail.com
>
>> On Sep 28, 2020, at 21:06, Adrien Grand <jpou...@gmail.com> wrote:
>>
>> What are you storing in your points? If you are storing numbers, I wonder
>> if a better approach to this problem might be to start leveraging
>> IndexOrDocValuesQuery and scorerSupplier() for point-in-set queries like
>> we did for range queries.
>>
>> The approach you suggested would help in some cases, but I'm a bit unhappy
>> that it would be quite fragile, e.g. SubQuery1 AND SubQuery2 might become
>> faster as we could save evaluating matches of SubQuery2 but SubQuery2 AND
>> SubQuery1 would still be slow.
>>
>> On Mon, Sep 28, 2020 at 5:02 AM hacker win7 <hackersw...@gmail.com> wrote:
>> Hi Lucene developers,
>>
>> In Lucene 7.7.0, I find that in the `scorer()` method of the weight built
>> by `PointInSetQuery.createWeight()`, after `values.intersect()`, if
>> `result.bitSet` is null, then `result.build()` uses `concat()` to generate
>> a Buffer of length 1 whose only element is `NO_MORE_DOCS`.
>>
>> Why not return null from `scorer()` if `result.bitSet` is null?
>>
>> Consider the following case:
>>
>> SubQuery1 AND SubQuery2 AND SubQuery3 ...
>>
>> BooleanWeight.scorerSupplier() ->
>>
>> the first sub-scorer of the query is PointInSetQuery ->
>>
>> scorer() ->
>>
>> after intersect, if result.bitSet is null, no point value matched at all,
>> so return null ->
>>
>> This will terminate early in BooleanWeight.scorerSupplier() because the
>> sub-scorer is null and the boolean clause is required.
>>
>> The following SubQuery2, SubQuery3, SubQuery4 ... then need not call
>> scorerSupplier() to build a scorer.
>>
>> hacker win7
>> hackersw...@gmail.com
>>
>> --
>> Adrien
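
To make the two behaviours discussed in this thread concrete, here is a minimal, self-contained sketch. It does not use Lucene: `ToyScorer`, `nullWhenEmpty`, `emptyScorerWhenEmpty`, and `buildConjunction` are hypothetical stand-ins that only model the idea that a required clause returning null lets a conjunction stop before building scorers for later clauses. It also shows the ordering fragility Adrien mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class EarlyTerminationSketch {

    /** Toy scorer holding the doc IDs a clause matches (hypothetical, not Lucene's Scorer). */
    static final class ToyScorer {
        final int[] docs;
        ToyScorer(int[] docs) { this.docs = docs; }
    }

    /** Counts how many clause scorers were requested, to make the saved work visible. */
    static int requested = 0;

    /** Clause that returns null when it matches nothing (the TermWeight-style behaviour). */
    static Supplier<ToyScorer> nullWhenEmpty(int... docs) {
        return () -> {
            requested++;
            if (docs.length == 0) {
                return null;
            }
            return new ToyScorer(docs);
        };
    }

    /** Clause that always returns a scorer, even an empty one (the behaviour described for PointInSetQuery). */
    static Supplier<ToyScorer> emptyScorerWhenEmpty(int... docs) {
        return () -> {
            requested++;
            return new ToyScorer(docs);
        };
    }

    /** Builds scorers for required clauses in order and stops at the first null. */
    @SafeVarargs
    static List<ToyScorer> buildConjunction(Supplier<ToyScorer>... required) {
        List<ToyScorer> scorers = new ArrayList<>();
        for (Supplier<ToyScorer> clause : required) {
            ToyScorer scorer = clause.get();
            if (scorer == null) {
                return null; // a required clause matches nothing, so the conjunction cannot match
            }
            scorers.add(scorer);
        }
        return scorers;
    }

    public static void main(String[] args) {
        requested = 0;
        buildConjunction(nullWhenEmpty(), nullWhenEmpty(1, 2), nullWhenEmpty(2, 3));
        System.out.println("null first clause: " + requested);         // prints 1

        requested = 0;
        buildConjunction(emptyScorerWhenEmpty(), nullWhenEmpty(1, 2), nullWhenEmpty(2, 3));
        System.out.println("empty-scorer first clause: " + requested); // prints 3

        requested = 0;
        buildConjunction(nullWhenEmpty(1, 2), nullWhenEmpty(2, 3), nullWhenEmpty());
        System.out.println("null last clause: " + requested);          // prints 3
    }
}
```

In this toy model the saving appears only when the empty clause is evaluated first. When it comes last, all three scorers are still requested, which is the fragility raised above.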