Re: PointInSetQuery does not terminate early if DocIdSetBuilder's bitSet is null

hacker win7 Fri, 16 Oct 2020 04:39:24 -0700

Is there a reason why PointInSetQuery returns NO_MORE_DOCS instead of null, 
given that returning null would allow early termination?
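
To make the question concrete, here is a minimal sketch in plain Java. It is 
not Lucene code: the class and its helpers (EarlyTerminationSketch, clause, 
buildRequired, countBuilt) are made up for illustration, and doc ids are 
modelled as plain int arrays. It only assumes what is described in the quoted 
mails below, namely that a required clause returning null lets the conjunction 
stop before building the remaining clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified stand-in for the conjunction logic; NOT Lucene's actual classes.
public class EarlyTerminationSketch {
    // Same sentinel value as DocIdSetIterator.NO_MORE_DOCS in Lucene.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Counts how many clause scorers were actually built.
    static int scorersBuilt = 0;

    // Hypothetical clause: yields sorted doc ids, or null for "nothing matches".
    static Supplier<int[]> clause(int[] docsOrNull) {
        return () -> {
            scorersBuilt++;
            return docsOrNull;
        };
    }

    // Builds scorers for required clauses in order and stops at the first null.
    static List<int[]> buildRequired(List<Supplier<int[]>> clauses) {
        List<int[]> scorers = new ArrayList<>();
        for (Supplier<int[]> c : clauses) {
            int[] docs = c.get();
            if (docs == null) {
                return null; // a required clause cannot match, so skip the rest
            }
            scorers.add(docs);
        }
        return scorers;
    }

    // Runs "id-match AND range match AND string match" and reports the work done.
    static int countBuilt(int[] idMatchDocs) {
        scorersBuilt = 0;
        List<Supplier<int[]>> clauses = new ArrayList<>();
        clauses.add(clause(idMatchDocs));                        // id-match
        clauses.add(clause(new int[] {1, 5, 9, NO_MORE_DOCS}));  // range match
        clauses.add(clause(new int[] {5, 9, NO_MORE_DOCS}));     // string match
        buildRequired(clauses);
        return scorersBuilt;
    }

    public static void main(String[] args) {
        // id-match yields an "empty" array holding only NO_MORE_DOCS.
        System.out.println(countBuilt(new int[] {NO_MORE_DOCS})); // prints 3
        // id-match yields null, so the other two clauses are never built.
        System.out.println(countBuilt(null));                     // prints 1
    }
}
```

In the first call the id-match clause matches nothing, but because it is not 
null the other two clauses are still built. In the second call only the 
id-match clause is touched.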



hacker win7
hackersw...@gmail.com



> On Sep 29, 2020, at 11:59, hacker win7 <hackersw...@gmail.com> wrote:
> 
> Thanks Adrien Grand
> 
> We store long numbers in our points. In our search service, most search 
> requests look like this:
> 
> id-match AND range match AND string match AND …. (Clause count is high)
> 
> The id is a long number, and we find that for most searches the id-match 
> query matches no points at all, yet the subsequent sub-queries are still 
> evaluated. This adds extra cost to searches that ought to be terminated 
> early by the id-match sub-query.
> 
> As a result, most of our search requests pay extra cost before responding.
> 
> We also find that, for string matching, TermWeight.scorer() calls 
> getTermsEnum() before returning a scorer. That method calls 
> termsEnum.seekExact(term.bytes()) to check whether the value exists; if it 
> does not, getTermsEnum() returns null and TermWeight.scorer() returns null 
> as well.
> 
> This confuses me: for a string match with TermQuery, scorer() returns null 
> if the value does not exist. However, for an id match with PointInSetQuery, 
> scorer() does not return null if the value does not exist because, as you 
> can see, DocIdSetBuilder.build() builds an array of length 1 containing only 
> NO_MORE_DOCS.
> 
> To understand this, I checked the earliest version of DocIdSetBuilder, from 
> LUCENE-5938. There, build() can return null; as the comment says: “This 
> method may return <tt>null</tt> if no documents were added to this”
> 
> I’m not sure why this changed.
> 
> hacker win7
> hackersw...@gmail.com <mailto:hackersw...@gmail.com>
> 
> 
> 
>> On Sep 28, 2020, at 21:06, Adrien Grand <jpou...@gmail.com 
>> <mailto:jpou...@gmail.com>> wrote:
>> 
>> What are you storing in your points? If you are storing numbers, I wonder if 
>> a better approach to this problem might be to start leveraging 
>> IndexOrDocValuesQuery and scorerSupplier() for point-in-set queries like we 
>> did for range queries.
>> 
>> The approach you suggested would help in some cases, but I'm a bit unhappy 
>> that it would be quite fragile, e.g. SubQuery1 AND SubQuery2 might become 
>> faster as we could save evaluating matches of SubQuery2 but SubQuery2 AND 
>> SubQuery1 would still be slow.
>> 
>> On Mon, Sep 28, 2020 at 5:02 AM hacker win7 <hackersw...@gmail.com 
>> <mailto:hackersw...@gmail.com>> wrote:
>> Hi Lucene developers,
>> 
>> In Lucene-7.7.0, I find that in `PointInSetQuery.createWeight()`, in the 
>> method `scorer()`, after `values.intersect()`, if `result.bitSet` is null, 
>> `result.build()` uses `concat()` to generate a Buffer of length 1 whose 
>> only element is `NO_MORE_DOCS`.
>> 
>> Why not return null from the method `scorer()` if `result.bitSet` is null?
>> 
>> In the following case:
>> 
>> SubQuery1 AND SubQuery2 AND SubQuery3 …...
>> 
>> BooleanWeight.scorerSupplier() -> 
>> 
>> the first subScorer of query is PointInSetQuery -> 
>> 
>> scorer() ->
>> 
>> after intersect, if result.bitSet is null, no point value matched at all, 
>> so return null ->
>> 
>> 
>> This will terminate early in BooleanWeight.scorerSupplier() because the 
>> subScorer is null and the boolean clause is required.
>> 
>> The following SubQuery2, SubQuery3, SubQuery4 …. then do not need to call 
>> scorerSupplier() to build a scorer.
>> 
>> 
>> hacker win7
>> hackersw...@gmail.com <mailto:hackersw...@gmail.com>
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Adrien
>
