Re: Boolean Scorer

Atri Sharma Mon, 21 Jun 2021 06:38:36 -0700

TBH, the proposal sounds like overkill to me - IndexSearcher's
concurrency should be good enough (unless you are searching a single large
segment).


On Mon, 21 Jun 2021, 19:04 Adrien Grand, <[email protected]> wrote:

> It should be possible to make something like this work. The main issue is
> that Lucene has the expectation that a (Bulk)Scorer is consumed in the
> thread where it was pulled, so this would require substantial changes to
> how BooleanScorer currently operates I believe.
>
> I'd be curious to know why you are looking into this rather than passing
> an Executor to IndexSearcher so that it can search segments concurrently.
> Is it not providing enough concurrency for you?
>
> On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]> wrote:
>
>> Hi,
>> There is a function "scoreWindowIntoBitSetAndReplay" in
>> "BooleanScorer.java" which runs over all the scorers.
>> I was wondering if we can use multi-threading here, with numScorers
>> threads. We are already using a special OrCollector here, which updates the
>> matching array and the scores in buckets of 2048 docs, so we could use a
>> ReentrantLock for synchronization in the collector.
>>
>> I wanted a review of this, since I tried it and some tests were not
>> passing. If you could tell me what is wrong with this approach, I
>> would appreciate it.
>>
>> Thanking you in advance,
>> Arihant.
>>
>> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>>
>>> Glad it helped. :)
>>>
>>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>>
>>>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>>>> myself since seeing that DrillSideways also implements a TAAT approach (in
>>>> addition to a doc-at-a-time approach). This really helps clear that up.
>>>> Appreciate you taking the time to explain!
>>>>
>>>> Cheers,
>>>> -Greg
>>>>
>>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>>>
>>>>> Hello Arihant,
>>>>>
>>>>> The Scorer for disjunctions uses a heap data structure that needs to
>>>>> be reordered upon every hit. While reordering heaps is efficient as it
>>>>> runs in logarithmic time, the fact that it needs to run on every document
>>>>> might
>>>>> add non-negligible overhead. BooleanScorer tries to work around this
>>>>> overhead by scoring large windows of documents in a more TAAT
>>>>> (term-at-a-time) fashion so that Lucene only needs to reorder the heap
>>>>> every 2048 doc IDs (the hardcoded window size).
>>>>>
>>>>> This paper gives a bit more context:
>>>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>>>> see section 4 in particular.
>>>>>
>>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new here. I would like to know exactly what optimisation is
>>>>>> carried out in the “BooleanScorer.java” code that led to a separate class
>>>>>> for resolving Boolean queries over documents in bulk. I could not find any
>>>>>> material on this in the documentation, hence I decided to ask here.
>>>>>>
>>>>>>
>>>>>> Thanking you in advance,
>>>>>>
>>>>>> Arihant.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>
>>>
>>> --
>>> Adrien
>>>
>>
>
> --
> Adrien
>
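
For readers following the thread, here is a minimal sketch of the windowed,
term-at-a-time idea from the explanation quoted above. It is not Lucene's
BooleanScorer: the class name WindowedOrSketch, the int[][] postings input and
the per-clause weights are all hypothetical stand-ins, and the sketch leaves
out the heap that the real scorer uses to order its sub-scorers. It only shows
how each clause can be drained into a 2048-slot window (a matching array plus
a score array) before the window is replayed in doc ID order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified illustration; not Lucene's BooleanScorer. */
final class WindowedOrSketch {
    static final int WINDOW_SIZE = 2048; // window size mentioned in the thread

    /**
     * Scores a disjunction over sorted postings lists, one window at a time.
     * postings[i] holds the sorted doc IDs matching clause i; weights[i] is the
     * score that clause contributes per match (a stand-in for a real scorer).
     * Returns "docId:score" strings in increasing doc ID order.
     */
    static List<String> score(int[][] postings, float[] weights, int maxDoc) {
        List<String> hits = new ArrayList<>();
        int[] cursor = new int[postings.length];       // next unread position per clause
        boolean[] matching = new boolean[WINDOW_SIZE]; // which slots of the window matched
        float[] scores = new float[WINDOW_SIZE];       // accumulated score per slot

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);
            // Term-at-a-time within the window: drain each clause in turn.
            for (int i = 0; i < postings.length; i++) {
                while (cursor[i] < postings[i].length && postings[i][cursor[i]] < end) {
                    int slot = postings[i][cursor[i]++] - base;
                    matching[slot] = true;
                    scores[slot] += weights[i];
                }
            }
            // Replay the window in doc ID order, then reset it for reuse.
            for (int slot = 0; slot < end - base; slot++) {
                if (matching[slot]) {
                    hits.add((base + slot) + ":" + scores[slot]);
                    matching[slot] = false;
                    scores[slot] = 0f;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 5, 3000}, {5, 2047, 2048} };
        float[] weights = { 1.0f, 2.0f };
        System.out.println(score(postings, weights, 4096));
    }
}
```

Note that the matching and scores arrays are shared by every clause in the
window. In this sketch they are written by a single thread, so any attempt to
drain the clauses from several threads would have to guard those writes.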
