There was a Jira issue on GPU acceleration where it was mentioned that
BooleanScorer might be a candidate for GPU usage.
So I wanted to try multithreading in Java itself first, and
thought that this function might be amenable to parallelization.
That is why I was giving it a try.
Would this not be useful for very long Boolean queries with a lot
of SHOULD clauses? I have no clue whether that is a common situation, though.

I just need a little more help. Some of the tests fail with the
error Adrien mentioned, that docs should be collected in the same thread
where they were generated, but other tests produce wrong scores. Do you
see anything wrong in the synchronization I have done?
The synchronization basically creates an array of ReentrantLocks of size
matching.length and runs the function
"ScoreWindowIntoBitSetAndReplay" with numScorers threads instead of the for
loop.
// In BooleanScorer.java -> OrCollector -> collect()
Lock[idx].lock();
matching[idx] |= 1L << i;
final Bucket bucket = buckets[i];
bucket.freq++;
bucket.score += scorer.score();
Lock[idx].unlock();
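
To make the pattern above easier to discuss, here is a minimal standalone sketch of it, outside Lucene. It only assumes what is described in this thread: a 2048-doc window, a matching bit set of longs, one bucket per doc, and one ReentrantLock per matching word. The names WindowLockSketch, collect and run are hypothetical, and scores are passed in as plain doubles rather than read from scorer.score(), so this is not the actual BooleanScorer code. It releases the lock in a finally block so that an exception inside the critical section cannot leave the lock held.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical standalone sketch; not the real BooleanScorer.
public class WindowLockSketch {
    static final int SIZE = 2048;           // window size mentioned in the thread
    static final int MASK = SIZE - 1;
    static final int SET_SIZE = SIZE / 64;  // one long covers 64 docs

    static final class Bucket { int freq; double score; }

    final long[] matching = new long[SET_SIZE];
    final Bucket[] buckets = new Bucket[SIZE];
    final ReentrantLock[] locks = new ReentrantLock[SET_SIZE];

    WindowLockSketch() {
        for (int i = 0; i < SIZE; i++) buckets[i] = new Bucket();
        for (int i = 0; i < SET_SIZE; i++) locks[i] = new ReentrantLock();
    }

    // Records one hit. The lock for word idx guards both matching[idx]
    // and the 64 buckets whose docs map to that word.
    void collect(int doc, double score) {
        final int i = doc & MASK;
        final int idx = i >>> 6;
        locks[idx].lock();
        try {
            matching[idx] |= 1L << i;   // Java shifts a long by (i & 63)
            final Bucket bucket = buckets[i];
            bucket.freq++;
            bucket.score += score;
        } finally {
            locks[idx].unlock();        // always released, even on exceptions
        }
    }

    // Assumption: one thread per clause, each with its own list of doc IDs
    // and a fixed score of 1.0 per hit (a stand-in for a real scorer).
    static WindowLockSketch run(int[][] docsPerClause) throws InterruptedException {
        final WindowLockSketch window = new WindowLockSketch();
        Thread[] threads = new Thread[docsPerClause.length];
        for (int t = 0; t < threads.length; t++) {
            final int[] docs = docsPerClause[t];
            threads[t] = new Thread(() -> {
                for (int doc : docs) window.collect(doc, 1.0);
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join(); // join makes the writes visible
        return window;
    }

    public static void main(String[] args) throws InterruptedException {
        WindowLockSketch w = run(new int[][] {{1, 5, 70}, {5, 70, 2047}, {5}});
        System.out.println("freq(5)=" + w.buckets[5].freq
            + " score(5)=" + w.buckets[5].score);
    }
}
```

In this sketch each worker thread carries its own scores, so nothing besides the window arrays is shared between threads. Note also that floating-point addition is not associative, so with real scores the sum in a bucket can differ in the last bits depending on the order in which threads reach it.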



On Mon, 21 Jun 2021 at 19:04, Adrien Grand <[email protected]> wrote:

> It should be possible to make something like this work. The main issue is
> that Lucene has the expectation that a (Bulk)Scorer is consumed in the
> thread where it was pulled, so this would require substantial changes to
> how BooleanScorer currently operates I believe.
>
> I'd be curious to know why you are looking into this rather than passing
> an Executor to IndexSearcher so that it can search segments concurrently.
> Is it not providing enough concurrency for you?
>
> On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]> wrote:
>
>> Hi,
>> There is a function "ScoreWindowIntoBitSetAndReplay" in
>> "BooleanScorer.java" which runs over all the scorers.
>> I was wondering if we can use multithreading here with numScorers
>> threads. We already use a special OrCollector here, which updates the
>> matching array and the scores in the buckets for each window of 2048
>> docs, so we could use a ReentrantLock for synchronization in the collector.
>>
>> I just wanted reviews on this since I tried this and some tests were not
>> passing. So if you could tell what is wrong in this approach, I
>> would appreciate it.
>>
>> Thanking You in advance,
>> Arihant.
>>
>> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>>
>>> Glad it helped. :)
>>>
>>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>>
>>>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>>>> myself since seeing that DrillSideways also implements a TAAT approach (in
>>>> addition to a doc-at-a-time approach). This really helps clear that up.
>>>> Appreciate you taking the time to explain!
>>>>
>>>> Cheers,
>>>> -Greg
>>>>
>>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>>>
>>>>> Hello Arihant,
>>>>>
>>>>> The Scorer for disjunctions uses a heap data structure that needs to
>>>>> be reordered upon every hit. While reordering heaps is efficient as it
>>>>> runs in logarithmic time, the fact that it needs to run on every document
>>>>> might add non-negligible overhead. BooleanScorer tries to work around this
>>>>> overhead by scoring large windows of documents in a more TAAT
>>>>> (term-at-a-time) fashion so that Lucene only needs to reorder the heap
>>>>> every 2048 doc IDs (the hardcoded window size).
>>>>>
>>>>> This paper gives a bit more context:
>>>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>>>> see section 4 in particular.
>>>>>
>>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new here. I would like to know what exact optimisation is
>>>>>> carried out in the "BooleanScorer.java" code that led to a separate class
>>>>>> for resolving Boolean queries over documents in bulk. I could not find any
>>>>>> material on this in the documentation, hence I decided to ask here.
>>>>>>
>>>>>>
>>>>>> Thanking you in advance,
>>>>>>
>>>>>> Arihant.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>
>>>
>>> --
>>> Adrien
>>>
>>
>
> --
> Adrien
>
