Hi,

There is a function "scoreWindowIntoBitSetAndReplay" in "BooleanScorer.java" which loops over all the scorers for a window. I was wondering whether we could use multi-threading here, with one thread per scorer (numScorers threads). We already use a special OrCollector there, which updates the matching array and the scores for a bucket (window) of 2048 docs, so access to that shared state in the collector could be synchronized with a ReentrantLock.
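
To make the idea concrete, here is a rough sketch of what I have in mind. It is not the actual BooleanScorer code: the names ParallelWindowSketch, Clause, Window and scoreWindow are made up for illustration, and each clause is represented by plain arrays of window-relative doc IDs and scores instead of a real scorer. It only shows the locking pattern around the shared matching bits and score buckets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these types exist in Lucene.
final class ParallelWindowSketch {
    static final int SIZE = 2048; // window size mentioned in this thread

    // Stand-in for one scorer: doc IDs relative to the window, plus their scores.
    static final class Clause {
        final int[] docs;
        final float[] scores;
        Clause(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }
    }

    // Shared per-window state, guarded by a single ReentrantLock.
    static final class Window {
        final long[] matching = new long[SIZE / 64]; // one bit per doc in the window
        final float[] score = new float[SIZE];       // summed score per doc
        final int[] freq = new int[SIZE];            // number of clauses that matched
        private final ReentrantLock lock = new ReentrantLock();

        void collect(int doc, float s) {
            lock.lock();
            try {
                // For a long operand, Java uses only the low 6 bits of the shift count.
                matching[doc >>> 6] |= 1L << doc;
                score[doc] += s;
                freq[doc]++;
            } finally {
                lock.unlock();
            }
        }
    }

    // Runs one task per clause and waits for all of them before returning the window.
    static Window scoreWindow(List<Clause> clauses) {
        Window window = new Window();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, clauses.size()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Clause c : clauses) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < c.docs.length; i++) {
                        window.collect(c.docs[i], c.scores[i]);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate failures and make the writes visible to the caller
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return window;
    }
}
```

In this sketch the window would only be replayed to the real collector after every task has finished, so the replay step itself stays single-threaded.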

I tried this, but some tests do not pass, so I would like a review of the approach. I would appreciate it if you could tell me what is wrong with it.

Thanking you in advance,
Arihant.

On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:

> Glad it helped. :)
>
> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>
>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>> myself since seeing that DrillSideways also implements a TAAT approach (in
>> addition to a doc-at-a-time approach). This really helps clear that up.
>> Appreciate you taking the time to explain!
>>
>> Cheers,
>> -Greg
>>
>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>
>>> Hello Arihant,
>>>
>>> The Scorer for disjunctions uses a heap data structure that needs to be
>>> reordered upon every hit. While reordering heaps is efficient as it runs in
>>> logarithmic time, the fact that it needs to run on every document might add
>>> non-negligible overhead. BooleanScorer tries to work around this overhead
>>> by scoring large windows of documents in a more TAAT (term-at-a-time)
>>> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
>>> (the hardcoded window size).
>>>
>>> This paper gives a bit more context:
>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>> see section 4 in particular.
>>>
>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new here. I would like to know what is the exact optimisation
>>>> carried out in “Boolean Scorer.java” code which led to a separate class for
>>>> resolving Boolean Queries in bulk documents. I could not find any material
>>>> in the documentation for this as well, hence I decided to ask here.
>>>>
>>>> Thanking you in advance,
>>>>
>>>> Arihant.
>>>
>>> --
>>> Adrien
>>
>
> --
> Adrien
