TBH, the proposal sounds like overkill to me. IndexSearcher's concurrency should be good enough (unless you are searching a single large segment).
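To illustrate what I mean by segment-level concurrency, here is a toy model in plain Java. It is not Lucene code: `SegmentSearchSketch`, `countInSegment` and `countConcurrently` are hypothetical names, and a "segment" is just an int array. In Lucene itself, the equivalent is passing an Executor to IndexSearcher, as Adrien suggests below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: each "segment" is an int array and a "query" counts
// how many values equal the target. All names here are hypothetical.
public class SegmentSearchSketch {

    // Work done for a single segment, always on a single thread.
    static int countInSegment(int[] segment, int target) {
        int count = 0;
        for (int value : segment) {
            if (value == target) {
                count++;
            }
        }
        return count;
    }

    // One task per segment; per-segment results are merged at the end.
    static int countConcurrently(List<int[]> segments, int target, ExecutorService executor) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            Callable<Integer> task = () -> countInSegment(segment, target);
            futures.add(executor.submit(task));
        }
        int total = 0;
        try {
            for (Future<Integer> future : futures) {
                total += future.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("segment task failed", e);
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<int[]> segments = List.of(
                new int[] {1, 2, 3, 2},
                new int[] {2, 2, 5},
                new int[] {7});
        System.out.println(countConcurrently(segments, 2, executor));
        executor.shutdown();
    }
}
```

Note that with a single segment there is only one task, so the executor cannot help; that is the case I carved out above.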
On Mon, 21 Jun 2021, 19:04 Adrien Grand, <[email protected]> wrote:

> It should be possible to make something like this work. The main issue is
> that Lucene has the expectation that a (Bulk)Scorer is consumed in the
> thread where it was pulled, so this would require substantial changes to
> how BooleanScorer currently operates I believe.
>
> I'd be curious to know why you are looking into this rather than passing
> an Executor to IndexSearcher so that it can search segments concurrently.
> Is it not providing enough concurrency for you?
>
> On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]> wrote:
>
>> Hi,
>> There is a function "ScoreWindowIntoBitSetAndReplay" in
>> "BooleanScorer.java" which runs over all the scorers.
>> I was wondering if we can use multi-threading here with numScorers
>> threads. Anyways we are using a special OrCollector here which updates the
>> matching array and the score in the buckets of 2048 docs. So we can use a
>> Reentrant lock for synchronization in the collector.
>>
>> I just wanted reviews on this since I tried this and some tests were not
>> passing. So if you could tell what is wrong in this approach, I
>> would appreciate it.
>>
>> Thanking You in advance,
>> Arihant.
>>
>> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>>
>>> Glad it helped. :)
>>>
>>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>>
>>>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>>>> myself since seeing that DrillSideways also implements a TAAT approach (in
>>>> addition to a doc-at-a-time approach). This really helps clear that up.
>>>> Appreciate you taking the time to explain!
>>>>
>>>> Cheers,
>>>> -Greg
>>>>
>>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>>>
>>>>> Hello Arihant,
>>>>>
>>>>> The Scorer for disjunctions uses a heap data structure that needs to
>>>>> be reordered upon every hit. While reordering heaps is efficient as it
>>>>> runs in logarithmic time, the fact that it needs to run on every document
>>>>> might add non-negligible overhead. BooleanScorer tries to work around this
>>>>> overhead by scoring large windows of documents in a more TAAT
>>>>> (term-at-a-time) fashion so that Lucene only needs to reorder the heap
>>>>> every 2048 doc IDs (the hardcoded window size).
>>>>>
>>>>> This paper gives a bit more context:
>>>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>>>> see section 4 in particular.
>>>>>
>>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new here. I would like to know what is the exact optimisation
>>>>>> carried out in “Boolean Scorer.java” code which led to a separate class
>>>>>> for resolving Boolean Queries in bulk documents. I could not find any
>>>>>> material in the documentation for this as well, hence I decided to ask
>>>>>> here.
>>>>>>
>>>>>> Thanking you in advance,
>>>>>>
>>>>>> Arihant.
>>>>>>
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>
>>>
>>> --
>>> Adrien
>>>
>>
>
> --
> Adrien
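
For anyone skimming the quoted thread, the windowed term-at-a-time idea Adrien describes can be sketched roughly as follows. This is a simplified illustration in plain Java, not the actual BooleanScorer code: `WindowedOrSketch`, `scoreAll`, the int-array postings and the fixed per-clause weights are all assumptions made for the example, and the heap that Lucene still reorders once per window is left out entirely.

```java
// Simplified sketch of windowed term-at-a-time scoring for a disjunction.
// All names and the data layout are hypothetical, not Lucene's.
public class WindowedOrSketch {

    // Window size mentioned in the thread.
    static final int WINDOW_SIZE = 2048;

    // postings[i]: ascending doc IDs (all < maxDoc) matched by clause i.
    // weights[i]: score that clause i contributes to each doc it matches.
    // Returns the summed score for every doc ID in [0, maxDoc).
    static float[] scoreAll(int[][] postings, float[] weights, int maxDoc) {
        float[] result = new float[maxDoc];
        boolean[] matching = new boolean[WINDOW_SIZE];
        float[] bucketScores = new float[WINDOW_SIZE];
        int[] cursors = new int[postings.length];

        for (int windowBase = 0; windowBase < maxDoc; windowBase += WINDOW_SIZE) {
            int windowEnd = Math.min(windowBase + WINDOW_SIZE, maxDoc);

            // Term-at-a-time: drain each clause over the window in turn,
            // accumulating into the shared buckets.
            for (int i = 0; i < postings.length; i++) {
                int[] docs = postings[i];
                int cursor = cursors[i];
                while (cursor < docs.length && docs[cursor] < windowEnd) {
                    int slot = docs[cursor] - windowBase;
                    matching[slot] = true;
                    bucketScores[slot] += weights[i];
                    cursor++;
                }
                cursors[i] = cursor;
            }

            // Replay: emit the matches of this window in doc ID order,
            // clearing the buckets for the next window.
            for (int slot = 0; slot < windowEnd - windowBase; slot++) {
                if (matching[slot]) {
                    result[windowBase + slot] = bucketScores[slot];
                    matching[slot] = false;
                    bucketScores[slot] = 0f;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 5, 3000}, {5, 2047, 2048, 3000}};
        float[] weights = {1f, 2f};
        float[] scores = scoreAll(postings, weights, 4096);
        System.out.println(scores[5] + " " + scores[2048]);
    }
}
```

Every clause writes into the same `matching` and `bucketScores` arrays, which is the shared state the lock in Arihant's proposal would have to protect.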
