[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-05 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-528262274
 
 
   Thank you for merging and reviewing, @jpountz !
   and thank you @jimczi  and @mikemccand for reviewing!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-05 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-528237811
 
 
   Thanks for approving, @mikemccand !
   
   @jpountz , @jimczi Does this look ok to merge now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-04 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527765667
 
 
   Folks, post the discussion above, I am assuming this is ready to merge?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-03 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527515974
 
 
   
   > In the unsorted index case, where we skip by impacts once we collect more 
than the 1000 by default, we are also still correct because we continue 
collecting in that slice, just skipping by impact based on that thread's 
private PQ bottom. We can make further improvements e.g. to share the global PQ 
bottom across all searcher threads, but that should come later.
   
   Yes, that is on the line. Next up will be a PR with shared global PQ :)
   
   > 
   > So net/net I think the change is correct, and should be a big performance 
gain for concurrent searching. Sorry for the confusion ;)
   
   No sweat, thank you for reviewing!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-03 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527464137
 
 
   @jpountz I updated the PR per your comments -- please take a look and let me 
know if it seems fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-02 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527285002
 
 
   > > https://issues.apache.org/jira/browse/LUCENE-8681
   > 
   > I am not sure I see how that solves the problem?
   > 
   > The core change targeted in this PR is allowing `TOTAL_HITS_THRESHOLD` to 
be accurately counted across all slices. Today, we collect 
`TOTAL_HITS_THRESHOLD` per slice, which is not what the API definition is. Post 
this PR, we will collect `TOTAL_HITS_THRESHOLD` in aggregate. 
`TOTAL_HITS_THRESHOLD`' s definition does not guarantee any order of collection 
of hits in the concurrent case -- we inadvertently define one today by 
collection the threshold number of hits per slice.
   > 
   > RE: Proration, I believe that is a custom logic that can be added on top 
of this change. In any case, the proration logic also works on a bunch of 
static values + fudge factors, so it can go wrong and we might end up 
collecting lesser hits from a more valuable segment. To help prevent this 
scenario, I believe proration might also do well to build upon this PR and use 
the shared counter. But, I am unable to see why proration and accurate counting 
across slices are mutually exclusive.
   > 
   > In any case, unlike proration, this PR does not propose any algorithmic 
changes to the way collection is done -- it simply reduces extra work done 
across slices that we do not even advertise today, so might be something that 
the user is unaware of.
   
   To summarize my monologue, this PR is aimed at accurate counting of hits 
across all slices -- whereas proration targets a different use case of  trying 
to "distributing" hits across slices based on some parameters.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-02 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527284505
 
 
   > https://issues.apache.org/jira/browse/LUCENE-8681
   
   I am not sure I see how that solves the problem?
   
   The core change targeted in this PR is allowing `TOTAL_HITS_THRESHOLD` to be 
accurately counted across all slices. Today, we collect `TOTAL_HITS_THRESHOLD` 
per slice, which is not what the API definition is. Post this PR, we will 
collect `TOTAL_HITS_THRESHOLD` in aggregate. `TOTAL_HITS_THRESHOLD`' s 
definition does not guarantee any order of collection of hits in the concurrent 
case -- we inadvertently define one today by collection the threshold number of 
hits per slice.
   
   RE: Proration, I believe that is a custom logic that can be added on top of 
this change. In any case, the proration logic also works on a bunch of static 
values + fudge factors, so it can go wrong and we might end up collecting 
lesser hits from a more valuable segment. To help prevent this scenario, I 
believe proration might also do well to build upon this PR and use the shared 
counter. But, I am unable to see why proration and accurate counting across 
slices are mutually exclusive.
   
   In any case, unlike proration, this PR does not propose any algorithmic 
changes to the way collection is done -- it simply reduces extra work done 
across slices that we do not even advertise today, so might be something that 
the user is unaware of.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-09-02 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527109501
 
 
   > > I think the output I pasted above does mention the tasks run?
   > 
   > Hmm the first few `luceneutil` results you posted didn't seem to state 
which task (`wikimedium10k`)? Or maybe I missed it, sorry ... in general when 
posting benchmarks it's vital to give enough details and use/share public 
benchmark tools/data (like `luceneutil` and `wikimedia`'s corpus) so that 
someone else could go and replicate your results.
   
   +1, thanks for the input -- will ensure that henceforth.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-31 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526845639
 
 
   > I'm trying to understand the behavior change Lucene users will see with 
this, when using concurrent searching for one query (passing `ExecutorService` 
to `IndexSearcher`):
   > 
   > It looks like with the change such users will see their search precisely 
when the total collected hits exceeds the limit (1000 by default?), versus 
today where we will try to collect 1000 per segment and then reduce that to the 
top 1000 overall? So this means the results will change depending on thread 
execution/timing?
   
   Looking at the documentation around `TOTAL_HITS_THRESHOLD`, I see that it 
intends to restrict the number of documents scored in total before the query is 
early terminated. If we do a single threaded search today, that is the behavior 
we get. However, for concurrent search, we actually look at N * 
`TOTAL_HITS_THRESHOLD`, where N is the number of slices. So, I believe that we 
are not doing the advertised behavior for concurrent searches in the status 
quo. This change should fix that.
   
   However, you are correct that thread timing will come into play here -- 
different slices may have different contributions to the overall number of 
hits. However, since we are anyways not scoring all documents, I do not believe 
we offer any guarantees on the documents that we return -- even today, the best 
documents might be the ones which just came in and hence are on the last 
segments to be traversed, so never even get looked. WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-31 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526845091
 
 
   > Please don't run "real" benchmarks with `wikimedium10k` -- that may be 
fast to execute but the results are nonsense since dominant time is query 
setup, versus visiting postings lists that you'd see with `wikimediumall`. For 
real results, use `wikimediumall` or `wikibigall`.
   
   Thanks for highlighting that.
   
   >  Also, always report which task you ran when you post results :)
   
   I think the output I pasted above does mention the tasks run?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-30 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526540019
 
 
   Thanks @jimczi !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526179514
 
 
   @jimczi Updated per our discussion. Although, a user can potentially 
implement `HitsThresholdChecker` and give his custom limits, but that would be 
an overkill for just a different value for a limit, so agree with your 
proposal. Please let me know your thoughts.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526164337
 
 
   > Package protected might not be a good idea after all, sorry, users that 
wants to set the `totalHitsThreshold` different than the default would need to 
access the `GlobalHitsThresholdChecker`. Maybe we can have a single class and 
expose two static functions that creates anonymous impl: `create(int)` and 
`createShared(int)` ? This way users can still modify totalHitsThreshold and 
share it if they use the managed search ?
   
   If I understood correctly, the class would be public and return instances of 
local checker or global checker with the threshold set as the parameter passed 
in to the static methods?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526157715
 
 
   @jimczi Updated


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526152703
 
 
   Ran precommit on the latest iteration -- came in clean.
   
   @jimczi Looks good to be committed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526122213
 
 
   > > Also, I believe having the abstraction will let future implementations 
customize the threshold logic without needing to make any core changes or 
introduce a new collector, hence we should let the threshold checker 
implementations be public. WDYT?
   > 
   > What kind of customization you have in mind ? 
   
   Since this is now a `BooleanSupplier`, collectors can choose when to early 
terminate (a hacky idea might be to just terminate after some time has elapsed? 
Not saying it is a good idea though).
   
   >  IMO having all this classes and functions that require 
HitsThresholdChecker as package protected would be enough and would not add to 
the overall complexity of running a simple search using an executor or not ?
   
   Package protected is a good idea -- updated. Does it look fine now?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526113533
 
 
   > > So, essentially, we make implicit CollectorManager implementations owned 
by IndexSearcher instead of separate implementation classes?
   > 
   > Yes I think it would be easier and we don't really need to expose 
`HitsThresholdChecker` in this case, it's just an impl detail that is used by 
the parallel and the normal top docs collector ?
   
   Not really, it would be shared across both `TopFieldCollector and 
`TopScoreDocCollector`. Also, I believe having the abstraction will let future 
implementations customize the threshold logic without needing to make any core 
changes or introduce a new collector, hence we should let the threshold checker 
implementations be public. WDYT?
   
   Updated per comments -- let me know if it looks fine


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526102145
 
 
   > > That would require a CollectorManager implementation
   > 
   > Not necessarily, you could create a `GlobalSharedHitsThresholdChecker` and 
use it in the manager impl of `IndexSearcher#searchAfter` if the executor is 
not null and the number of slices is > 1 ?
   > In fact I wonder if the collector manager impls 
(`SharedHitCountFieldCollectorManager`, ...) are required, we could use the 
`GlobalSharedHitsThresholdChecker` internally in `IndexSearcher` without adding 
a dedicated manager impl ?
   
   So, essentially, we make implicit `CollectorManager` implementations owned 
by `IndexSearcher` instead of separate implementation classes?
   
   I am fine with this -- I do not really have a strong preference either way.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-29 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-526075360
 
 
   > I wonder if the shared count should be used automatically in 
`IndexSearcher#searchAfter` if the executor is not null and the number of 
slices > 1 ?
   
   That would require a `CollectorManager` implementation that passes in 
`GlobalSharedHitsThresholdChecker` and then merges the documents across slices. 
I have implemented that -- should be able to raise a follow up PR today
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-28 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-525754830
 
 
   @jimczi Thanks, I updated per your comments. Please let me know if it seems 
fine.
   
   RE: benchmarks, the numbers I posted earlier were on wikimedium10k. I did a 
run of wikimedium2m and saw the following:
   
   TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
  HighTermMonthSort  256.49 (14.6%)  240.49 (13.0%)   
-6.2% ( -29% -   24%)
  HighTermDayOfYearSort  140.97  (8.4%)  137.26  (8.6%)   
-2.6% ( -18% -   15%)
 Fuzzy2   67.54 (20.4%)   65.90 (19.3%)   
-2.4% ( -34% -   46%)
MedTerm 1430.97  (4.8%) 1404.31  (4.2%)   
-1.9% ( -10% -7%)
 AndHighLow  859.98  (5.2%)  847.48  (4.1%)   
-1.5% ( -10% -8%)
Respell  111.18  (2.4%)  110.17  (2.5%)   
-0.9% (  -5% -4%)
Prefix3  278.56  (3.3%)  276.16  (2.5%)   
-0.9% (  -6% -5%)
LowTerm 1635.42  (5.3%) 1628.34  (3.2%)   
-0.4% (  -8% -8%)
 Fuzzy1   65.18 (11.5%)   64.94 (12.6%)   
-0.4% ( -21% -   26%)
   Wildcard  184.51  (3.0%)  183.97  (3.3%)   
-0.3% (  -6% -6%)
 HighPhrase  150.25  (2.8%)  149.83  (2.9%)   
-0.3% (  -5% -5%)
   PKLookup  124.37  (3.6%)  124.04  (3.4%)   
-0.3% (  -7% -7%)
   HighTerm  991.92  (4.7%)  990.52  (4.7%)   
-0.1% (  -9% -9%)
   BrowseDayOfYearTaxoFacets 9293.85  (4.9%) 9291.07  (5.4%)   
-0.0% (  -9% -   10%)
LowSloppyPhrase  143.41  (1.9%)  143.38  (2.1%)   
-0.0% (  -3% -4%)
  BrowseMonthSSDVFacets   43.43  (0.9%)   43.42  (1.0%)   
-0.0% (  -1% -1%)
  MedPhrase   93.36  (1.5%)   93.41  (2.2%)
0.0% (  -3% -3%)
   HighSloppyPhrase   43.97  (2.4%)   43.99  (2.4%)
0.1% (  -4% -5%)
MedSloppyPhrase  108.07  (2.1%)  108.15  (1.8%)
0.1% (  -3% -4%)
   BrowseDayOfYearSSDVFacets   38.85  (0.6%)   38.88  (0.5%)
0.1% (  -1% -1%)
  LowPhrase  181.55  (1.8%)  181.82  (1.6%)
0.1% (  -3% -3%)
MedSpanNear   66.17  (4.4%)   66.28  (3.0%)
0.2% (  -7% -8%)
   HighSpanNear  110.26  (3.3%)  110.57  (2.6%)
0.3% (  -5% -6%)
LowSpanNear   84.69  (1.9%)   84.94  (2.0%)
0.3% (  -3% -4%)
   HighIntervalsOrdered   47.21  (0.9%)   47.37  (1.5%)
0.3% (  -1% -2%)
AndHighHigh  118.46  (4.3%)  119.01  (3.9%)
0.5% (  -7% -8%)
 AndHighMed  187.97  (3.8%)  188.85  (3.4%)
0.5% (  -6% -8%)
 OrHighHigh   79.07  (3.0%)   79.45  (2.5%)
0.5% (  -4% -6%)
 IntNRQ  238.67  (4.3%)  239.97  (3.5%)
0.5% (  -6% -8%)
  OrHighLow  429.26  (4.4%)  432.55  (3.3%)
0.8% (  -6% -8%)
  OrHighMed  170.16  (3.0%)  171.65  (2.8%)
0.9% (  -4% -6%)
   BrowseDateTaxoFacets   11.28  (2.6%)   11.39  (3.7%)
0.9% (  -5% -7%)
  BrowseMonthTaxoFacets 9470.41  (6.2%) 9584.95  (4.7%)
1.2% (  -9% -   12%)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-28 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-525720523
 
 
   @jimczi Updated the PR with implementation for `TopScoreDocsCollector'. 
Haven't added the corresponding `CollectorManager` yet though, I presume that 
can be done separately?
   
   Luceneutil numbers on the latest version:
   
 TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
MedSloppyPhrase  498.64  (6.9%)  484.94 (16.3%)   
-2.7% ( -24% -   21%)
   HighIntervalsOrdered  303.56 (10.5%)  296.03 (11.8%)   
-2.5% ( -22% -   22%)
   BrowseDateTaxoFacets 3366.36  (5.2%) 3328.48  (5.4%)   
-1.1% ( -11% -9%)
   HighSloppyPhrase  500.45  (8.4%)  496.32  (9.1%)   
-0.8% ( -16% -   18%)
 IntNRQ 1108.56  (3.3%) 1104.31  (4.6%)   
-0.4% (  -7% -7%)
 Fuzzy1  321.49  (4.8%)  320.92  (4.1%)   
-0.2% (  -8% -9%)
 HighPhrase  571.71  (9.5%)  571.13  (8.8%)   
-0.1% ( -16% -   20%)
Respell  386.71  (8.4%)  386.35  (8.1%)   
-0.1% ( -15% -   17%)
AndHighHigh  807.05  (2.5%)  806.66  (4.2%)   
-0.0% (  -6% -6%)
  HighTermDayOfYearSort  733.99  (4.4%)  733.77  (5.6%)   
-0.0% (  -9% -   10%)
  OrHighLow 1158.52  (1.7%) 1159.52  (3.6%)
0.1% (  -5% -5%)
   BrowseDayOfYearTaxoFacets 8364.69  (2.9%) 8375.25  (2.8%)
0.1% (  -5% -5%)
  BrowseMonthSSDVFacets 1999.10  (2.9%) 2005.14  (3.8%)
0.3% (  -6% -7%)
  BrowseMonthTaxoFacets 8403.68  (2.5%) 8430.34  (3.5%)
0.3% (  -5% -6%)
MedTerm 3481.42  (3.0%) 3495.13  (3.0%)
0.4% (  -5% -6%)
  MedPhrase  813.12  (4.0%)  816.40  (4.0%)
0.4% (  -7% -8%)
   BrowseDayOfYearSSDVFacets 1762.24  (2.9%) 1770.51  (4.5%)
0.5% (  -6% -8%)
  OrHighMed  408.00  (3.8%)  409.94  (4.7%)
0.5% (  -7% -9%)
   PKLookup  148.60  (3.5%)  149.33  (6.3%)
0.5% (  -9% -   10%)
  HighTermMonthSort 1869.60 (12.7%) 1879.81 (13.1%)
0.5% ( -22% -   30%)
   Wildcard  898.93  (3.8%)  903.87  (3.7%)
0.5% (  -6% -8%)
   HighTerm 2075.22  (3.2%) 2088.85  (2.9%)
0.7% (  -5% -6%)
  LowPhrase  610.73  (4.6%)  617.56  (3.8%)
1.1% (  -6% -9%)
LowSloppyPhrase  825.44  (3.2%)  835.72  (3.3%)
1.2% (  -5% -7%)
LowSpanNear 1388.73  (2.2%) 1409.14  (2.8%)
1.5% (  -3% -6%)
 AndHighLow 3047.72  (5.5%) 3096.11  (5.9%)
1.6% (  -9% -   13%)
MedSpanNear  750.19  (2.9%)  763.02  (2.0%)
1.7% (  -3% -6%)
LowTerm 3670.47  (3.6%) 3736.25  (4.9%)
1.8% (  -6% -   10%)
   HighSpanNear  401.49  (5.4%)  408.94  (4.5%)
1.9% (  -7% -   12%)
Prefix3  390.52  (9.0%)  397.98  (7.5%)
1.9% ( -13% -   20%)
 OrHighHigh  319.58 (15.3%)  326.27  (9.9%)
2.1% ( -20% -   32%)
 AndHighMed 1011.67  (3.5%) 1035.03  (3.8%)
2.3% (  -4% -9%)
 Fuzzy2   37.75 (25.1%)   39.47 (24.4%)
4.6% ( -35% -   72%)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-28 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-525695991
 
 
   > > Looks like the performance is consistent and we see no degradation. WDYT?
   > 
   > I don't think that these tasks use a `TopFieldDocCollector` with early 
termination ? This is why I said that it would be easier to test with the 
`TopScoreDocCollector`.
   
   I thought some of the tasks do, but apparently not.
   
   Should we commit this then, and then follow up on the other JIRA that I 
opened?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-28 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-525634541
 
 
   I opened a JIRA for the follow up : 
https://issues.apache.org/jira/browse/LUCENE-8958
   
   Will post a patch once we merge this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-27 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-525287168
 
 
   @jimczi  Thanks for reviewing the PR.
   
   I will follow up with a JIRA and a PR for `TopScoreDocCollector` that does 
the same as this one.
   
   I ran Luceneutil a bunch of times and took an average of performances, the 
numbers look like:
   
   TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
   BrowseDateTaxoFacets 3445.32  (5.0%) 3293.96  (5.5%)   
-4.4% ( -14% -6%)
Prefix3  218.41 (23.7%)  212.36 (17.4%)   
-2.8% ( -35% -   50%)
Respell  287.17  (7.8%)  280.86  (8.6%)   
-2.2% ( -17% -   15%)
MedTerm 3415.85  (4.4%) 3349.44  (3.6%)   
-1.9% (  -9% -6%)
   Wildcard  236.80 (12.5%)  232.30 (12.4%)   
-1.9% ( -23% -   26%)
AndHighHigh  498.85 (10.9%)  489.48 (10.1%)   
-1.9% ( -20% -   21%)
  HighTermDayOfYearSort  600.10  (3.7%)  588.84  (2.2%)   
-1.9% (  -7% -4%)
  OrHighMed  590.54  (2.9%)  581.16  (4.3%)   
-1.6% (  -8% -5%)
LowTerm 3943.78  (2.8%) 3882.24  (3.4%)   
-1.6% (  -7% -4%)
LowSloppyPhrase 1689.48  (2.5%) 1669.56  (2.2%)   
-1.2% (  -5% -3%)
   BrowseDayOfYearTaxoFacets 8421.12  (1.7%) 8334.95  (2.5%)   
-1.0% (  -5% -3%)
   PKLookup  146.25  (4.7%)  144.86  (5.4%)   
-0.9% ( -10% -9%)
  OrHighLow 1043.32  (2.1%) 1033.71  (2.7%)   
-0.9% (  -5% -3%)
 IntNRQ 1419.85  (2.2%) 1407.79  (1.7%)   
-0.8% (  -4% -3%)
 AndHighLow 2100.43  (2.7%) 2083.03  (2.9%)   
-0.8% (  -6% -4%)
   HighSpanNear  258.87 (18.4%)  256.75 (22.3%)   
-0.8% ( -35% -   48%)
 AndHighMed  936.63  (2.5%)  929.39  (2.0%)   
-0.8% (  -5% -3%)
  HighTermMonthSort 1668.46  (2.2%) 1656.43  (3.0%)   
-0.7% (  -5% -4%)
MedSpanNear 1157.98  (2.5%) 1150.21  (2.5%)   
-0.7% (  -5% -4%)
  MedPhrase  515.46  (6.3%)  512.41  (6.0%)   
-0.6% ( -12% -   12%)
  LowPhrase  761.37  (2.4%)  757.55  (2.1%)   
-0.5% (  -4% -4%)
   BrowseDayOfYearSSDVFacets 1802.01  (2.7%) 1794.09  (1.8%)   
-0.4% (  -4% -4%)
LowSpanNear 1460.48  (3.2%) 1454.29  (3.2%)   
-0.4% (  -6% -6%)
  BrowseMonthTaxoFacets 8490.66  (1.8%) 8459.18  (2.2%)   
-0.4% (  -4% -3%)
MedSloppyPhrase  672.44  (9.3%)  670.70  (4.8%)   
-0.3% ( -13% -   15%)
   HighTerm 1356.56  (2.5%) 1354.39  (2.0%)   
-0.2% (  -4% -4%)
 OrHighHigh  580.03 (11.5%)  579.81 (11.0%)   
-0.0% ( -20% -   25%)
 Fuzzy1  268.95  (6.4%)  269.28  (5.8%)
0.1% ( -11% -   13%)
  BrowseMonthSSDVFacets 2025.43  (1.7%) 2029.44  (2.5%)
0.2% (  -3% -4%)
 HighPhrase  466.42 (14.7%)  469.01 (13.4%)
0.6% ( -23% -   33%)
   HighIntervalsOrdered  389.06 (13.4%)  391.67 (13.8%)
0.7% ( -23% -   32%)
   HighSloppyPhrase  771.58 (11.8%)  777.44  (8.9%)
0.8% ( -17% -   24%)
 Fuzzy2   88.96 (18.0%)   90.19 (12.7%)
1.4% ( -24% -   39%)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-25 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524725795
 
 
   @jimczi Looks like I partially misunderstood your earlier comments -- sorry 
about that. I have raised a version with the approach we discussed earlier -- 
does this look like what you had in mind?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-23 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524385885
 
 
   > The reason for that is since `AtomicInteger.get()` is not guaranteed to be 
thread safe, so that was a hack way to read the value safely. If you notice, we 
do increment the counter when a hit is collected. However, the specific 
location you are mentioning is just for reading without a lock -- do an 
`incrementAndGet`, then decrement the phantom increment that we just did
   
   Hmm, I am thinking if we should just use `getAcquire` instead of this ugly 
dance, since that would ensure that loads and stores are not reordered and thus 
serialized?
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-23 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524382441
 
 
   
   > I don't see what the modification is. The early termination logic is the 
same, the only diff is that the counter to check totalHitsThreshold is global 
in one case and local in the current impl. What I propose is to decorrelate the 
local totalHits that is needed to count the number of documents visited by the 
collector and the counter to check the `totalHitsThreshold`. A simple 
`BooleanSupplier` that checks if updateMinScore can be called won't change any 
existing functionality and you can hide this implementation detail by keeping 
the TopFieldCollector#create as they are today. Did I miss something ?
   
   No, you did not. As I mentioned earlier (sorry if I was not being clear), 
the reason I did not do it the way you proposed was an excessive paranoid to 
not touch a widely used family of collectors when adding this functionality, 
until I demonstrate it to the community and get feedback. Based on our 
discussion above, I implicitly understand that you are fine with the approached 
proposed in the PR and believe it is safe enough to be added to the core 
`TopFieldCollector`? If that is the case, I will raise another iteration, 
integrating the same into the parent class.
   
   
   > Yes, why do you need to decrement the counter, any visited document should 
increment ?
   
   The reason for that is since `AtomicInteger.get()` is not guaranteed to be 
thread safe, so that was a hack way to read the value safely. If you notice, we 
do increment the counter when a hit is collected. However, the specific 
location you are mentioning is just for reading without a lock -- do an 
`incrementAndGet`, then decrement the phantom increment that we just did


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-23 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524335774
 
 
   > Sorry I don't follow. The logic for early termination should be the same 
in `TopFieldCollector` than the one you put in `SharedHitCountFieldCollector`. 
   
   Yes, that is correct, but adding it to `TopFieldCollector` would mean that 
it is a part of `SimpleFieldCollector` and `PagingFieldCollector` code paths as 
well, and I was conservatively not modifying existing functionality (not 
atleast until we discussed it in the community).
   
   > I also don't understand why you decrement the global total hits counter ? 
Shouldn't it be equivalent to a shared totalHits in the current impl ?
   
   Are you referring to the case when the global hits counter is decremented 
right when we cross the threshold?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-23 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524332090
 
 
   @jimczi I did consider doing that, but went down this route to explicitly 
decouple the early termination code path from the standard implementations of 
TopFieldCollector. I was unsure if we should directly enable existing (and 
future) field collectors to work with early termination.
   
   WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-22 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-524157568
 
 
   Any thoughts on this one? This seems like a reasonable way to avoid 
collecting extra hits with minimal overhead


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search

2019-08-06 Thread GitBox
atris commented on issue #823: LUCENE-8939: Introduce Shared Count Early 
Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-518522813
 
 
   cc @jpountz 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org