[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2021-01-07 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260646#comment-17260646
 ] 

Munendra S N commented on SOLR-10732:
-

Thanks [~mgibney] for the latest patch. I will go through the patch to see if 
we can include it in 8.8 release

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch, SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-11 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248077#comment-17248077
 ] 

Michael Gibney commented on SOLR-10732:
---

Thanks, [~munendrasn], that makes sense. I don't have any objections, and your 
change is straightforward and low-risk.

Following up on my comment about optimizing "higher up in the program logic, to 
prune as much execution as possible (and when it's clearer how/why we got the 
point of having an empty domain)", I started trying to approach this from the 
top down and quickly got rather deeper into it than I'd intended. I'll post the 
resulting patch here for the sake of comparison; feel free to incorporate any 
aspects as you see fit, but to be clear I think it'd be perfectly reasonable to 
go ahead with your initial patch.

The new patch optimizes wrt configured {{mincount}} (where applicable) and 
covers all legacy facet types (I think). It's still pretty straightforward, but 
compared to your initial proposal it's admittedly more complex. It prunes 
potentially quite a bit more execution I think; but notably, as a consequence 
of optimizing higher up, and wrt {{mincount}} (as opposed to only 
{{domain.size()==0}}), it carries potential risks that your patch doesn't. 
(also note: it will fail precommit due to some {{nocommit}} comments on some 
assertions temporarily present to check for functional parity with your initial 
patch).



> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-09 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246665#comment-17246665
 ] 

Munendra S N commented on SOLR-10732:
-

{quote}if perhaps you arrived at working on this because you encountered 
performance issues that would be addressed by this change{quote}
I had few tickets in backburner from long time and had some spare time during 
this holiday rooster so, was trying cleanup few of those tickets, this is one 
of those tickets.

If there are any objections, would be happy to address else I would like 
proceed and commit this probably during weekend.

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-09 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246655#comment-17246655
 ] 

Michael Gibney commented on SOLR-10732:
---

Yes, I saw the comment; I guess I interpreted the comment more as 
brainstorming, and was wondering whether you had done any further evaluation of 
the practical merits of the change (or if perhaps you arrived at working on 
this because you encountered performance issues that would be addressed by this 
change,  etc...).

I agree that the potential {{SolrIndexSearcher.getDocSet(Query, DocSet)}} 
optimization, if pursued, should get its own issue.

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-08 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246305#comment-17246305
 ] 

Munendra S N commented on SOLR-10732:
-

{quote}I'm curious, Munendra S N – were you able to perceive a performance 
benefit with these changes? Where these optimizations are located, afaict they 
optimize edge cases, and the query-building they prevent (if I'm reading right) 
is generally pretty lightweight (e.g., TermQuery ...).{quote}
Changes here are based on this 
[comment|https://issues.apache.org/jira/browse/SOLR-10727?focusedCommentId=16020247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16020247].
 As you said, this tries to avoid additional object creation and computation 
for some edge cases and based on my understanding, it helps especially in case 
facet queries or group facets

{quote}By way of contrast (wrt complexity/benefit tradeoff), at the leaf level 
it looks like SolrIndexSearcher.getDocSet(Query, DocSet) could be optimized in 
a way analogous to what SOLR-10727 does for SolrIndexSearcher.numDocs(Query, 
DocSet), avoiding filterCache pollution {quote}
+1, If there is possibility to improve/optimize it. We should definitely do it 
but I think it should be handled in its own issue

{quote}or maybe also higher up in the program logic, to prune as much execution 
as possible (and when it's clearer how/why we got the point of having an empty 
domain). The changes here seem to be building in mid-level "shot in the dark" 
safeguards, where it's relatively unclear what's going on.{quote}
Initially, planned to make these changes in getFacetCounts which would handle 
the case for intervalFacet and heatmap but realized changes would be too 
cluttered so, decided to delegate handling this case respective types. Let me 
know if this could be simplified and probably handle other facets too

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-08 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246106#comment-17246106
 ] 

Michael Gibney commented on SOLR-10732:
---

I'm curious, [~munendrasn] -- were you able to perceive a performance benefit 
with these changes? Where these optimizations are located, afaict they optimize 
edge cases, and the query-building they prevent (if I'm reading right) is 
generally pretty lightweight (e.g., {{TermQuery}} ...).

It seems like it makes most sense to optimize this kind of thing either at the 
leaf level (i.e., in {{SolrIndexSearcher.numDocs(...)}} -- already done in 
SOLR-10727) or maybe also higher up in the program logic, to prune as much 
execution as possible (and when it's clearer how/why we got the point of having 
an empty domain). The changes here seem to be building in mid-level "shot in 
the dark" safeguards, where it's relatively unclear what's going on.

By way of contrast (wrt complexity/benefit tradeoff), at the leaf level it 
looks like {{SolrIndexSearcher.getDocSet(Query, DocSet)}} could be optimized in 
a way analogous to what SOLR-10727 does for {{SolrIndexSearcher.numDocs(Query, 
DocSet)}}, avoiding filterCache pollution ...

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-06 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244970#comment-17244970
 ] 

Munendra S N commented on SOLR-10732:
-

I have created the Github PR and requested for the review

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244595#comment-17244595
 ] 

David Smiley commented on SOLR-10732:
-

+1 to what you're doing, but at least one maybe two of the spots in this patch 
are redundant / don't save anything because the line you modified was _not_ 
creating a Query.  I think in {{getListedTermCounts}} where the stream was.  
BTW PRs are kinder on the reviewers than old patch files.

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty

2020-12-04 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244419#comment-17244419
 ] 

Munendra S N commented on SOLR-10732:
-

 [^SOLR-10732.patch] 
All tests are passing. I will wait for feedback, if any before committing 

> potential optimizations in callers of SolrIndexSearcher.numDocs when docset 
> is empty
> 
>
> Key: SOLR-10732
> URL: https://issues.apache.org/jira/browse/SOLR-10732
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-10732.patch
>
>
> spin off of SOLR-10727...
> {quote}
> ...why not (also) optimize it slightly higher up and completely avoid the 
> construction of the Query objects? (and in some cases: additional overhead)
> for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i 
> found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the 
> first line of that method was {{if (0 == subset.size()) return 0}} then we'd 
> not only optimize away the SolrIndexSearcher hit, but also fetching the 
> SchemaField & building the range query (not to mention the much more 
> expensive {{getGroupedFacetQueryCount}} in the grouping case)
> At a glance, most other callers of 
> {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this 
> way as well -- at a minimum to eliminate Query parsing/construction.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org