[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260646#comment-17260646 ] Munendra S N commented on SOLR-10732: - Thanks [~mgibney] for the latest patch. I will go through the patch to see if we can include it in 8.8 release > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch, SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248077#comment-17248077 ] Michael Gibney commented on SOLR-10732: --- Thanks, [~munendrasn], that makes sense. I don't have any objections, and your change is straightforward and low-risk. Following up on my comment about optimizing "higher up in the program logic, to prune as much execution as possible (and when it's clearer how/why we got the point of having an empty domain)", I started trying to approach this from the top down and quickly got rather deeper into it than I'd intended. I'll post the resulting patch here for the sake of comparison; feel free to incorporate any aspects as you see fit, but to be clear I think it'd be perfectly reasonable to go ahead with your initial patch. The new patch optimizes wrt configured {{mincount}} (where applicable) and covers all legacy facet types (I think). It's still pretty straightforward, but compared to your initial proposal it's admittedly more complex. It prunes potentially quite a bit more execution I think; but notably, as a consequence of optimizing higher up, and wrt {{mincount}} (as opposed to only {{domain.size()==0}}), it carries potential risks that your patch doesn't. (also note: it will fail precommit due to some {{nocommit}} comments on some assertions temporarily present to check for functional parity with your initial patch). > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246665#comment-17246665 ] Munendra S N commented on SOLR-10732: - {quote}if perhaps you arrived at working on this because you encountered performance issues that would be addressed by this change{quote} I had few tickets in backburner from long time and had some spare time during this holiday rooster so, was trying cleanup few of those tickets, this is one of those tickets. If there are any objections, would be happy to address else I would like proceed and commit this probably during weekend. > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246655#comment-17246655 ] Michael Gibney commented on SOLR-10732: --- Yes, I saw the comment; I guess I interpreted the comment more as brainstorming, and was wondering whether you had done any further evaluation of the practical merits of the change (or if perhaps you arrived at working on this because you encountered performance issues that would be addressed by this change, etc...). I agree that the potential {{SolrIndexSearcher.getDocSet(Query, DocSet)}} optimization, if pursued, should get its own issue. > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246305#comment-17246305 ] Munendra S N commented on SOLR-10732: - {quote}I'm curious, Munendra S N – were you able to perceive a performance benefit with these changes? Where these optimizations are located, afaict they optimize edge cases, and the query-building they prevent (if I'm reading right) is generally pretty lightweight (e.g., TermQuery ...).{quote} Changes here are based on this [comment|https://issues.apache.org/jira/browse/SOLR-10727?focusedCommentId=16020247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16020247]. As you said, this tries to avoid additional object creation and computation for some edge cases and based on my understanding, it helps especially in case facet queries or group facets {quote}By way of contrast (wrt complexity/benefit tradeoff), at the leaf level it looks like SolrIndexSearcher.getDocSet(Query, DocSet) could be optimized in a way analogous to what SOLR-10727 does for SolrIndexSearcher.numDocs(Query, DocSet), avoiding filterCache pollution {quote} +1, If there is possibility to improve/optimize it. We should definitely do it but I think it should be handled in its own issue {quote}or maybe also higher up in the program logic, to prune as much execution as possible (and when it's clearer how/why we got the point of having an empty domain). The changes here seem to be building in mid-level "shot in the dark" safeguards, where it's relatively unclear what's going on.{quote} Initially, planned to make these changes in getFacetCounts which would handle the case for intervalFacet and heatmap but realized changes would be too cluttered so, decided to delegate handling this case respective types. Let me know if this could be simplified and probably handle other facets too > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246106#comment-17246106 ] Michael Gibney commented on SOLR-10732: --- I'm curious, [~munendrasn] -- were you able to perceive a performance benefit with these changes? Where these optimizations are located, afaict they optimize edge cases, and the query-building they prevent (if I'm reading right) is generally pretty lightweight (e.g., {{TermQuery}} ...). It seems like it makes most sense to optimize this kind of thing either at the leaf level (i.e., in {{SolrIndexSearcher.numDocs(...)}} -- already done in SOLR-10727) or maybe also higher up in the program logic, to prune as much execution as possible (and when it's clearer how/why we got the point of having an empty domain). The changes here seem to be building in mid-level "shot in the dark" safeguards, where it's relatively unclear what's going on. By way of contrast (wrt complexity/benefit tradeoff), at the leaf level it looks like {{SolrIndexSearcher.getDocSet(Query, DocSet)}} could be optimized in a way analogous to what SOLR-10727 does for {{SolrIndexSearcher.numDocs(Query, DocSet)}}, avoiding filterCache pollution ... > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244970#comment-17244970 ] Munendra S N commented on SOLR-10732: - I have created the Github PR and requested for the review > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 10m > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244595#comment-17244595 ] David Smiley commented on SOLR-10732: - +1 to what you're doing, but at least one maybe two of the spots in this patch are redundant / don't save anything because the line you modified was _not_ creating a Query. I think in {{getListedTermCounts}} where the stream was. BTW PRs are kinder on the reviewers than old patch files. > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244419#comment-17244419 ] Munendra S N commented on SOLR-10732: - [^SOLR-10732.patch] All tests are passing. I will wait for feedback, if any before committing > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org