[ https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886151#comment-16886151 ]
Gus Heck commented on SOLR-13125:
---------------------------------

The idea behind this patch is interesting. Unless I misunderstand the intent, the idea is to short-circuit response collection when the TRA collection names tell us that all further responses will return docs that are too far down the result list to ever be included. Unfortunately, I don't think this patch does that. Issues I see:

* This patch overrides finishStage() instead of handleResponses(), which means that by the time your logic runs, all responses have already been received.
* I don't see logic to handle values for the start parameter.
* I'm also not sure I like the tests checking debug messages rather than actual code behavior. That could get out of sync.

In any case, it's unclear to me whether this can be handled in a search component without core changes. Even if you override handleResponses() instead, you can't stop SearchHandler from looping and attempting to take() the result of every request that was sent (unless you throw an exception, but that won't be good). What you would need to do is somehow influence the futures that Solr is waiting on so that they return early and empty once your request has been filled from the most recent collections (see org/apache/solr/handler/component/HttpShardHandler.java:281). Barring that, you could perhaps find a way to empty the pending queue, but that means you still have to wait for at least one uninteresting request to complete. The futures themselves would be waiting on the call to makeLoadBalancedRequest() at org/apache/solr/handler/component/HttpShardHandler.java:201, so I think this optimization requires adding an explicit hook that enables short-circuiting. Possibly this could be a new method for SearchComponents to override, but we need to think about how that would interact with the assumptions of existing code.
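To make the short-circuit idea concrete, here is a minimal Java sketch of the kind of hook described, assuming the per-collection requests are exposed as CompletableFutures ordered newest collection first, and that the query is sorted by router.field desc (so every doc in a newer collection outranks every doc in an older one). ShortCircuitHook, ShardResult, EMPTY, and collect() are hypothetical names for illustration only; they are not Solr APIs, and the real HttpShardHandler tracks its requests differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ShortCircuitSketch {

    /** Hypothetical hook; NOT an existing Solr interface. */
    interface ShortCircuitHook {
        /** True once enough docs have arrived that the remaining responses cannot matter. */
        boolean isSatisfied(int docsCollectedSoFar);
    }

    /** Stand-in for a per-collection shard response: only the doc count matters here. */
    record ShardResult(String collection, int numDocs) {}

    /** Placeholder used to complete futures whose responses are no longer needed. */
    static final ShardResult EMPTY = new ShardResult("<short-circuited>", 0);

    /** Joins every future (newest first), resolving pending ones to EMPTY once satisfied. */
    static List<ShardResult> collect(List<CompletableFuture<ShardResult>> newestFirst,
                                     ShortCircuitHook hook) {
        List<ShardResult> results = new ArrayList<>();
        int collected = 0;
        boolean satisfied = false;
        for (CompletableFuture<ShardResult> future : newestFirst) {
            if (satisfied) {
                // complete() is a no-op if the future already finished, so results that
                // arrived in time are kept; pending ones resolve to EMPTY immediately.
                // A real implementation would also need to abort the in-flight HTTP request.
                future.complete(EMPTY);
            }
            ShardResult r = future.join();
            results.add(r);
            collected += r.numDocs();
            if (!satisfied && hook.isSatisfied(collected)) {
                satisfied = true;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // The newest collection has answered; the two older ones never will.
        List<CompletableFuture<ShardResult>> futures = List.of(
                CompletableFuture.completedFuture(new ShardResult("newest", 100)),
                new CompletableFuture<>(),
                new CompletableFuture<>());
        int start = 0;
        int rows = 100;
        // Without the hook, join() on the second future would block forever.
        System.out.println(collect(futures, collected -> collected >= start + rows));
    }
}
```

Once the hook reports that start + rows docs have arrived, the still-pending futures resolve to the empty placeholder, so a caller that waits on every future no longer blocks on the older collections. Completing a future does not by itself stop the underlying request, which is one reason an explicit hook in core, rather than a component-only change, seems necessary.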
> Optimize Queries when sorting by router.field
> ---------------------------------------------
>
> Key: SOLR-13125
> URL: https://issues.apache.org/jira/browse/SOLR-13125
> Project: Solr
> Issue Type: Sub-task
> Reporter: mosh
> Priority: Minor
> Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, SOLR-13125.patch, SOLR-13125.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, with >300 shards in the alias and
> much growth expected in the coming months.
> The "hot" data (in our case, the more recent data) will be stored on stronger
> nodes (SSD, more RAM, etc.).
> A proposal has emerged to optimize queries sorted by router.field (the field
> TRA uses to route the data to the correct collection).
> Perhaps, for queries sorted by router.field, Solr could be smart enough to
> wait for the more recent collections and, once the limit is reached, cancel
> the other queries (or simply not block waiting for their results)?
> For example:
> Suppose we query a TRA with a filter on a field other than router.field,
> sorting by router.field desc with limit=100.
> Since this is a TRA, Solr will issue queries to all the collections in the
> alias.
> But to optimize this particular type of query, Solr could wait for the most
> recent collection in the TRA and check whether its result set matches or
> exceeds the limit. If so, the response could be returned to the user without
> waiting for the rest of the shards. If not, the issuing node would block until
> the second query returns, and so forth, until the limit of the request is
> reached.
> This might also be useful for deep paging: query each collection and only
> move on to the next once there are no more results in the current one.
> Thoughts or input are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.
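The sequential, newest-first strategy described in the issue can be sketched as follows. This is a simplified, hypothetical model: collections are queried one at a time rather than fanned out in parallel, and `query` stands in for a per-collection search returning at most `limit` doc ids already sorted by router.field desc. It also shows why the start parameter matters: the loop must gather start + rows docs, not just rows, before it can stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class NewestFirstFill {

    /**
     * Queries collections newest first and stops as soon as start + rows docs are
     * gathered. Only valid when sorting by router.field desc, so that every doc in a
     * newer collection outranks every doc in an older one.
     */
    static List<String> page(List<String> newestFirst,
                             BiFunction<String, Integer, List<String>> query,
                             int start, int rows) {
        int needed = start + rows; // docs to skip plus docs to return
        List<String> merged = new ArrayList<>();
        for (String collection : newestFirst) {
            // Ask only for as many docs as are still missing.
            merged.addAll(query.apply(collection, needed - merged.size()));
            if (merged.size() >= needed) {
                break; // older collections cannot contribute to this page
            }
        }
        int from = Math.min(start, merged.size());
        int to = Math.min(needed, merged.size());
        return merged.subList(from, to);
    }

    public static void main(String[] args) {
        // Hypothetical data: each collection's docs are already sorted newest first.
        Map<String, List<String>> data = Map.of(
                "newest", List.of("d9", "d8", "d7"),
                "middle", List.of("d6", "d5", "d4"),
                "oldest", List.of("d3", "d2", "d1"));
        List<String> queried = new ArrayList<>();
        BiFunction<String, Integer, List<String>> query = (coll, limit) -> {
            queried.add(coll);
            List<String> docs = data.get(coll);
            return docs.subList(0, Math.min(limit, docs.size()));
        };
        System.out.println(page(List.of("newest", "middle", "oldest"), query, 2, 2));
        System.out.println(queried);
    }
}
```

With start=2 and rows=2, the newest collection alone yields only three docs, so the middle collection is consulted for one more and the oldest is never queried; the page returned is [d7, d6]. A version that ignored start and stopped after rows=2 docs would have stopped after the newest collection and returned the wrong page.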
--
This message was sent by Atlassian JIRA (v7.6.14#76016)