[ https://issues.apache.org/jira/browse/SOLR-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859610#comment-13859610 ]
Hoss Man commented on SOLR-5595:
--------------------------------

Based on my understanding of the code, there are 3 major (overlapping) changes that could be made to help improve/clarify Solr's distributed sorting code:

----

1) leveraging "fillFields"

The basic premise behind all of the work done in QueryComponent's "doFieldSortValues" method is summarized in this comment at the top of the method...

{noformat}
// The query cache doesn't currently store sort field values, and SolrIndexSearcher doesn't
// currently have an option to return sort field values. Because of this, we
// take the documents given and re-derive the sort values.
{noformat}

While the query cache issue is certainly still true, improvements at the IndexSearcher level now make it possible to request that the TopDocCollector also record the sort values for each doc it collects -- these are available in the FieldDoc objects returned. SOLR-5463 is already taking advantage of this feature for cursor based searching -- but that also bypasses the cache (for a variety of reasons).

If we enhance the query result cache to also preserve the sort values for each doc in the DocList, then the same "fillFields" feature could be used to pull back all of the sort values. This would pretty much completely eliminate the need for 90% of the work currently done in doFieldSortValues -- and should be much faster, since we'd be re-using the sort values already generated during the actual sorting and wouldn't need to hit the index again to re-derive them.

----

2) Let "fillFields" provide the score if needed for sorting

Assuming we start using IndexSearcher's "fillFields" option, then we could probably simplify some of the logic in QueryComponent regarding sorting by score. doFieldSortValues currently can't generate the score, so the coordinator has to ask for it explicitly in the fl so it can be used when merging.
These special edge cases could probably be removed, and the scores would come back along with the other sort values.

----

3) eliminate ShardDoc.sortFieldValues and use FieldDoc.fields

When a node is coordinating a distributed request, QueryComponent.mergeIds collects the docs returned by each shard into "ShardDoc" objects. Each ShardDoc has a sortFieldValues property containing the full list of all sort values (of all docs returned by that shard), tacked on to it in a convoluted nested structure that makes very little sense when looking at the code.

But ShardDoc already extends FieldDoc, which has a "fields" array designed to store the sort fields. If mergeIds just populated the "fields" of each ShardDoc based on the sort_values returned from the shard, then the mergeIds method could be a lot simpler and the code would be a lot clearer to read.

It should also be possible to eliminate most/all of ShardFieldSortedHitQueue and instead leverage the logic in FieldValueHitQueue directly.

> Distributed Sort: potential performance improvements & code readability
> ------------------------------------------------------------------------
>
>                 Key: SOLR-5595
>                 URL: https://issues.apache.org/jira/browse/SOLR-5595
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> A lot of the work Solr currently does for dealing with distributed sorting
> was built based on older limitations in Lucene that no longer exist. There
> are opportunities to simplify the code significantly, which should result in
> a speed up -- the biggest blocker at this point is some caching related
> questions.
> I'll post my specific thoughts in a comment
> (This is inspired by some things I noticed working on SOLR-5463 - I didn't
> want to convolute that issue with these performance improvement ideas which
> could be dealt with separately)

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
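
As a rough illustration of point 3 in the comment above (each merged doc carrying its own sort values in a "fields" array, so that one generic comparator can drive the merge), here is a minimal plain-Java sketch. It deliberately does not use any Lucene or Solr classes: ShardMergeSketch, Doc, comparator, merge, demo and the "price asc, score desc" sort spec are all hypothetical names chosen for illustration, and this is not the actual QueryComponent.mergeIds logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch only: models the idea of per-doc sort values, not Solr's real classes.
final class ShardMergeSketch {

    /** Stand-in for a ShardDoc whose sort values live directly on the doc. */
    static final class Doc {
        final String id;
        final String shard;
        final Object[] fields; // one entry per sort clause, in sort-spec order

        Doc(String id, String shard, Object... fields) {
            this.id = id;
            this.shard = shard;
            this.fields = fields;
        }
    }

    /** Compares docs clause by clause; reverse[i] == true means clause i sorts descending. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Doc> comparator(boolean[] reverse) {
        return (a, b) -> {
            for (int i = 0; i < reverse.length; i++) {
                int c = ((Comparable) a.fields[i]).compareTo(b.fields[i]);
                if (c != 0) {
                    return reverse[i] ? -c : c;
                }
            }
            // Tie-break on shard then id so the merged order is deterministic
            // (an assumption of this sketch, not a statement about Solr).
            int s = a.shard.compareTo(b.shard);
            return s != 0 ? s : a.id.compareTo(b.id);
        };
    }

    /** Merges the per-shard result lists and keeps only the best `rows` docs. */
    static List<Doc> merge(List<List<Doc>> shardResults, boolean[] reverse, int rows) {
        Comparator<Doc> best = comparator(reverse);
        // Heap ordered worst-first, so poll() evicts the weakest doc on overflow.
        PriorityQueue<Doc> queue = new PriorityQueue<>(best.reversed());
        for (List<Doc> shard : shardResults) {
            for (Doc d : shard) {
                queue.add(d);
                if (queue.size() > rows) {
                    queue.poll();
                }
            }
        }
        List<Doc> out = new ArrayList<>(queue);
        out.sort(best);
        return out;
    }

    /** Two made-up shards, sorted by "price asc, score desc", top 3 requested. */
    static List<Doc> demo() {
        boolean[] reverse = {false, true};
        List<Doc> shardA = Arrays.asList(
                new Doc("a1", "shardA", 5, 0.9f),
                new Doc("a2", "shardA", 7, 0.4f));
        List<Doc> shardB = Arrays.asList(
                new Doc("b1", "shardB", 5, 1.2f),
                new Doc("b2", "shardB", 9, 2.0f));
        return merge(Arrays.asList(shardA, shardB), reverse, 3);
    }

    public static void main(String[] args) {
        for (Doc d : demo()) {
            System.out.println(d.id + " " + Arrays.toString(d.fields));
        }
    }
}
```

Because every doc holds its own values, the merge needs no side lookup into a per-shard list of sort values; the score is treated as just another entry in "fields", which is also the simplification suggested in point 2.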