[
https://issues.apache.org/jira/browse/SOLR-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859610#comment-13859610
]
Hoss Man commented on SOLR-5595:
--------------------------------
Based on my understanding of the code, there are 3 major (overlapping) changes
that could be made to help improve/clarify Solr's distributed sorting code:
----
1) leveraging "fillFields"
The basic premise behind all of the work done in QueryComponent's
"doFieldSortValues" method is summarized in this comment at the top of the
method...
{noformat}
// The query cache doesn't currently store sort field values, and
// SolrIndexSearcher doesn't currently have an option to return sort field
// values. Because of this, we take the documents given and re-derive the
// sort values.
{noformat}
While the query cache issue is certainly still true, improvements at the
IndexSearcher level now make it possible to request that the TopDocCollector
also record the sort values for each doc it collects -- these are available in
the FieldDoc objects returned.
SOLR-5463 is already taking advantage of this feature for cursor-based searching
-- but that also bypasses the cache (for a variety of reasons). If we enhance
the query result cache to also preserve the sort values for each doc in the
DocList, then the same "fillFields" feature could be used to pull back all of
the sort values.
This would eliminate the need for roughly 90% of the work currently done in
doFieldSortValues, and it should be much faster: we'd be re-using the sort
values already generated during the actual sorting, so we wouldn't need to hit
the index again to re-derive them.
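To make the caching idea concrete, here is a minimal, self-contained sketch of a cache entry that keeps the sort values next to the doc ids. Every name in it (SortValueCacheSketch, Entry, searchesRun, the string query key) is made up for illustration; it does not use or mirror Solr's actual cache or DocList classes, and it ignores eviction and concurrency entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: none of these names exist in Solr or Lucene.
final class SortValueCacheSketch {

    // Hypothetical cache entry: the doc ids plus, for each doc, the sort
    // values that were already computed while collecting the top docs.
    static final class Entry {
        final int[] docIds;
        final Object[][] sortValues; // sortValues[i] belongs to docIds[i]

        Entry(int[] docIds, Object[][] sortValues) {
            if (docIds.length != sortValues.length) {
                throw new IllegalArgumentException("one sort-value row per doc");
            }
            this.docIds = docIds;
            this.sortValues = sortValues;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    int searchesRun = 0; // counts how often the real search had to execute

    // Returns the cached entry, running the search only on a cache miss.
    Entry get(String queryKey, Function<String, Entry> search) {
        return cache.computeIfAbsent(queryKey, key -> {
            searchesRun++;
            return search.apply(key);
        });
    }
}
```

On a cache hit the sort values come back with the doc ids, so nothing has to be re-derived from the index; that is the only point the sketch is meant to show.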
----
2) Let "fillFields" provide the score if needed for sorting
Assuming we start using IndexSearcher's "fillFields" option, then we could
probably simplify some of the logic in QueryComponent regarding sorting by
score. doFieldSortValues currently can't generate the score, so the
coordinator has to ask for it explicitly in the fl so it can be used when
merging. These special edge cases could probably be removed, and the scores
would come back along with the other sort values.
----
3) eliminate ShardDoc.sortFieldValues and use FieldDoc.fields
When a node is coordinating a distributed request, QueryComponent.mergeIds
collects the docs returned by each shard into "ShardDoc" objects which have a
sortFieldValues property containing the full list of all sort values (of all
docs returned by that shard) tacked on to it in a convoluted nested structure
that makes very little sense when looking at the code. But ShardDoc already
extends FieldDoc which has a "fields" array designed to store the sort fields.
If mergeIds just populated the "fields" of each ShardDoc based on the
sort_values returned from the shard, then the mergeIds method could be a lot
simpler and the code would be a lot clearer to read. It should also be
possible to eliminate most/all of ShardFieldSortedHitQueue and instead leverage
the logic in FieldValueHitQueue directly.
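As a rough illustration of what a merge could look like once each doc carries its own sort values, here is a self-contained sketch. ShardMergeSketch, its nested Doc class, and this mergeIds signature are hypothetical stand-ins, not the real ShardDoc, FieldDoc, or QueryComponent code, and the comparator assumes every sort value is a non-null Comparable of matching type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: a simplified model, not Solr's merge code.
final class ShardMergeSketch {

    // Hypothetical stand-in for ShardDoc: each doc carries its own sort
    // values, playing the role of the "fields" array on FieldDoc.
    static final class Doc {
        final String shard;
        final String id;
        final Object[] fields; // one entry per sort clause, in sort order

        Doc(String shard, String id, Object... fields) {
            this.shard = shard;
            this.id = id;
            this.fields = fields;
        }
    }

    // Orders docs best-first, clause by clause; reverse[i] means descending.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Doc> bySortValues(boolean[] reverse) {
        return (a, b) -> {
            for (int i = 0; i < reverse.length; i++) {
                int c = ((Comparable) a.fields[i]).compareTo(b.fields[i]);
                if (c != 0) return reverse[i] ? -c : c;
            }
            // Tie-break on shard, then id, so the merged order is deterministic.
            int c = a.shard.compareTo(b.shard);
            return c != 0 ? c : a.id.compareTo(b.id);
        };
    }

    // Returns the ids of the best `rows` docs across all shard responses.
    static List<String> mergeIds(List<List<Doc>> shardResponses,
                                 boolean[] reverse, int rows) {
        // Reversed comparator puts the worst doc at the head of the queue,
        // so it is the one evicted whenever the queue grows past `rows`.
        PriorityQueue<Doc> queue =
            new PriorityQueue<>(bySortValues(reverse).reversed());
        for (List<Doc> response : shardResponses) {
            for (Doc doc : response) {
                queue.add(doc);
                if (queue.size() > rows) queue.poll();
            }
        }
        // Polling yields worst-first, so prepend to end up best-first.
        List<String> ids = new ArrayList<>();
        while (!queue.isEmpty()) ids.add(0, queue.poll().id);
        return ids;
    }
}
```

For example, with a sort like "price desc, name asc" the reverse flags would be {true, false}. If the score were returned as just another sort value, it would simply be one more entry in each doc's fields array and would need no special handling in the comparator.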
> Distributed Sort: potential performance improvements & code readability
> ------------------------------------------------------------------------
>
> Key: SOLR-5595
> URL: https://issues.apache.org/jira/browse/SOLR-5595
> Project: Solr
> Issue Type: Improvement
> Reporter: Hoss Man
>
> A lot of the work Solr currently does for dealing with distributed sorting
> was built based on older limitations in Lucene that no longer exist. There
> are opportunities to simplify the code significantly, which should result in
> a speed up -- the biggest blocker at this point is some caching related
> questions.
> I'll post my specific thoughts in a comment
> (This is inspired by some things I noticed working on SOLR-5463 - I didn't
> want to convolute that issue with these performance improvement ideas which
> could be dealt with separately)