[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826056#comment-13826056 ]
Hoss Man commented on SOLR-5463:
--------------------------------

I've been reading up on the internals of IndexSearcher.searchAfter and the associated PagingFieldCollector (as well as some of the problems encountered in SOLR-1726), and I'm not convinced it would be a slam dunk to try to use them directly in Solr:

* IndexSearcher.searchAfter/PagingFieldCollector relies on the "client" (ie: Solr) passing back the FieldDoc of the last doc returned, and expects the (lucene) docid contained in that FieldDoc to be meaningful
** We could perhaps serialize a representation of the "last" FieldDoc to include in the response of each request, and then deserialize that into a suitable imposter object on the "searchAfter" request -- but there is still the problem of the internal docid, which will be misleading in a multishard distributed solr setup
* There are a variety of code paths in SolrIndexSearcher for executing searches, and it's not immediately obvious (to me) if/when it would make sense to augment each of those paths with PagingFieldCollector (see yonik's comment in SOLR-1726 about faceting).

With that in mind, the approach I'm going to pursue (largely for my own sanity) is:

* Attempt a minimally invasive straw man implementation of "searchAfter" type functionality that works in distributed mode -- ideally w/o modifying any existing Solr code.
* Use this straw man implementation to sanity check that the end user API is useful
* Build up good comprehensive (passing) tests against this straw man
* Circle back and revisit the implementation details, looking for opportunities to:
** refactor to eliminate duplication of similar code
** improve performance

My current idea is to implement this straw man solution using a new SearchComponent that would run _after_ QueryComponent, along the lines of...

* prepare:
** No-Op unless "searchAfter" param is specified
*** Use some marker value to mean "first page"
** assert that start==0 (a non-zero start doesn't make sense when using searchAfter)
** assert that uniqueKey is one of the sort fields (to ensure consistent ordering)
** if the searchAfter param value indicates this is not the first request:
*** deserialize the token into a list of sort values
*** add a new PostFilter that restricts to documents based on those values and the sort directions (same basic logic as PagingFieldCollector)
* process:
** No-Op unless "searchAfter" param is specified
** do nothing if this is a shard request
** for regular old single node solr requests: serialize the sort values of the last doc in the Doc List (that QueryComponent has already built) and put it in the response as the "next" searchAfter token
* finishStage:
** No-Op unless "searchAfter" param is specified and stage is "DONE"
** serialize the sort values of the last doc in the Doc List (that QueryComponent already merged) and put it in the response as the "next" searchAfter token

> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous "page".
> This is similar to the "cursor" model I've seen in several other REST APIs that support "pagination" over large sets of results (notably the twitter API and its "since_id" param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
>
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well.
>
> ---
>
> I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
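
To make the "prepare" step in the comment above a bit more concrete: the PostFilter it describes only has to decide, per document, whether that document's sort values fall strictly after the sort values decoded from the searchAfter token, honoring each clause's direction. A rough plain-Java sketch of that comparison is below. The class, enum, and method names are hypothetical and are not part of Solr or Lucene; the sketch also assumes every sort value is a non-null Comparable and that both lists line up one-to-one with the sort clauses.

```java
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Solr or Lucene. */
final class SearchAfterSketch {

    /** Direction of a single sort clause. */
    enum Direction { ASC, DESC }

    /**
     * Returns true if a document with sort values {@code doc} sorts strictly
     * after the cursor position {@code last}, given one direction per clause.
     * Assumption: values are non-null and mutually comparable per clause.
     */
    @SuppressWarnings("unchecked")
    static boolean isAfter(List<?> doc, List<?> last, List<Direction> dirs) {
        if (doc.size() != dirs.size() || last.size() != dirs.size()) {
            throw new IllegalArgumentException(
                "expected one value and one direction per sort clause");
        }
        for (int i = 0; i < dirs.size(); i++) {
            int cmp = ((Comparable<Object>) doc.get(i)).compareTo(last.get(i));
            if (cmp == 0) {
                continue; // tie on this clause, so the next clause decides
            }
            if (dirs.get(i) == Direction.DESC) {
                cmp = -cmp; // descending: smaller values come later
            }
            return cmp > 0;
        }
        // Equal on every clause: this is the cursor document itself, so it is
        // excluded. Having uniqueKey among the sort fields is what guarantees
        // that no other document can tie on all clauses.
        return false;
    }
}
```

With a sort of "score desc, id asc" and a token decoding to (1.5, "doc7"), a document with values (1.5, "doc8") or (1.0, "doc1") would be kept, while (2.0, "doc1") and the cursor document itself would be filtered out. Because the check depends only on sort values and never on an internal docid, the same token can be applied on every shard.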