[
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-5463:
---------------------------
Attachment: SOLR-5463__straw_man.patch
OK, updated patch making the change in user semantics I mentioned wanting to
try last week. The best way to explain it is with a walk-through of a simple
example. (Note: if you try the current strawman code, the "numFound" and "start"
values returned in the docList don't match what I've pasted in the examples
below. These examples show what the final results should look like in the
finished solution.)
Initial requests using searchAfter should always start with a totem value of
"{{\*}}"
{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=*}
{
"responseHeader":{
"status":0,
"QTime":2},
"response":{"numFound":32,"start":-1,"docs":[
// ...20 docs here...
]
},
"nextSearchAfter":"AoEjTk9L"}
{code}
The {{nextSearchAfter}} token returned by this request tells us what to use in
the second request...
{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=AoEjTk9L}
{
"responseHeader":{
"status":0,
"QTime":7},
"response":{"numFound":32,"start":-1,"docs":[
// ...12 docs here...
]
},
"nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}
Since this result block contains fewer rows than were requested, the client
could automatically stop, but the {{nextSearchAfter}} is still returned, and
it's still safe to request a subsequent page (this is the fundamental difference
from the previous patches, where {{nextSearchAfter}} was set to {{null}} anytime
the code could tell there were no more results)...
{code:title=http://localhost:8983/solr/deep?q=*:*&wt=json&indent=true&rows=20&fl=id,price&sort=id+desc&searchAfter=AoEoMDU3OUIwMDI=}
{
"responseHeader":{
"status":0,
"QTime":1},
"response":{"numFound":32,"start":-1,"docs":[]
},
"nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}
Note that in this case, with no docs included in the response, the
{{nextSearchAfter}} totem is the same as the input.
For some sorts this makes it possible for clients to "resume" a full walk of
all documents matching a query -- picking up where they left off if more
matching documents are added to the index (for example: an ascending sort on a
numeric uniqueKey field that always increases as new docs are added, or an
ascending sort on a timestamp field indicating when documents were crawled,
etc...)
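To make the "resume" idea concrete, here is a rough client-side sketch in Python. It is only an illustration under stated assumptions: {{drain}} and {{make_fake_fetch}} are hypothetical names, the in-memory "index" stands in for a real Solr request, and the fake uses the last id seen as the token, whereas the real {{nextSearchAfter}} value should be treated as opaque.

```python
from typing import Callable, Dict, List, Tuple


def drain(fetch_page: Callable[[str], Dict], token: str = "*") -> Tuple[List[Dict], str]:
    """Follow nextSearchAfter until a page comes back with no docs.

    Returns the docs collected plus the last token, which the caller can
    keep and pass back in later to resume the walk.
    """
    seen: List[Dict] = []
    while True:
        page = fetch_page(token)
        docs = page["response"]["docs"]
        # The token is always returned, even on an empty page.
        token = page["nextSearchAfter"]
        if not docs:
            return seen, token
        seen.extend(docs)


def make_fake_fetch(index: List[int], rows: int = 20) -> Callable[[str], Dict]:
    """Hypothetical stand-in for a Solr request with sort=id asc.

    ASSUMPTION: the token here is simply the last id seen, as a string.
    """
    def fetch_page(token: str) -> Dict:
        ordered = sorted(index)
        if token == "*":
            remaining = ordered
        else:
            remaining = [i for i in ordered if i > int(token)]
        docs = [{"id": i} for i in remaining[:rows]]
        # With no docs in the response, echo the input token back unchanged.
        next_token = str(docs[-1]["id"]) if docs else token
        return {
            "response": {"numFound": len(ordered), "start": -1, "docs": docs},
            "nextSearchAfter": next_token,
        }

    return fetch_page
```

A client would call {{drain(fetch)}} once, store the returned token, and later call {{drain(fetch, saved_token)}} after more docs have been indexed; with an always-increasing sort key, the second call returns only the new docs.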
This also works as you would expect for searches that don't match any
documents...
{code:title=http://localhost:8983/solr/deep?q=text:bogus&rows=20&sort=id+desc&searchAfter=*}
{
"responseHeader":{
"status":0,
"QTime":21},
"response":{"numFound":0,"start":-1,"docs":[]
},
"nextSearchAfter":"*"}
{code}
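One client-side detail worth noting about the request URLs above: the tokens can contain characters like {{=}}, so a client building these URLs programmatically will probably want to let a library encode the params. A minimal Python sketch, assuming the {{/solr/deep}} path and param names from the examples ({{build_url}} is a hypothetical helper, not part of the patch):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the examples in this comment.
BASE_URL = "http://localhost:8983/solr/deep"


def build_url(token: str = "*", rows: int = 20, sort: str = "id desc", q: str = "*:*") -> str:
    """Build a request URL, percent-encoding the searchAfter token.

    The token is passed through untouched apart from URL encoding;
    the client never needs to interpret it.
    """
    params = {"q": q, "rows": rows, "sort": sort, "searchAfter": token}
    return BASE_URL + "?" + urlencode(params)


def token_from_url(url: str) -> str:
    """Decode the searchAfter value back out of a URL (round-trip check)."""
    return parse_qs(urlsplit(url).query)["searchAfter"][0]
```

The first request would use {{build_url()}} (token {{*}}), and each later request would pass the {{nextSearchAfter}} value from the previous response as {{token}}.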
> Provide cursor/token based "searchAfter" support that works with arbitrary
> sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
> Key: SOLR-5463
> URL: https://issues.apache.org/jira/browse/SOLR-5463
> Project: Solr
> Issue Type: New Feature
> Reporter: Hoss Man
> Assignee: Hoss Man
> Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
> SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr,
> leveraging an HTTP based API similar to how IndexSearcher.searchAfter works
> at the Lucene level: require the clients to provide back a token indicating
> the sort values of the last document seen on the previous "page". This is
> similar to the "cursor" model I've seen in several other REST APIs that
> support "pagination" over large sets of results (notably the Twitter API and
> its "since_id" param) except that we'll want something that works with
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while
> ago, but the key bit of argument parsing to leverage it was commented out due
> to some problems (see comments in that issue). It's also somewhat out of
> date at this point: at the time it was committed, IndexSearcher only supported
> searchAfter for simple scores, not arbitrary field sorts; and the params
> added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on
> ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode