[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

Hoss Man (JIRA) Mon, 09 Dec 2013 15:22:29 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated SOLR-5463:
---------------------------

    Attachment: SOLR-5463__straw_man.patch

Ok, updated patch making the change in user semantics I mentioned wanting to 
try last week.  The best way to explain it is with a walkthrough of a simple 
example (note: if you try the current strawman code, the "numFound" and "start" 
values returned in the docList don't match what I've pasted in the examples 
below -- these examples show what the final results should look like in the 
finished solution).

Initial requests using searchAfter should always start with a totem value of 
"{{\*}}"

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=*}
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":32,"start":-1,"docs":[
      // ...20 docs here...
    ]
  },
  "nextSearchAfter":"AoEjTk9L"}
{code}

The {{nextSearchAfter}} token returned by this request tells us what to use in 
the second request...

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ...12 docs here...
    ]
  },
  "nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}

Since this result block contains fewer rows than were requested, the client 
could automatically stop, but the {{nextSearchAfter}} is still returned, and 
it's still safe to request a subsequent page (this is the fundamental difference 
from the previous patches, where {{nextSearchAfter}} was set to {{null}} anytime 
the code could tell there were no more results) ...

{code:title=http://localhost:8983/solr/deep?q=*:*&wt=json&indent=true&rows=20&fl=id,price&sort=id+desc&searchAfter=AoEoMDU3OUIwMDI=}
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "response":{"numFound":32,"start":-1,"docs":[]
  },
  "nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}

Note that in this case, with no docs included in the response, the 
{{nextSearchAfter}} totem is the same as the input.
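
For illustration only (this is not part of the patch): a client loop for the semantics walked through above might look roughly like the Python sketch below. The names {{page_url}}, {{walk_all}} and {{fetch_page}} are hypothetical; {{fetch_page}} stands in for whatever performs the HTTP request and returns the parsed JSON response. Stopping on the first empty page is just one possible client policy.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode


def page_url(base: str, params: Dict[str, str], token: str) -> str:
    """Build one request URL (hypothetical helper).

    urlencode percent-escapes characters such as '=' or '*' in the token.
    """
    return base + "?" + urlencode({**params, "searchAfter": token})


def walk_all(fetch_page: Callable[[str], Dict], start: str = "*") -> Iterator[Dict]:
    """Yield every doc, following nextSearchAfter until a page comes back empty.

    fetch_page is an assumed callable: it takes a searchAfter value and
    returns a dict shaped like the JSON responses shown in the examples.
    """
    token = start
    while True:
        page = fetch_page(token)
        docs = page["response"]["docs"]
        if not docs:
            # Empty page: nothing more to read right now.
            return
        yield from docs
        token = page["nextSearchAfter"]
```

A client that prefers to save a request could instead stop as soon as a page holds fewer docs than {{rows}}, as noted above.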

For some sorts this makes it possible for clients to "resume" a full walk of 
all documents matching a query -- picking up where they left off if more 
documents that match are added to the index (for example: when doing an 
ascending sort on a numeric uniqueKey field that always increases as new docs 
are added, or an ascending sort on a timestamp field indicating when documents 
were crawled, etc...)
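
To make the "resume" idea concrete, here is a toy Python simulation (again, not part of the patch). {{make_fake_index}} and {{drain}} are hypothetical names; the fake index mimics an ascending sort on a numeric id and, purely for readability, uses the last id seen as the token, whereas the real tokens are opaque strings that clients should not try to interpret.

```python
from typing import Callable, Dict, List, Tuple


def make_fake_index(ids: List[int], rows: int) -> Callable[[str], Dict]:
    """Toy stand-in for a sort=id asc request (assumption: token == last id seen)."""
    def fetch_page(token: str) -> Dict:
        after = None if token == "*" else int(token)
        matches = sorted(i for i in ids if after is None or i > after)
        docs = [{"id": i} for i in matches[:rows]]
        # With no docs in the page, the token is returned unchanged.
        next_token = str(docs[-1]["id"]) if docs else token
        return {"response": {"numFound": len(ids), "start": -1, "docs": docs},
                "nextSearchAfter": next_token}
    return fetch_page


def drain(fetch_page: Callable[[str], Dict], token: str) -> Tuple[List[Dict], str]:
    """Collect docs from token onward; return them plus the token to resume from."""
    collected: List[Dict] = []
    while True:
        page = fetch_page(token)
        docs = page["response"]["docs"]
        token = page["nextSearchAfter"]
        if not docs:
            return collected, token
        collected.extend(docs)


ids = list(range(1, 33))                 # 32 docs in the index
fetch = make_fake_index(ids, rows=20)
first, token = drain(fetch, "*")         # full walk; keep the final token
ids.extend([33, 34])                     # new docs arrive later
more, token = drain(fetch, token)        # resume: only the new docs come back
```

The resume only behaves this way because new ids always sort after the saved position; with a sort where new docs can land before it, they would be skipped.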

This also works as you would expect for searches that don't match any 
documents...

{code:title=http://localhost:8983/solr/deep?q=text:bogus&rows=20&sort=id+desc&searchAfter=*}
{
  "responseHeader":{
    "status":0,
    "QTime":21},
  "response":{"numFound":0,"start":-1,"docs":[]
  },
  "nextSearchAfter":"*"}
{code}


> Provide cursor/token based "searchAfter" support that works with arbitrary 
> sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, 
> leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
> at the Lucene level: require the clients to provide back a token indicating 
> the sort values of the last document seen on the previous "page".  This is 
> similar to the "cursor" model I've seen in several other REST APIs that 
> support "pagination" over large sets of results (notably the Twitter API and 
> its "since_id" param) except that we'll want something that works with 
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial ground work here and was committed quite a while 
> ago, but the key bit of argument parsing to leverage it was commented out due 
> to some problems (see comments in that issue).  It's also somewhat out of 
> date at this point: at the time it was committed, IndexSearcher only supported 
> searchAfter for simple scores, not arbitrary field sorts; and the params 
> added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on 
> ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
