[jira] [Commented] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

Hoss Man (JIRA) Tue, 10 Dec 2013 15:38:34 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844820#comment-13844820
 ]


Hoss Man commented on SOLR-5463:
--------------------------------

bq. I think that error message should include the param name (cursor) that 
couldn't be parsed.

Agreed ... the current error text is basically just a placeholder; ideally it 
should be something like...

{code}
Unable to parse cursor param: value must either be '*' or the cursorContinue 
value from a previous search: NOK
{code}
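For illustration only, here's a rough sketch of that validation. It assumes the totem is plain Base64 and uses {{java.util.Base64}} (Java 8+) rather than the Base64 utility class the patch actually uses; {{CursorParam}} and {{parse}} are hypothetical names, not anything in the patch. It treats '*' as "start from the beginning" and wraps any decoding failure in an error that names the param and echoes the bad value:

{code:java}
import java.util.Base64;
import java.util.Optional;

/** Hypothetical helper: validates the "cursor" request param before it is used. */
final class CursorParam {
    static final String START = "*";

    /**
     * Returns empty for the initial "*" request, otherwise the decoded totem bytes.
     * Any undecodable value is reported with the param name and the raw input.
     */
    static Optional<byte[]> parse(String value) {
        if (START.equals(value)) {
            return Optional.empty(); // first page: no sort values to resume from
        }
        try {
            return Optional.of(Base64.getDecoder().decode(value));
        } catch (IllegalArgumentException e) {
            // decoding failed: report which param was bad and how to get a valid one
            throw new IllegalArgumentException(
                "Unable to parse cursor param: value must either be '*' or the "
                    + "cursorContinue value from a previous search: " + value, e);
        }
    }
}
{code}

In a real request handler this would presumably be turned into a 400 (bad request) response rather than surfacing as a bare exception.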

bq. Also, maybe it would be useful to include a prefix that will (probably) 
never be used in unique ids, to visually identify the cursor as such: like 
always prepending '*'?

Hmmm, I'm not sure if that's really worth the added bytes & parsing. 

If folks really felt like the param name should be "searchAfter" then I could 
certainly see the value in having some clear prefix, since that param name 
might lead folks to assume they know what the input should be. But with 
"cursor" I don't think we need to worry as much about people assuming they know 
what to put there, and with a clear error message instructing people how to get 
a valid cursor (from cursorContinue), that seems good enough. (right?)

bq. The Base64-encoded text is used verbatim, including the trailing padding 
'=' characters; these could be stripped out for external use (since they're 
there just to make the string length divisible by four), and then added back 
before Base64-decoding. In a URL, '=' characters that aren't acting as 
metacharacters look weird, since '=' is already used to separate param names 
from values.

Interesting idea ... again, I'm not sure how I feel about the added parsing 
overhead just to shorten the totem, especially since clients will always need 
to URL-encode it safely anyway: standard Base64 strings can also include "+" 
(and "/").
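To make the padding suggestion concrete, here's a sketch of the strip/restore round trip. The helper names are hypothetical, and {{java.util.Base64}} stands in for Solr's own utility class. The only rule needed is that an unpadded Base64 string's length mod 4 tells you how many '=' were dropped: 2 leftover chars means "==", 3 means "=", and 1 can never happen:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helpers sketching the "strip '=' padding, restore before decoding" idea. */
final class Base64Padding {
    /** Drops trailing '=' characters; they only pad the length to a multiple of 4. */
    static String strip(String b64) {
        int end = b64.length();
        while (end > 0 && b64.charAt(end - 1) == '=') {
            end--;
        }
        return b64.substring(0, end);
    }

    /** Re-appends the '=' characters needed to make the length a multiple of 4. */
    static String restore(String unpadded) {
        int rem = unpadded.length() % 4;
        if (rem == 1) {
            // a single leftover character can never be produced by a Base64 encoder
            throw new IllegalArgumentException("invalid unpadded Base64 length: " + unpadded);
        }
        int pad = (4 - rem) % 4; // rem 0 -> 0, rem 2 -> 2, rem 3 -> 1
        StringBuilder sb = new StringBuilder(unpadded);
        for (int i = 0; i < pad; i++) {
            sb.append('=');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String padded = Base64.getEncoder().encodeToString("ab".getBytes(StandardCharsets.US_ASCII));
        String external = strip(padded); // what a client would see
        byte[] back = Base64.getDecoder().decode(restore(external));
        System.out.println(padded + " -> " + external + " -> " + new String(back, StandardCharsets.US_ASCII));
    }
}
{code}

Running {{main}} prints {{YWI= -> YWI -> ab}}. Worth noting: per its javadocs, the {{java.util.Base64}} decoder already accepts input with the padding omitted, so the restore step only matters for decoders that insist on it.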

However....  

In the current patch, I used the Base64 utility class Solr already had (used by 
BinaryField and a few other places). But your suggestion reminds me that 
commons codec's Base64 class (that jar is already used by Solr) supports a "URL 
safe" variant of Base64 (the "base64url" encoding defined in RFC 4648, section 5)...

https://commons.apache.org/proper/commons-codec/javadocs/api-release/org/apache/commons/codec/binary/Base64.html#encodeBase64URLSafeString(byte[])

...something to consider.
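A quick comparison of the two alphabets, using the JDK's {{Base64.getUrlEncoder()}} (Java 8+) as a stand-in for the commons codec method linked above. Both use '-' and '_' in place of '+' and '/', and (per its javadocs) the commons codec method also omits padding, so {{withoutPadding()}} is used to match. The input bytes are arbitrary, picked so that the standard encoding contains both problem characters; {{UrlSafeTotem}} is a hypothetical name:

{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Base64;

/** Compares the standard and URL-safe Base64 alphabets (RFC 4648, sections 4 and 5). */
final class UrlSafeTotem {
    /** What a client has to do to any value before putting it in a query string. */
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        byte[] totem = {(byte) 0xfb, (byte) 0xff}; // arbitrary bytes whose encoding hits '+' and '/'
        String standard = Base64.getEncoder().encodeToString(totem);
        String urlSafe = Base64.getUrlEncoder().withoutPadding().encodeToString(totem);
        System.out.println(standard + " -> " + urlEncode(standard));
        System.out.println(urlSafe + " -> " + urlEncode(urlSafe));
    }
}
{code}

With these bytes the standard encoding is {{+/8=}}, which URL-encodes to {{%2B%2F8%3D}}, while the URL-safe form {{-_8}} passes through {{URLEncoder}} unchanged.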

----

One other comment I got from a coworker offline was asking why I liked 
{{cursorContinue}} instead of {{nextCursor}} or {{cursorNext}}. My thinking 
was that since 'cursor' (as a concept) is a noun, "next cursor" might suggest 
that it was a (different) cursor than the one currently in use. I don't want 
people to think these strings are _names_ of cursors that they can re-use until 
they are done with them. I want to make it clear that to _continue_ fetching 
results from this cursor, you have to specify the new value.

Would "{{cursorAdvance}}" convey that better than {{cursorContinue}}?
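To illustrate the intended usage pattern (whatever the param ends up being called), here's a sketch of a client loop where the value sent on each request is whatever came back as {{cursorContinue}} on the previous one. Everything here is hypothetical: {{Page}}, {{CursorSearch}} and the toy in-memory server (whose "totem" is just the last id returned) stand in for a real Solr request, and stopping on an empty page is an assumed end condition for this sketch:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical response shape: one page of ids plus the value to send on the next request. */
final class Page {
    final List<String> ids;
    final String cursorContinue;

    Page(List<String> ids, String cursorContinue) {
        this.ids = ids;
        this.cursorContinue = cursorContinue;
    }
}

/** Hypothetical stand-in for the search endpoint; not a real Solr client API. */
interface CursorSearch {
    Page fetch(String cursor, int rows);
}

final class CursorLoop {
    /** Fetches every page, always sending the cursorContinue value from the previous response. */
    static List<String> fetchAll(CursorSearch search, int rows) {
        List<String> all = new ArrayList<>();
        String cursor = "*"; // start of the result set
        while (true) {
            Page page = search.fetch(cursor, rows);
            if (page.ids.isEmpty()) {
                break; // assumed end condition for this sketch: an empty page
            }
            all.addAll(page.ids);
            cursor = page.cursorContinue; // a new value each time, not a reusable cursor "name"
        }
        return all;
    }

    /** Toy in-memory server: the totem is simply the last id returned (ids are pre-sorted). */
    static CursorSearch inMemory(List<String> sortedIds) {
        return (cursor, rows) -> {
            int start = "*".equals(cursor) ? 0 : sortedIds.indexOf(cursor) + 1;
            int end = Math.min(start + rows, sortedIds.size());
            List<String> ids = sortedIds.subList(start, end);
            String next = ids.isEmpty() ? cursor : ids.get(ids.size() - 1);
            return new Page(new ArrayList<>(ids), next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(fetchAll(inMemory(ids), 2));
    }
}
{code}

Running {{main}} prints {{[a, b, c, d, e]}}. Note that the loop never holds on to a cursor "name"; it only ever forwards the latest value it was given.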

> Provide cursor/token based "searchAfter" support that works with arbitrary 
> sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
> SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, 
> leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
> at the Lucene level: require the clients to provide back a token indicating 
> the sort values of the last document seen on the previous "page".  This is 
> similar to the "cursor" model I've seen in several other REST APIs that 
> support "pagination" over large sets of results (notably the Twitter API and 
> its "since_id" param) except that we'll want something that works with 
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial ground work here and was committed quite a while 
> ago, but the key bit of argument parsing to leverage it was commented out due 
> to some problems (see comments in that issue).  It's also somewhat out of 
> date at this point: at the time it was committed, IndexSearcher only supported 
> searchAfter for simple scores, not arbitrary field sorts; and the params 
> added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on 
> ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
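As a rough model of the searchAfter semantics described above (not of how Lucene actually implements it, which happens during collection rather than by sorting everything), here's an in-memory sketch with a two-level sort: price descending, then the unique key ascending as an assumed tie-breaker so no two docs share a sort position. The "token" is just the sort values of the last doc on the previous page; {{Doc}} and {{SearchAfterSketch}} are hypothetical names:

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy document with one sort field plus the unique key used as a final tie-breaker. */
final class Doc {
    final String id;
    final int price;

    Doc(String id, int price) {
        this.id = id;
        this.price = price;
    }

    @Override
    public String toString() {
        return id + "(" + price + ")";
    }
}

final class SearchAfterSketch {
    /** Multi-level sort: price descending, then id ascending so every doc has a distinct position. */
    static final Comparator<Doc> SORT =
        Comparator.comparingInt((Doc d) -> d.price).reversed()
            .thenComparing((Doc d) -> d.id);

    /**
     * Returns the next page: the first {@code rows} docs that sort strictly after {@code after},
     * where {@code after} carries only the sort values of the last doc seen (null = first page).
     */
    static List<Doc> page(List<Doc> all, Doc after, int rows) {
        return all.stream()
            .filter(d -> after == null || SORT.compare(d, after) > 0)
            .sorted(SORT)
            .limit(rows)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("a", 10), new Doc("b", 20), new Doc("c", 20), new Doc("d", 5), new Doc("e", 10));
        Doc after = null;
        List<Doc> current;
        while (!(current = page(docs, after, 2)).isEmpty()) {
            System.out.println(current);
            Doc last = current.get(current.size() - 1);
            after = new Doc(last.id, last.price); // the "token": just the last doc's sort values
        }
    }
}
{code}

With the sample docs, {{main}} prints {{[b(20), c(20)]}}, {{[a(10), e(10)]}}, then {{[d(5)]}}. Without the id tie-breaker, docs tied on price could be skipped at a page boundary, since the filter keeps only docs that sort strictly after the token.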



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

