[ 
https://issues.apache.org/jira/browse/COUCHDB-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092003#comment-13092003
 ] 

Paul Joseph Davis commented on COUCHDB-834:
-------------------------------------------

@Prudhvi

No, it's not a bug. You can basically think of this as if you had switched all 
of your emit calls from emit(key, val) to emit([key, doc._id], val) and then 
changed all of your startkey values to [startkey, docid].

The important part to remember here is that this is extremely simple. Consider 
a large sorted array. All that the various key related options are doing is 
defining a slice of this array to return. At its most basic this is how all 
indexing works. You just need to find the part in a sorted list that is 
relevant to your query.
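
As a rough sketch of that idea (not CouchDB's actual code), the slice can be found with two binary searches over the sorted keys. The names lowerBound, upperBound and sliceByRange are made up for illustration, and the keys are assumed to be plain numbers with an inclusive endkey:

```javascript
// Index of the first key that is >= target (binary search).
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index of the first key that is > target (binary search).
function upperBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] <= target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return the part of the sorted array between startkey and endkey,
// both ends inclusive (an assumption made for this sketch).
function sliceByRange(sortedKeys, startkey, endkey) {
  return sortedKeys.slice(
    lowerBound(sortedKeys, startkey),
    upperBound(sortedKeys, endkey)
  );
}

const index = [1, 3, 3, 5, 8, 13];
console.log(sliceByRange(index, 3, 8)); // [ 3, 3, 5, 8 ]
```

Nothing outside the two boundary positions is ever inspected, which is the whole point of keeping the index sorted.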

In this particular case, it's just important that sorting only looks at as much 
of a key as is necessary to make a decision. Given something like these two 
keys:

    [1, 2, 3, 100]
    [1, 2, 4, 0]

We have to look at the first three positions to determine the sorting. The 
first position is equal, so we check the second which is also equal, then the 
third position finally tells us that 3 < 4 and we can stop looking. The values 
100 and 0 will never be considered in defining the sort order between *these 
two* keys. If a third key came in that was [1, 2, 3, 99] then we would have to 
compare 99 < 100 to figure out that it goes first in the list.
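
To make the "only as much as necessary" part concrete, here is a minimal sketch that compares two array keys and reports how many positions it had to examine. compareKeys is a hypothetical helper, and it assumes numeric elements only; real view collation also has to order strings, objects and mixed types:

```javascript
// Compare two array keys position by position, stopping at the first
// difference. Returns the ordering (-1, 0, 1) and the number of
// positions that were examined. Assumes elements are plain numbers.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) {
      return { order: a[i] < b[i] ? -1 : 1, examined: i + 1 };
    }
  }
  // One key is a prefix of the other: the shorter one sorts first.
  return { order: Math.sign(a.length - b.length), examined: shared };
}

console.log(compareKeys([1, 2, 3, 100], [1, 2, 4, 0]));   // { order: -1, examined: 3 }
console.log(compareKeys([1, 2, 3, 99], [1, 2, 3, 100]));  // { order: -1, examined: 4 }
```

The first call never touches 100 or 0; the second has to go all the way to the fourth position.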

The startkey_docid parameter is slightly special here. Internally all index 
keys are stored as a 2-tuple of {Key, DocId} for bookkeeping so that we can do 
incremental map/reduce. This also allows HTTP requests to differentiate between 
identical keys coming from multiple documents. But as in the example above, the 
DocId will never be consulted unless the Keys are identical.
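
A small model of this, using the two documents from the issue, shows why the reported queries behave the way they do. This is only a sketch: rowsFrom, compareRows and compareKeys are hypothetical helpers, strings are compared with plain < rather than real view collation, and endkey is left out because it does not affect the result here:

```javascript
// Compare two array keys element by element; a key that is a prefix of
// another sorts before it. Plain < on strings is a simplification.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return Math.sign(a.length - b.length);
}

// Rows behave like {Key, DocId} pairs: the doc id is only a tie-breaker
// and is consulted only when the keys compare as equal.
function compareRows(a, b) {
  const byKey = compareKeys(a.key, b.key);
  if (byKey !== 0) return byKey;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// The two rows emitted by the view in the issue, in index order.
const rows = [
  { key: ["other", "one"], id: "one" },
  { key: ["other", "two"], id: "two" },
];

// Ids of all rows at or after the position (startkey, startkey_docid).
function rowsFrom(startkey, startkeyDocid) {
  const start = { key: startkey, id: startkeyDocid };
  return rows
    .filter((row) => compareRows(row, start) >= 0)
    .map((row) => row.id);
}

console.log(rowsFrom(["other"], "two"));        // [ 'one', 'two' ]
console.log(rowsFrom(["other", "one"], "two")); // [ 'two' ]
```

With startkey=["other"], both rows already sort after the start position on the key alone, so the doc id never gets a say. With startkey=["other","one"], the first row ties on the key and the doc id decides.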

@Randall

I think you said that backwards. The only issue that's similar is that 
startkey_docid has no effect if startkey isn't specified. That could be a 400, 
but whenever I try and make the HTTP query parsing strict people tell me to 
Relax and I die a little inside.

> startkey_docid/endkey_docid don't work without an exact startkey/endkey match
> -----------------------------------------------------------------------------
>
>                 Key: COUCHDB-834
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-834
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>    Affects Versions: 1.0
>            Reporter: Mathias Meyer
>
> This issue popped up when I wanted to paginate through a list of documents 
> using a combined array key, using a startkey and endkey that's based solely 
> on the first part of said key. First part is a reference to a different 
> document, second part is a timestamp to keep the list sorted by creation 
> time. The list of documents can be fetched using startkey=["key"] and 
> endkey=["key", {}]
> Now, I wanted to add pagination to this list, only fetching so many documents 
> starting at startkey_docid, which failed using this setup. It seems (and Jan 
> validated that assumption by analyzing the source) that startkey needs 
> to be an exact match for startkey_docid to have any effect. If there's no 
> exact match, CouchDB will silently ignore the startkey_docid, a behaviour 
> that's undocumented and to be quite frank, unintuitive.
> Consider the following two documents, both pointing to the same other_id:
> {"_id": "one", "other_id": "other", "second_key": "one"}
> {"_id": "two", "other_id": "other", "second_key": "two"}
> And a simple map/reduce function that just emits the combined key:
> {
>    "other_documents": {
>        "reduce": "_sum",
>        "map": "          function(doc) { \n emit([doc.other_id, 
> doc.second_key], 1);\n  }\n"
>    }
> }
> Querying the view like this gives the expected results:
> curl 
> 'http://localhost:5984/startkey_bug/_design/other_documents/_view/other_documents?reduce=false&startkey=\["other"\]&endkey=\["other",\{\}\]'
> {"total_rows":2,"offset":0,"rows":[
> {"id":"one","key":["other","one"],"value":1},
> {"id":"two","key":["other","two"],"value":1}
> ]}
> If I add in a startkey_docid of two, I'd expect CouchDB to skip to the second 
> result in the list, skipping the first, but it doesn't:
> curl 
> 'http://localhost:5984/startkey_bug/_design/other_documents/_view/other_documents?reduce=false&startkey=\["other"\]&endkey=\["other",\{\}\]&startkey_docid=two'
> {"total_rows":2,"offset":0,"rows":[
> {"id":"one","key":["other","one"],"value":1},
> {"id":"two","key":["other","two"],"value":1}
> ]}
> However, it does what I'd expect when I specify an exact startkey (the endkey 
> is still the same):
> curl 
> 'http://localhost:5984/startkey_bug/_design/other_documents/_view/other_documents?reduce=false&startkey=\["other","one"\]&endkey=\["other",\{\}\]&startkey_docid=two'
> {"total_rows":2,"offset":1,"rows":[
> {"id":"two","key":["other","two"],"value":1}
> ]}
> If you add in an exact endkey, the situation doesn't change, and the result 
> is as expected.
> Having an exact startkey is an acceptable workaround, but I'd still say this 
> behaviour is not intuitive, and it should be fixed to work the same in 
> all of the above situations. If not, at least the documentation should 
> properly reflect these situations, explaining the proper workarounds.
> Update: I just checked how this works out when using descending=true, the 
> same is true for the swapped endkey and startkey parameters. Specifying an 
> endkey_docid requires specifying an exact endkey match.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
