On versioning, I've not seen a better article than this one: 
https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/

For _changes, I definitely agree we should be including it in this discussion. It
is the only endpoint with, in theory, an eternal response, and I think that's a
bug, not a feature, these days. CouchDB exists in a wider ecosystem (and often
sits behind a load balancer), so it would be good to define an upper bound on how
long you can listen before being forced to query again.
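
To make the upper bound concrete, here is a minimal client-side sketch of the
behaviour I have in mind (illustrative only; the URL, the 60-second figure and
the handle_change callback are assumptions, not a proposal for the server side):
the client listens for a bounded period, then reconnects from the last sequence
it saw.

```python
# Hypothetical bounded _changes consumer. CouchDB ends a continuous feed
# after `timeout` milliseconds without a change (unless a heartbeat is set),
# so the client simply reconnects from the last sequence it has seen.
import json
import requests

BASE = "http://127.0.0.1:5984/db"   # assumed database URL
since = "0"

while True:
    resp = requests.get(
        f"{BASE}/_changes",
        params={"feed": "continuous", "since": since, "timeout": 60000},
        stream=True,
    )
    for line in resp.iter_lines():
        if not line:
            continue                      # keep-alive blank lines
        row = json.loads(line)
        if "last_seq" in row:
            since = row["last_seq"]       # feed ended; resume from here
        else:
            since = row.get("seq", since)
            handle_change(row)            # application-defined callback (assumed)
```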

B.

> On 23 Apr 2020, at 22:15, Paul Davis <paul.joseph.da...@gmail.com> wrote:
> 
> I'd agree that my initial reaction to cursor was that it's not a great
> fit, but it does seem to be the established name in the wider REST
> world for this sort of pagination, so I'm not concerned about using
> that terminology.
> 
> I'm generally on board with allowing and setting some sane default
> limits on pages. We probably should have done that quite a while ago
> after moving to native clustering, and now that we have FDB limits I
> think it makes even more sense to have an API that does not lend
> itself to crazy errors when people are just trying to poke at it.
> 
> I think we're all on board that one of the goals is to make sure that
> clients don't accidentally misinterpret a response. That is, we're
> trying to be quite diligent that a user doesn't get 1000 rows and not
> realize there's another 10 that were beyond the limit. The bookmark
> approach with hard caps seems like a generally fine approach to me.
> The current approach uses extra URL path segments to try and avoid
> this confusion. I wonder if we should consider starting to properly
> version our API using one of the many schemes that are used. Having
> read through a few articles I don't have a very clear favorite though.
> 
> As to this particular proposal I do see a couple issues:
> 
> `total` - We can do this in most cases fairly easily. Though it's a
> bit odd for continuous changes.
> 
> `complete` - I'm not sure whether this is entirely possible given the
> API that FDB presents us. Specifically, when we set a range and get
> back exactly $num_rows in the response, if the data set ended at
> exactly that page I don't think the `more` flag from FDB would tell us
> that. So we'd have a clunky UX there where we say "not complete" but
> the next page is empty. That's not to mention that, depending on
> whether we're looking at snapshots and so on, there's no way for us to
> know between stateless requests whether more rows were added to the
> end.
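> 
> One way around the empty-final-page problem might be the classic limit+1
> trick (just a sketch, in illustrative pseudo-code rather than the actual
> fabric/FDB layer code): read one row beyond the page size and derive
> `complete` from whether that extra row came back.
> 
> ```python
> # Illustrative only: fetch page_size + 1 rows so `complete` can be reported
> # without a second round trip; the extra row, if any, is not returned.
> def read_page(tx, start_key, end_key, page_size):
>     rows = list(tx.get_range(start_key, end_key, limit=page_size + 1))
>     complete = len(rows) <= page_size    # no extra row => nothing follows
>     return rows[:page_size], complete
> ```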
> 
> `page` - This one is just hard/impossible to calculate. FDB doesn't
> provide us with offsets or even an efficient "about how many rows in
> this range?" type query, so providing that would be both inaccurate
> and fairly difficult/expensive to calculate. In some cases I think we
> could have something maybe close that didn't suck too badly, but it'd
> also fall down for changes due to the way that updates reorder them.
> 
> `update_seq` - I'm just not sure when this would be useful or what
> it would refer to. Maybe a version stamp of the last change for that
> request? If we had a future API that asked for a snapshot access then
> maybe? But if we did do something there with versionstamps or read
> versions I'd expect that to come with the rest of the API.
> 
> For the bookmark fields:
> 
> `direction` vs `descending` seems like a field duplication to me.
> 
> `page` - This would seem to suggest we could skip to a certain
> location in the results numerically which we are not able to do with
> the FDB API.
> 
> `last_key` vs `start_key` seems like a field duplication. I don't
> think we need to know where things started, just where to start from
> and where to end.
> 
> `update_seq` - same as earlier. Not entirely sure of the intent there.
> 
> `timestamp` - Expiring bookmarks based on time does not seem like a
> good idea, both because of clock skew and because this would
> functionally just be a convenience that users could already implement
> for themselves.
> 
> Another idea might be to provide our bookmark as a full link, which
> seems to be fairly standard REST practice these days: something
> clients don't have to apply any logic to, so that we're free to
> change the implementation.
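> 
> For example (shape and field names purely illustrative, not a concrete
> proposal), the response could carry a ready-to-follow URL rather than a
> bare token:
> 
> ```python
> # Illustrative response shape only; "links"/"next" are invented names.
> example_response = {
>     "items": ["..."],
>     "complete": False,
>     "links": {"next": "/db/_all_docs/page?bookmark=<opaque token>"},
> }
> ```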
> 
> And lastly, I don't think we should be neglecting the _changes API as
> part of this discussion. I realize that we'll need to support the
> older streaming semantics if we want to maintain replication
> compatibility (which I think we'll all agree is a Good Thing) but it
> also feels a bit wrong to ignore it as part of this work if we're
> going to be modernizing our APIs. Though if we do pick up a good
> versioning scheme then we could theoretically make those changes
> easily enough. Plus, who doesn't want to rewrite chttpd to be a whole
> lot less... chttpd-y?
> 
> 
> On Thu, Apr 23, 2020 at 1:43 PM Robert Samuel Newson <rnew...@apache.org> 
> wrote:
>> 
>> 
>> I think it's a key difference from "cursor" as I've seen it elsewhere
>> that ours would point at an ever-changing database; you couldn't seamlessly
>> cursor through a large data set one "page" at a time.
>> 
>> Bookmarks began in search (raises guilty hand) in order to address a 
>> Lucene-specific issue (that high values of "skip" are incredibly 
>> inefficient, using lots of RAM). That is not true for CouchDB's own indexes, 
>> which can be navigated perfectly with 
>> startkey/endkey/startkey_docid/endkey_docid, etc.
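>> 
>> For illustration (the database name, page size and process() callback are
>> made up), the existing pattern is to ask for one row more than the page size
>> and use the extra row as the start of the next request:
>> 
>> ```python
>> # Sketch of pagination with the existing _all_docs parameters: fetch
>> # page_size + 1 rows; the extra row, if present, seeds the next request.
>> import json
>> import requests
>> 
>> BASE = "http://127.0.0.1:5984/db/_all_docs"   # assumed database URL
>> PAGE = 100
>> params = {"limit": PAGE + 1}
>> 
>> while True:
>>     rows = requests.get(BASE, params=params).json()["rows"]
>>     for row in rows[:PAGE]:
>>         process(row)                          # application-defined (assumed)
>>     if len(rows) <= PAGE:
>>         break                                 # no extra row: we're done
>>     nxt = rows[PAGE]
>>     params["startkey"] = json.dumps(nxt["key"])   # keys are JSON in the URL
>>     params["startkey_docid"] = nxt["id"]
>> ```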
>> 
>> I guess I'm not helping much with these observations but I wouldn't like to 
>> see CouchDB gain an additional and ugly method of doing something already 
>> possible.
>> 
>> B.
>> 
>>> On 23 Apr 2020, at 19:02, Joan Touzet <woh...@apache.org> wrote:
>>> 
>>> I realise this is bikeshedding, but I guess that's kind of the point... 
>>> Everything below is my opinion, not "fact."
>>> 
>>> It's unfortunate we need a new endpoint for all of this. In a vacuum I 
>>> might have just suggested we use the semantics we already have, perhaps 
>>> with ?from= instead of ?since= .
>>> 
>>> "page" only works if the size of a page is well known, either by server 
>>> preference or directly in the URL. If I ask for:
>>> 
>>> GET /{db}/_all_docs?limit=20&page=3
>>> 
>>> I know that I'm always going to get documents 41 through 60 in the default
>>> collation order.
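>>> 
>>> (Purely to spell out the arithmetic that assumption relies on:)
>>> 
>>> ```python
>>> # Illustrative only: how ?limit=20&page=3 maps onto row offsets.
>>> limit, page = 20, 3
>>> first = limit * (page - 1) + 1   # 41
>>> last = limit * page              # 60
>>> ```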
>>> 
>>> There's a *fantastic* summary of examples from popular REST APIs here:
>>> 
>>> https://medium.com/@ignaciochiazzo/paginating-requests-in-apis-d4883d4c1c4c
>>> 
>>> We are *pretty close* to what a cursor means in those other examples, 
>>> except for the fact that our cursor can go stale/invalid after a short time.
>>> 
>>> Bob, could you be a bit more detailed in your explanation of how our
>>> definition isn't close to these? Or did you mean SQL CURSOR (which is
>>> something entirely different)? If so, I'm fine with this being a REST API
>>> cursor - something clearly distinct.
>>> 
>>> I come back to wanting to preserve the existing endpoint syntax and naming,
>>> without new endpoints, but specifying this new FDB token via ?cursor= and
>>> having that be the trigger for the new behaviour. At some point, we simply
>>> stop accepting ?since= tokens. This seems in line with other popular REST APIs.
>>> 
>>> -Joan "still sick and not sleeping right" Touzet
>>> 
>>> 
>>> On 2020-04-23 12:30, Robert Newson wrote:
>>>> "cursor" has an established meaning in other databases, and ours would not be
>>>> very close to them. I don’t think it’s a good idea.
>>>> B.
>>>>> On 23 Apr 2020, at 11:50, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The best I could come up with is replacing page with
>>>>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
>>>>> Good idea, I like {db}/_all_docs/cursor (or {db}/_all_docs/_cursor).
>>>>> 
>>>>>> On 2020/04/23 08:54:36, Garren Smith <gar...@apache.org> wrote:
>>>>>> I agree with Bob that page doesn't make sense as an endpoint. I'm also
>>>>>> rubbish with naming. The best I could come up with is replacing page with
>>>>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
>>>>>> All the fields in the bookmark make sense except timestamp. Why would it
>>>>>> matter if the timestamp is old? What happens if a node's time is an hour
>>>>>> behind another node?
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Apr 23, 2020 at 4:55 AM Ilya Khlopotov <iil...@apache.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> - page is to provide some notion of progress for the user
>>>>>>> - timestamp - I was thinking that we should drop requests if a user
>>>>>>> tried to pass a bookmark created an hour ago.
>>>>>>> 
>>>>>>> On 2020/04/22 21:58:40, Robert Samuel Newson <rnew...@apache.org> wrote:
>>>>>>>> "page" and "page number" are odd to me as these don't exist as 
>>>>>>>> concepts,
>>>>>>> I'd rather not invent them. I note there's no mention of page size, 
>>>>>>> which
>>>>>>> makes "page number" very vague.
>>>>>>>> 
>>>>>>>> What is "timestamp" in the bookmark and what effect does it have when
>>>>>>> the bookmark is passed back in?
>>>>>>>> 
>>>>>>>> I guess, why does the bookmark include so much extraneous data? Items
>>>>>>>> that are not needed to find the FDB key to begin the next response from.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 22 Apr 2020, at 21:18, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Hello everyone,
>>>>>>>>> 
>>>>>>>>> Based on the discussions on the thread I would like to propose a
>>>>>>>>> number of first steps:
>>>>>>>>> 1) introduce new endpoints
>>>>>>>>> - {db}/_all_docs/page
>>>>>>>>> - {db}/_all_docs/queries/page
>>>>>>>>> - _all_dbs/page
>>>>>>>>> - _dbs_info/page
>>>>>>>>> - {db}/_design/{ddoc}/_view/{view}/page
>>>>>>>>> - {db}/_design/{ddoc}/_view/{view}/queries/page
>>>>>>>>> - {db}/_find/page
>>>>>>>>> 
>>>>>>>>> These new endpoints would act as follows:
>>>>>>>>> - don't use delayed responses
>>>>>>>>> - return object with following structure
>>>>>>>>> ```
>>>>>>>>> {
>>>>>>>>>   "total": Total,
>>>>>>>>>   "bookmark": base64 encoded opaque value,
>>>>>>>>>   "completed": true | false,
>>>>>>>>>   "update_seq": when available,
>>>>>>>>>   "page": current page number,
>>>>>>>>>   "items": [
>>>>>>>>>   ]
>>>>>>>>> }
>>>>>>>>> ```
>>>>>>>>> - the bookmark would include the following data (base64 or protobuf???):
>>>>>>>>> - direction
>>>>>>>>> - page
>>>>>>>>> - descending
>>>>>>>>> - endkey
>>>>>>>>> - endkey_docid
>>>>>>>>> - inclusive_end
>>>>>>>>> - startkey
>>>>>>>>> - startkey_docid
>>>>>>>>> - last_key
>>>>>>>>> - update_seq
>>>>>>>>> - timestamp
>>>>>>>>> 
>>>>>>>>> 2) Implement per-endpoint configurable max limits
>>>>>>>>> ```
>>>>>>>>> _all_docs = 5000
>>>>>>>>> _all_docs/queries = 5000
>>>>>>>>> _all_dbs = 5000
>>>>>>>>> _dbs_info = 5000
>>>>>>>>> _view = 2500
>>>>>>>>> _view/queries = 2500
>>>>>>>>> _find = 2500
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> Later (after a few years) CouchDB would deprecate and remove the old
>>>>>>>>> endpoints.
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> iilyak
>>>>>>>>> 
>>>>>>>>> On 2020/02/19 22:39:45, Nick Vatamaniuc <vatam...@apache.org> wrote:
>>>>>>>>>> Hello everyone,
>>>>>>>>>> 
>>>>>>>>>> I'd like to discuss the shape and behavior of streaming APIs for
>>>>>>>>>> CouchDB 4.x.
>>>>>>>>>> 
>>>>>>>>>> By "streaming APIs" I mean APIs which stream data in row as it gets
>>>>>>>>>> read from the database. These are the endpoints I was thinking of:
>>>>>>>>>> 
>>>>>>>>>> _all_docs, _all_dbs, _dbs_info, and query results
>>>>>>>>>> 
>>>>>>>>>> I want to focus on what happens when FoundationDB transactions
>>>>>>>>>> time-out after 5 seconds. Currently, all those APIs except _changes[1]
>>>>>>>>>> feeds will crash or freeze. The reason is that the
>>>>>>>>>> transaction_too_old error at the end of 5 seconds is retry-able by
>>>>>>>>>> default, so the request handlers run again and end up shoving the
>>>>>>>>>> whole request down the socket again, headers and all, which is
>>>>>>>>>> obviously broken and not what we want.
>>>>>>>>>> 
>>>>>>>>>> There are a few alternatives discussed in the couchdb-dev channel. I'll
>>>>>>>>>> present some behaviors but feel free to add more. Some ideas might
>>>>>>>>>> have been discounted in the IRC discussion already but I'll present
>>>>>>>>>> them anyway in case it sparks further conversation:
>>>>>>>>>> 
>>>>>>>>>> A) Do what _changes[1] feeds do. Start a new transaction and continue
>>>>>>>>>> streaming the data from the next key after the last one emitted in the
>>>>>>>>>> previous transaction. Document the API behavior change: the view of the
>>>>>>>>>> data it presents may not be a point-in-time[4] snapshot of the DB.
>>>>>>>>>> 
>>>>>>>>>> - Keeps the API shape the same as CouchDB <4.0. Client libraries
>>>>>>>>>> don't have to change to continue using these CouchDB 4.0 endpoints
>>>>>>>>>> - This is the easiest to implement since it would re-use the
>>>>>>>>>> implementation for _changes feed (an extra option passed to the fold
>>>>>>>>>> function).
>>>>>>>>>> - Breaks API behavior if users relied on having a point-in-time[4]
>>>>>>>>>> snapshot view of the data.
>>>>>>>>>> 
>>>>>>>>>> B) Simply end the stream. Let the users pass a `?transaction=true`
>>>>>>>>>> param which indicates they are aware the stream may end early and so
>>>>>>>>>> would have to paginate from the last emitted key with a skip=1. This
>>>>>>>>>> will keep the request bodies the same as current CouchDB. However, if
>>>>>>>>>> the users got all the data in one request, they will end up wasting
>>>>>>>>>> another request to see if there is more data available. If they didn't
>>>>>>>>>> get any data they might have too large a skip value (see [2]) so they
>>>>>>>>>> would have to guess different values for start/end keys. Or we could
>>>>>>>>>> impose a max limit for the `skip` parameter.
>>>>>>>>>> 
>>>>>>>>>> C) End the stream and add a final metadata row like a "transaction":
>>>>>>>>>> "timeout" at the end. That will let the user know to keep paginating
>>>>>>>>>> from the last key onward. This won't work for `_all_dbs` and
>>>>>>>>>> `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
>>>>>>>>>> feeds and only use this for views and _all_docs? If we like this
>>>>>>>>>> choice, let's think about what happens for those, as I couldn't come
>>>>>>>>>> up with anything decent there.
>>>>>>>>>> 
>>>>>>>>>> D) Same as C but to solve the issue with skips[2], emit a bookmark
>>>>>>>>>> "key" of where the iteration stopped and the current "skip" and
>>>>>>>>>> "limit" params, which would keep decreasing. Then user would pass
>>>>>>>>>> those in "start_key=..." in the next request along with the limit and
>>>>>>>>>> skip params. So something like "continuation":{"skip":599, "limit":5,
>>>>>>>>>> "key":"..."}. This has the same issue with array results for
>>>>>>>>>> `_all_dbs` and `_dbs_info`[3].
>>>>>>>>>> 
>>>>>>>>>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values
>>>>>>>>>> there such that the response time is likely to fit in one transaction.
>>>>>>>>>> This could be tricky as different runtime environments will have
>>>>>>>>>> different characteristics. Also, if the timeout happens there isn't
>>>>>>>>>> a nice way to send an HTTP error since we already sent the 200
>>>>>>>>>> response. The downside is that this might break how some users use the
>>>>>>>>>> API, if, say, they are using large skips and limits already. Perhaps
>>>>>>>>>> here we do both B and D, such that if users want transactional behavior,
>>>>>>>>>> they specify the `transaction=true` param and only then we enforce
>>>>>>>>>> low limit and skip maximums.
>>>>>>>>>> 
>>>>>>>>>> F) At least for `_all_docs` it seems providing a point-in-time
>>>>>>>>>> snapshot view doesn't necessarily need to be tied to transaction
>>>>>>>>>> boundaries. We could check the update sequence of the database at the
>>>>>>>>>> start of the next transaction and, if it hasn't changed, continue
>>>>>>>>>> emitting a consistent view. This can apply to C and D and would just
>>>>>>>>>> determine when the stream ends. If there are no writes happening to
>>>>>>>>>> the db, this could potentially stream all the data just like option A
>>>>>>>>>> would do. Not entirely sure if this would work for views.
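>>>>>>>>>> 
>>>>>>>>>> Rough pseudo-code for the idea (every helper name here is invented
>>>>>>>>>> for illustration, none of these are real fabric functions):
>>>>>>>>>> 
>>>>>>>>>> ```python
>>>>>>>>>> # Keep streaming across 5-second transaction boundaries only while the
>>>>>>>>>> # database's update sequence stays the same; otherwise end the stream.
>>>>>>>>>> def stream_all_docs(db, emit):
>>>>>>>>>>     start_seq = get_update_seq(db)            # hypothetical helper
>>>>>>>>>>     last_key = None
>>>>>>>>>>     while True:
>>>>>>>>>>         with db.transaction() as tx:          # fresh transaction
>>>>>>>>>>             if get_update_seq(db, tx) != start_seq:
>>>>>>>>>>                 return "ended_early"          # db changed; stop here
>>>>>>>>>>             last_key, done = emit_range(tx, last_key, emit)  # hypothetical
>>>>>>>>>>             if done:
>>>>>>>>>>                 return "complete"
>>>>>>>>>> ```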
>>>>>>>>>> 
>>>>>>>>>> So what do we think? I can see different combinations of options 
>>>>>>>>>> here,
>>>>>>>>>> maybe even different for each API point. For example `_all_dbs`,
>>>>>>>>>> `_dbs_info` are always A, and `_all_docs` and views default to A but
>>>>>>>>>> have parameters to do F, etc.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> -Nick
>>>>>>>>>> 
>>>>>>>>>> Some footnotes:
>>>>>>>>>> 
>>>>>>>>>> [1] The _changes feed is the only one that works currently. It behaves
>>>>>>>>>> as per RFC
>>>>>>>>>> https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns.
>>>>>>>>>> That is, we continue streaming the data by resetting the transaction
>>>>>>>>>> object and restarting from the last emitted key (db sequence in this
>>>>>>>>>> case). However, because the transaction restarts, if a document is
>>>>>>>>>> updated while the streaming takes place it may appear in the _changes
>>>>>>>>>> feed twice. That's a behavior difference from CouchDB < 4.0 and we'd
>>>>>>>>>> have to document it, since previously we presented this point-in-time
>>>>>>>>>> snapshot of the database from when we started streaming.
>>>>>>>>>> 
>>>>>>>>>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't
>>>>>>>>>> currently support efficient offsets for key selectors
>>>>>>>>>> (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging)
>>>>>>>>>> we implemented skip by iterating over the data. This means that a skip
>>>>>>>>>> of, say, 100000 could keep timing out the transaction without yielding
>>>>>>>>>> any data.
>>>>>>>>>> 
>>>>>>>>>> [3] _all_dbs and _dbs_info return a JSON array so they don't have an
>>>>>>>>>> obvious place to insert a last metadata row.
>>>>>>>>>> 
>>>>>>>>>> [4] For example, an application may have a constraint that documents
>>>>>>>>>> "a" and "z" cannot both be in the database at the same time. But when
>>>>>>>>>> iterating, it's possible that "a" was there at the start. Then by the
>>>>>>>>>> end, "a" was removed and "z" added, so both "a" and "z" would appear in
>>>>>>>>>> the emitted stream. Note that FoundationDB has APIs which exhibit the
>>>>>>>>>> same "relaxed" constraints:
>>>>>>>>>> 
>>>>>>>>>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>> 
