I think I addressed all comments and created an RFC https://github.com/apache/couchdb-documentation/pull/530
On 2020/04/28 11:56:15, Ilya Khlopotov <iil...@apache.org> wrote:
> Hello,
>
> I would like to introduce a second proposal.
>
> 1) Add a new optional query field called `bookmark` (or `token`) to the following endpoints:
>    - {db}/_all_docs
>    - {db}/_all_docs/queries
>    - _dbs_info
>    - {db}/_design/{ddoc}/_view/{view}
>    - {db}/_design/{ddoc}/_view/{view}/queries
> 2) Add the following additional fields to the response:
> ```
> "first": {
>     "href": "https://myserver.com/myddb/_all_docs?limit=50&descending=true"
> },
> "previous": {
>     "href": "https://myserver.com/myddb/_all_docs?bookmark=983uiwfjkdsdf"
> },
> "next": {
>     "href": "https://myserver.com/myddb/_all_docs?bookmark=12343tyekf3"
> },
> ```
> 3) Implement per-endpoint configurable max limits:
> ```
> [request_limits]
> _all_docs = 5000
> _all_docs/queries = 5000
> _all_dbs = 5000
> _dbs_info = 5000
> _view = 2500
> _view/queries = 2500
> _find = 2500
> ```
> 4) Implement the following semantics:
>    - The bookmark would be an opaque token and would include the information needed to ensure proper pagination without the need to repeat the initial parameters of the request. In fact, we might prohibit setting additional parameters when the bookmark query field is specified.
>    - don't use delayed responses when the `bookmark` field is provided
>    - don't use delayed responses when the `limit` query key is specified and it is below the max limit
>    - return 400 when the `limit` query key is specified and it is greater than the max limit
>    - return 400 when we stream rows (in the case when the `limit` query key wasn't specified) and reach the max limit
>    - the `previous`/`next`/`first` keys are optional, and we omit them in cases where they don't make sense
>
> Later on we would introduce API versioning and deal with the `{db}/_changes` and `_all_docs` endpoints.
>
> Questions:
> - `bookmark` vs `token`?
> - should we prohibit setting other fields when bookmark is set?
> - `previous`/`next`/`first` as href vs the token value itself (i.e. `{"previous": "983uiwfjkdsdf", "next": "12343tyekf3", "first": "iekjhfwo034"}`)?
>
> Best regards,
> iilyak
>
> On 2020/04/22 20:18:57, Ilya Khlopotov <iil...@apache.org> wrote:
> > Hello everyone,
> >
> > Based on the discussions on the thread I would like to propose a number of first steps:
> > 1) introduce new endpoints
> >    - {db}/_all_docs/page
> >    - {db}/_all_docs/queries/page
> >    - _all_dbs/page
> >    - _dbs_info/page
> >    - {db}/_design/{ddoc}/_view/{view}/page
> >    - {db}/_design/{ddoc}/_view/{view}/queries/page
> >    - {db}/_find/page
> >
> > These new endpoints would act as follows:
> > - don't use delayed responses
> > - return an object with the following structure:
> > ```
> > {
> >     "total": Total,
> >     "bookmark": base64 encoded opaque value,
> >     "completed": true | false,
> >     "update_seq": when available,
> >     "page": current page number,
> >     "items": [
> >     ]
> > }
> > ```
> > - the bookmark would include the following data (base64 or protobuf???):
> >   - direction
> >   - page
> >   - descending
> >   - endkey
> >   - endkey_docid
> >   - inclusive_end
> >   - startkey
> >   - startkey_docid
> >   - last_key
> >   - update_seq
> >   - timestamp
> >
> > 2) Implement per-endpoint configurable max limits:
> > ```
> > _all_docs = 5000
> > _all_docs/queries = 5000
> > _all_dbs = 5000
> > _dbs_info = 5000
> > _view = 2500
> > _view/queries = 2500
> > _find = 2500
> > ```
> >
> > Later (after a few years) CouchDB would deprecate and remove the old endpoints.
> >
> > Best regards,
> > iilyak
> >
> > On 2020/02/19 22:39:45, Nick Vatamaniuc <vatam...@apache.org> wrote:
> > > Hello everyone,
> > >
> > > I'd like to discuss the shape and behavior of streaming APIs for CouchDB 4.x.
> > >
> > > By "streaming APIs" I mean APIs which stream data row by row as it gets read from the database.
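[Editor's note: for illustration, the opaque bookmark discussed in the proposals above could be a base64-encoded JSON blob of the pagination state. This is only a sketch of one possible encoding; the field names echo the proposal's list, but the helper names and the choice of base64+JSON are assumptions, not the actual CouchDB implementation.]

```python
import base64
import json

def encode_bookmark(state: dict) -> str:
    """Pack pagination state into an opaque, URL-safe token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_bookmark(token: str) -> dict:
    """Recover the pagination state from the opaque token."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw)

# State fields mirror the proposal's list (direction, page, startkey,
# last_key, update_seq, ...); the concrete values here are made up.
state = {"direction": "fwd", "page": 3, "last_key": "doc0042", "update_seq": 1234}
token = encode_bookmark(state)
assert decode_bookmark(token) == state
```

Because the client only ever echoes the token back, the server remains free to change the internal encoding (e.g. to protobuf) without breaking clients.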
> > > These are the endpoints I was thinking of:
> > >
> > > _all_docs, _all_dbs, _dbs_info and query results
> > >
> > > I want to focus on what happens when FoundationDB transactions time out after 5 seconds. Currently, all of those APIs except the _changes[1] feed will crash or freeze. The reason is that the transaction_too_old error at the end of 5 seconds is retry-able by default, so the request handlers run again and end up shoving the whole request down the socket again, headers and all, which is obviously broken and not what we want.
> > >
> > > There are a few alternatives discussed in the couchdb-dev channel. I'll present some behaviors, but feel free to add more. Some ideas might have been discounted in the IRC discussion already, but I'll present them anyway in case it sparks further conversation:
> > >
> > > A) Do what _changes[1] feeds do. Start a new transaction and continue streaming the data from the next key after the last one emitted in the previous transaction. Document the API behavior change: the view of the data it presents is no longer a point-in-time[4] snapshot of the DB.
> > >    - Keeps the API shape the same as CouchDB <4.0. Client libraries don't have to change to continue using these CouchDB 4.0 endpoints.
> > >    - This is the easiest to implement since it would re-use the implementation for the _changes feed (an extra option passed to the fold function).
> > >    - Breaks API behavior if users relied on having a point-in-time[4] snapshot view of the data.
> > >
> > > B) Simply end the stream. Let the users pass a `?transaction=true` param which indicates they are aware the stream may end early and so would have to paginate from the last emitted key with a skip=1. This will keep the request bodies the same as current CouchDB.
> > > However, if the users got all the data in one request, they will end up wasting another request to see if there is more data available. If they didn't get any data, they might have too large of a skip value (see [2]) and so would have to guess different values for start/end keys. Or we impose a max limit for the `skip` parameter.
> > >
> > > C) End the stream and add a final metadata row like "transaction": "timeout" at the end. That will let the user know to keep paginating from the last key onward. This won't work for `_all_dbs` and `_dbs_info`[3]. Maybe let those two endpoints behave like _changes feeds and only use this for views and _all_docs? If we like this choice, let's think about what happens for those, as I couldn't come up with anything decent there.
> > >
> > > D) Same as C, but to solve the issue with skips[2], emit a bookmark "key" of where the iteration stopped and the current "skip" and "limit" params, which would keep decreasing. The user would then pass those in "start_key=..." in the next request along with the limit and skip params. So something like "continuation":{"skip":599, "limit":5, "key":"..."}. This has the same issue with array results for `_all_dbs` and `_dbs_info`[3].
> > >
> > > E) Enforce low `limit` and `skip` parameters. Enforce maximum values there such that the response time is likely to fit in one transaction. This could be tricky as different runtime environments will have different characteristics. Also, if the timeout happens there isn't a nice way to send an HTTP error since we already sent the 200 response. The downside is that this might break how some users use the API, if, say, they are using large skips and limits already.
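[Editor's note: option D's continuation object can be illustrated with a small client-side loop. `fetch_page`, its in-memory `DATA`, and the response shape are made-up stand-ins for the real endpoint, and the decreasing `skip` bookkeeping is omitted for brevity.]

```python
# Hypothetical server: returns up to `limit` rows starting after `start_key`,
# plus a "continuation" object when the stream ended early (option D).
DATA = [f"doc{i:03d}" for i in range(10)]

def fetch_page(start_key=None, limit=4):
    begin = 0 if start_key is None else DATA.index(start_key) + 1
    rows = DATA[begin:begin + limit]
    resp = {"rows": rows}
    if begin + limit < len(DATA):
        # The server tells the client exactly where to resume.
        resp["continuation"] = {"key": rows[-1], "limit": limit}
    return resp

def fetch_all():
    """Client loop: keep following the continuation until it disappears."""
    rows, cont = [], None
    while True:
        resp = fetch_page(start_key=cont["key"] if cont else None)
        rows.extend(resp["rows"])
        cont = resp.get("continuation")
        if cont is None:
            return rows

assert fetch_all() == DATA
```

The absence of a `continuation` key doubles as the "you are done" signal, which avoids the wasted extra request that option B suffers from.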
> > > Perhaps here we do both B and D, such that if users want transactional behavior, they specify the `transaction=true` param and only then do we enforce low limit and skip maximums.
> > >
> > > F) At least for `_all_docs`, it seems providing a point-in-time snapshot view doesn't necessarily need to be tied to transaction boundaries. We could check the update sequence of the database at the start of the next transaction, and if it hasn't changed we can continue emitting a consistent view. This can apply to C and D and would just determine when the stream ends. If there are no writes happening to the db, this could potentially stream all the data just like option A would. Not entirely sure if this would work for views.
> > >
> > > So what do we think? I can see different combinations of options here, maybe even a different one for each API endpoint. For example, `_all_dbs` and `_dbs_info` are always A, and `_all_docs` and views default to A but have parameters to do F, etc.
> > >
> > > Cheers,
> > > -Nick
> > >
> > > Some footnotes:
> > >
> > > [1] The _changes feed is the only one that works currently. It behaves as per the RFC https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns. That is, we continue streaming the data by resetting the transaction object and restarting from the last emitted key (the db sequence in this case). However, because the transaction restarts, if a document is updated while the streaming takes place, it may appear in the _changes feed twice. That's a behavior difference from CouchDB < 4.0 and we'd have to document it, since previously we presented a point-in-time snapshot of the database from when we started streaming.
> > >
> > > [2] Our streaming APIs have both skips and limits.
> > > Since FDB doesn't currently support efficient offsets for key selectors (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging), we implemented skip by iterating over the data. This means that a skip of, say, 100000 could keep timing out the transaction without yielding any data.
> > >
> > > [3] _all_dbs and _dbs_info return a JSON array, so they don't have an obvious place to insert a last metadata row.
> > >
> > > [4] For example, they have a constraint that documents "a" and "z" cannot both be in the database at the same time. But when iterating, it's possible that "a" was there at the start. Then by the end, "a" was removed and "z" added, so both "a" and "z" would appear in the emitted stream. Note that FoundationDB has APIs which exhibit the same "relaxed" constraints:
> > > https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
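[Editor's note: option F's update-sequence check can be modelled in miniature. `FakeDb`, the batch loop, and the `between_batches` hook below are in-memory stand-ins for FoundationDB transactions and concurrent writers, not CouchDB internals.]

```python
class FakeDb:
    """Tiny in-memory model of a database with an update sequence."""
    def __init__(self, docs):
        self.docs = sorted(docs)  # sorted (key, value) pairs
        self.update_seq = 0

    def write(self, key, value):
        self.docs = sorted(self.docs + [(key, value)])
        self.update_seq += 1

def stream_all(db, batch=2, between_batches=None):
    """Return (rows, completed). Each batch stands in for one transaction;
    as option F suggests, we only continue past a transaction boundary
    if the update sequence is unchanged since the stream started."""
    seq_at_start = db.update_seq
    rows, pos = [], 0
    while pos < len(db.docs):
        if db.update_seq != seq_at_start:
            return rows, False  # db changed: end the stream early
        rows.extend(db.docs[pos:pos + batch])
        pos += batch
        if between_batches:
            between_batches()  # simulate work done between transactions
    return rows, True

# A quiet db streams to completion, just like option A would.
quiet = FakeDb([("a", 1), ("b", 2), ("c", 3)])
rows, completed = stream_all(quiet)

# A concurrent write between batches ends the stream early.
busy = FakeDb([("a", 1), ("b", 2), ("c", 3)])
early_rows, finished = stream_all(busy, between_batches=lambda: busy.write("z", 26))
```

Every row emitted before the early stop is still from a single consistent view, which is exactly the property the footnote [4] scenario (seeing both "a" and "z") would otherwise violate.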