I apologize in advance. I am finding it very, very difficult to allocate the time and energy necessary to go deep into any of these topics, and got lost halfway through Mike Rhodes' email :( So I'm replying to Adam's initial email, which is the only one I've fully digested.

On 2019-09-19 18:11, Adam Kocoloski wrote:
> Hi all,
>
> As we’ve gotten more familiar with FoundationDB we’ve come to realize that
> acquiring a read version at the beginning of a transaction is a relatively
> expensive[*] operation. It’s also a challenging one to scale given the amount
> of communication required between proxies and tlogs in order to agree on a
> good version. The prototype CouchDB layer we’ve been working on (i.e., the
> beginnings of CouchDB 4.0) uses a separate FDB transaction with a new read
> version for every request made to CouchDB. I wanted to start a discussion
> about ways we might augment that approach while preserving (or even
> enhancing) the semantics that we can expose to CouchDB users.
>
> One thing we can do is cache known versions that FDB has supplied in the past
> second in the CouchDB layer and reuse those when a client permits us to do
> so. If you like, this is the modern version of `?stale=ok`, but now
> applicable to all types of requests. One big downside of this approach is
> that if you scale out the members of the CouchDB layer they’ll have different
> views of recent FDB versions, and a client whose requests are load-balanced
> across members won’t have any guarantee that time moves forward from request
> to request. You could imagine gossiping versions between layer members, but
> now you’re basically redoing the work that FoundationDB is doing itself.

Keeping extra state alive in the CouchDB runtime is something we've always avoided. Maybe if someone's doing keepalives, but even then, if that "someone" is a reverse proxy server, it could have unintended consequences. One alternative is to always keep just one around, and constantly update it every 5s, whether it's used or not (idle server).

Read Your Writes has been one of the biggest requests for CouchDB for ages, and we're finally in a place to provide it.
The secondary question on my mind is: is that the default, or is old behaviour the default, or is it a configurable default?

> Another approach is to communicate the FDB version as part of the response to
> each request, and allow the client to set an FDB version as part of a
> submitted request. Clients that do this will experience lower latencies for
> requests 2..N that share a version, will have the benefit of a consistent
> snapshot of the database for all the reads that are executed using the same
> version, and can guarantee they read their own writes when interleaving those
> operations (assuming any reads following a write use the new FDB version
> associated with the write).

This second option seems better, but as mentioned later we don't want it to be a transparent FDB token (or convertible into one). This parallels the nonce approach we use in _changes feeds to ensure a stable feed, yeah?

> These techniques are not mutually exclusive; a client could acquire a
> slightly stale FDB version and then use that for a collection of read
> requests that would all observe the same consistent snapshot of the database.
> Also, recall that a CouchDB sequence is now essentially the same as an FDB
> version, with a little extra metadata to ensure sequences are always
> monotonically increasing even when moving a database to a different FDB
> cluster. So if you like, this is about allowing requests to be executed as of
> a certain sequence (provided that sequence is no more than 5 seconds old).
>
> I’m refraining from proposing any specific API extensions at this point,
> partly because that’s an easy bikeshed and partly because I think whatever
> API we’d add would be a primitive that client libraries would use to
> construct richer semantics around.
> I’m also biting my tongue and avoiding any
> detailed discussion of the transactional capabilities that CouchDB could
> offer by surfacing these versions to clients — but that’s definitely an
> interesting topic in its own right!

Mike touches on this and I think it's worth careful consideration. If we eschew API changes for 4.0 then we need to decide on the default. And if we're voting, I'd say making RYWs the default (never hanging onto a handle) and then (ab-)using stale=ok or whatever state we have lying around might be sufficient.

> Curious to hear what you all think. Thanks, Adam

Thanks Adam, this is great.

> [*]: I don’t want to come off as alarmist; when I say this operation is
> “expensive” I mean it might take a couple of milliseconds depending on FDB
> configuration, and FDB can execute 10s of thousands of these per second
> without much tuning. But it’s always good to be looking for the next
> bottleneck :)

This is the really important data point here for me. While Cloudant cares about 2-3 extra ms on the server side, many many MANY CouchDB users don't.

Can we benchmark what this looks like when running FDB+CouchDB on a teeny platform like a RasPi? Is it still 2-3ms? What about the average laptop/desktop? Or is it only 2-3ms on a beefy Cloudant-sized server?

-Joan