Adam, Garren, all,

As Garren says, I think exposing the raw read version to users explicitly as a
read version (i.e., pretty much exposing the raw FDB functionality directly to
users) will be confusing and make it easy for people to make mistakes, given
there are some hidden semantics to it, not least that it currently can't be
more than 5s old :)

I do, however, think there is a large amount of value in enabling clients to 
avoid the overhead of making an FDB read version request for every CouchDB 
request.
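
To make the cached-read-version idea concrete, here's a rough Python sketch of a
per-node cache that only hands out a cached version when the request explicitly
tolerates staleness, and never one older than a margin inside FDB's ~5 s MVCC
window. `ReadVersionCache`, the `fetch_read_version` hook and the one-second
margin are all made up for illustration; the hook stands in for a real
get-read-version call, and the clock is injectable so the example runs without
FDB.

```python
import time
from typing import Callable, Optional

MVCC_WINDOW_S = 5.0     # FDB's default MVCC window is about 5 seconds
SAFETY_MARGIN_S = 1.0   # hypothetical headroom so the version is still readable when used


class ReadVersionCache:
    """Reuse a recently fetched read version for requests that tolerate staleness.

    `fetch_read_version` is a hypothetical hook standing in for a real
    get-read-version round trip; it is not an FDB API.
    """

    def __init__(self, fetch_read_version: Callable[[], int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_read_version
        self._clock = clock
        self._version: Optional[int] = None
        self._acquired_at = 0.0
        self.fetches = 0  # number of real read-version requests made

    def get(self, max_staleness_s: float = 0.0) -> int:
        if max_staleness_s > 0 and self._version is not None:
            limit = min(max_staleness_s, MVCC_WINDOW_S - SAFETY_MARGIN_S)
            if self._clock() - self._acquired_at <= limit:
                return self._version  # no round trip to FDB
        # Record the time before fetching so the age estimate errs on the old side.
        started = self._clock()
        self._version = self._fetch()
        self._acquired_at = started
        self.fetches += 1
        return self._version


# Usage with a fake clock and fake, increasing versions.
now = [0.0]
fake_versions = iter(range(1_000_000, 100_000_000, 1_000_000))
cache = ReadVersionCache(lambda: next(fake_versions), clock=lambda: now[0])
v1 = cache.get(max_staleness_s=1.0)  # fetches
now[0] = 0.5
v2 = cache.get(max_staleness_s=1.0)  # reuses v1
now[0] = 2.0
v3 = cache.get(max_staleness_s=1.0)  # cached copy is 2 s old: fetches again
v4 = cache.get()                     # strict request: always fetches
```

Because the cache is per node, it has exactly the problem Adam describes: a
client load-balanced across nodes can see versions go backwards, which is one
argument for handing the version back to the client as a token.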

For me, the way to expose read versions is to use them as a building block for
higher-level concepts, like bounded staleness and read-your-writes guarantees,
that make sense at an application level. Adding these abstractions as concepts
in the HTTP API (even if they end up essentially being an opaque token which is
really the FDB read version) makes sense to me, because a lot of people like
CouchDB's HTTP API for its own sake and eschew client libraries.
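
Something like the following is what I have in mind by an opaque token: the
server encodes the read version (plus a format tag so we can change the
encoding later) and clients just echo it back. The format here (one tag byte,
an 8-byte big-endian version, URL-safe base64) and the function names are
purely hypothetical.

```python
import base64
import struct

TOKEN_FORMAT = 1  # hypothetical version tag for the token encoding


def encode_token(read_version: int) -> str:
    """Pack a read version into an opaque, URL-safe token."""
    raw = struct.pack(">BQ", TOKEN_FORMAT, read_version)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_token(token: str) -> int:
    """Recover the read version, rejecting malformed or unknown-format tokens."""
    padded = token + "=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != struct.calcsize(">BQ"):
        raise ValueError("malformed token")
    fmt, read_version = struct.unpack(">BQ", raw)
    if fmt != TOKEN_FORMAT:
        raise ValueError(f"unsupported token format {fmt}")
    return read_version


token = encode_token(6_123_456_789)
```

Keeping it opaque leaves room to add, say, a cluster identifier later without
breaking clients that treat it as a plain string.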

A second big thing for me is how this functionality interacts with the current
MVCC logic. I think there's a large potential for overlap. What I'm pondering
is whether, and how, the names we use in the API can make the "MVCC document
rev ID" and something like a "database rev ID" (or even a "CouchDB instance
rev ID"!) feel like one consistent interface rather than two patterns bolted
together.

An overlap example:

- User issues GET /db/document?stale=true -- stale=true allows the node that 
serves the request to use a cached read version. The response includes a header 
with an opaque token A (basically the read version).

- User issues PUT /db/document -- this returns a newer token, call it A+1, but 
another question is how we avoid conflicts for the write in the MVCC sense:
   - We could use the existing rev-based MVCC mechanics: the client sends the 
rev ID, and the server reads the current rev ID, checks it, and allows or 
denies the write.
   - Instead, the write could include token A in the request. On the server 
side we use the token as the read version for the transaction and add the 
document's keys to the FDB transaction's read conflict set, ensuring the 
transaction fails if the document has changed since A, without needing to 
read the document's current rev ID.

It would be nice if we didn't allow two ways to do this, but it'd also be nice 
if the client didn't have to cope with several rev-ID-like things.
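
To compare the two options, here's a toy single-process model in Python. It is
not FDB: the hypothetical `ToyStore` just keeps per-key version history and
mimics the FDB rule that a commit fails if a key in its read set was written
after its read version. In the real layer the rev read in option 1 would
happen inside the same FDB transaction; the toy is single-threaded, so that
doesn't matter here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


class Conflict(Exception):
    """The write lost an optimistic-concurrency check."""


@dataclass
class ToyStore:
    """A toy multi-version store, NOT FDB: just enough to compare the two checks."""

    version: int = 0
    # key -> [(commit_version, value), ...], oldest first
    history: Dict[str, List[Tuple[int, dict]]] = field(default_factory=dict)

    def read(self, key: str, at_version: int) -> dict:
        for committed, value in reversed(self.history.get(key, [])):
            if committed <= at_version:
                return value
        raise KeyError(key)

    def last_write(self, key: str) -> int:
        entries = self.history.get(key, [])
        return entries[-1][0] if entries else 0

    def commit(self, writes: Dict[str, dict], read_set: Iterable[str] = (),
               read_version: Optional[int] = None) -> int:
        # FDB-style check: abort if any key in the read set was written
        # after the transaction's read version.
        if read_version is not None:
            for key in read_set:
                if self.last_write(key) > read_version:
                    raise Conflict(key)
        self.version += 1
        for key, value in writes.items():
            self.history.setdefault(key, []).append((self.version, value))
        return self.version  # the new token ("A+1")


def put_with_rev_check(store: ToyStore, key: str, body: dict, client_rev: str) -> int:
    """Option 1: read the current rev at the latest version and compare."""
    if store.read(key, store.version)["_rev"] != client_rev:
        raise Conflict(key)
    return store.commit({key: body})


def put_with_token(store: ToyStore, key: str, body: dict, token: int) -> int:
    """Option 2: use the client's token as the read version; no document read."""
    return store.commit({key: body}, read_set=[key], read_version=token)


store = ToyStore()
token_a = store.commit({"doc": {"_rev": "1-aaa"}})  # treat this version as token A
store.commit({"doc": {"_rev": "2-bbb"}})            # a concurrent writer wins
results = []
for attempt in (lambda: put_with_rev_check(store, "doc", {"_rev": "2-ccc"}, "1-aaa"),
                lambda: put_with_token(store, "doc", {"_rev": "2-ccc"}, token_a)):
    try:
        attempt()
        results.append("accepted")
    except Conflict:
        results.append("rejected")
fresh = put_with_token(store, "doc", {"_rev": "3-ddd"}, store.version)  # up to date
```

Both options reject the stale PUT; the difference is that option 2 never reads
the document, but it only works while token A is still inside FDB's MVCC
window.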

---

Obviously we can enable read-your-writes with this:

- User issues POST /db/_find?token=A+1. We can use A+1 as the read version to 
ensure we see the previous write.
   - I guess if we send a read version that's too old, FDB will have some way 
to tell us that?
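
On the too-old question: as far as I know FDB does report this, since a read
at a version older than its MVCC window (about 5 s by default) fails with
error 1007, `transaction_too_old`. Here is a sketch of how the layer might
handle it, simulated in Python without the FDB bindings: the client keeps the
newest token it has seen, and the server tries that version first, falling
back to a fresh one only if FDB refuses it. `Session`, `read_with_token`,
`fake_read_at` and the `FDBError` stand-in are hypothetical; the ~1M
versions/s figure is FDB's approximate version rate.

```python
from typing import Callable, Optional, Tuple

VERSIONS_PER_SECOND = 1_000_000                    # FDB versions advance at roughly this rate
MAX_READ_LIFE_VERSIONS = 5 * VERSIONS_PER_SECOND   # default ~5 s MVCC window
TRANSACTION_TOO_OLD = 1007                         # FDB error code: transaction_too_old


class FDBError(Exception):
    """Minimal stand-in for the FDB bindings' error type (just carries a code)."""

    def __init__(self, code: int) -> None:
        super().__init__(f"FDB error {code}")
        self.code = code


class Session:
    """Client side: remember the newest token seen and send it with every read."""

    def __init__(self) -> None:
        self.token: Optional[int] = None

    def observe(self, token: int) -> None:
        # A stale GET can return an older token than our own PUT did; keep the max.
        if self.token is None or token > self.token:
            self.token = token


def read_with_token(token: Optional[int], read_at: Callable[[int], str],
                    get_fresh_version: Callable[[], int]) -> Tuple[str, int]:
    """Server side: try the client's token first; fetch a fresh version only if refused."""
    if token is not None:
        try:
            return read_at(token), token
        except FDBError as err:
            if err.code != TRANSACTION_TOO_OLD:
                raise
    version = get_fresh_version()  # >= token, so the client's own writes are visible
    return read_at(version), version


# Simulated cluster: reads older than the MVCC window are refused.
latest = [12_000_000]


def fake_read_at(version: int) -> str:
    if latest[0] - version > MAX_READ_LIFE_VERSIONS:
        raise FDBError(TRANSACTION_TOO_OLD)
    return f"snapshot@{version}"


session = Session()
session.observe(10_000_000)  # token A+1 returned by our PUT
session.observe(9_000_000)   # older token from a stale GET: ignored
recent = read_with_token(session.token, fake_read_at, lambda: latest[0])
latest[0] = 16_000_000       # roughly 4 s pass in the simulated cluster
expired = read_with_token(session.token, fake_read_at, lambda: latest[0])
```

The fallback version is at least as new as anything the client has committed,
so read-your-writes still holds; what the client loses is the guarantee that
this read sees the same snapshot as its earlier ones.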

---

A separate question: can we embed the read version into the document rev ID, 
or are we already looking at that? I wondered whether it could let us avoid a 
request to FDB to read the current rev ID in some cases, because we could 
leverage FDB's conflict semantics as above.
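
Roughly what I mean, as a hypothetical sketch: if a rev carried the version at
which it was committed, an update presenting a recent enough rev could use
that version as the read version plus a read conflict range on the document,
instead of reading the stored rev. The `N-digest-version` format, `plan_update`
and the window constant are made up, and getting the commit version into the
rev in the first place (e.g. via versionstamps) is glossed over.

```python
from typing import NamedTuple, Optional, Tuple

# ~5 s at FDB's approximate rate of 1M versions per second
MAX_READ_LIFE_VERSIONS = 5_000_000


class Rev(NamedTuple):
    generation: int
    digest: str
    commit_version: int  # hypothetical: FDB version at which this rev was committed


def format_rev(rev: Rev) -> str:
    return f"{rev.generation}-{rev.digest}-{rev.commit_version:x}"


def parse_rev(text: str) -> Rev:
    generation, digest, version = text.split("-")
    return Rev(int(generation), digest, int(version, 16))


def plan_update(client_rev: str, current_version: int) -> Tuple[str, Optional[int]]:
    """Decide how to check an update that presents `client_rev`."""
    rev = parse_rev(client_rev)
    if current_version - rev.commit_version <= MAX_READ_LIFE_VERSIONS:
        # Recent rev: read at its commit version and add the doc's keys as read
        # conflict ranges; FDB would reject the commit if anything wrote them since.
        return "conflict-check", rev.commit_version
    # Outside the MVCC window: fall back to reading and comparing the stored rev.
    return "read-and-compare", None


rev_text = format_rev(Rev(2, "abc123", 100_000_000))
fast = plan_update(rev_text, 102_000_000)
slow = plan_update(rev_text, 106_000_001)
```

Older revs would still need the usual read-and-compare path, so this would be
a fast path rather than a replacement for the existing rev check.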

-- 
Mike.

On Mon, 23 Sep 2019, at 13:31, Garren Smith wrote:
> Hi Adam,
> 
> In general, I like this idea, especially with the future possibility of
> adding transactions to CouchDB. What makes me a little nervous is that this
> requires a fair amount of knowledge of CouchDB and FDB for a user to fully
> understand what is happening and could be a potential place where a user
> could get it horribly wrong or cause unnecessary issues. I would prefer
> that a user has to explicitly opt into this functionality, either by
> changing config or via adding another field in the HTTP header or a query
> parameter.
> 
> Cheers
> Garren
> 
> On Fri, Sep 20, 2019 at 12:11 AM Adam Kocoloski <kocol...@apache.org> wrote:
> 
> > Hi all,
> >
> > As we’ve gotten more familiar with FoundationDB we’ve come to realize that
> > acquiring a read version at the beginning of a transaction is a relatively
> > expensive[*] operation. It’s also a challenging one to scale given the
> > amount of communication required between proxies and tlogs in order to
> > agree on a good version. The prototype CouchDB layer we’ve been working on
> > (i.e., the beginnings of CouchDB 4.0) uses a separate FDB transaction with
> > a new read version for every request made to CouchDB. I wanted to start a
> > discussion about ways we might augment that approach while preserving (or
> > even enhancing) the semantics that we can expose to CouchDB users.
> >
> > One thing we can do is cache known versions that FDB has supplied in the
> > past second in the CouchDB layer and reuse those when a client permits us
> > to do so. If you like, this is the modern version of `?stale=ok`, but now
> > applicable to all types of requests. One big downside of this approach is
> > that if you scale out the members of the CouchDB layer they’ll have
> > different views of recent FDB versions, and a client whose requests are
> > load-balanced across members won’t have any guarantee that time moves
> > forward from request to request. You could imagine gossiping versions
> > between layer members, but now you’re basically redoing the work that
> > FoundationDB is doing itself.
> >
> > Another approach is to communicate the FDB version as part of the response
> > to each request, and allow the client to set an FDB version as part of a
> > submitted request. Clients that do this will experience lower latencies for
> > requests 2..N that share a version, will have the benefit of a consistent
> > snapshot of the database for all the reads that are executed using the same
> > version, and can guarantee they read their own writes when interleaving
> > those operations (assuming any reads following a write use the new FDB
> > version associated with the write).
> >
> >
> > These techniques are not mutually exclusive; a client could acquire a
> > slightly stale FDB version and then use that for a collection of read
> > requests that would all observe the same consistent snapshot of the
> > database.  Also, recall that a CouchDB sequence is now essentially the same
> > as an FDB version, with a little extra metadata to ensure sequences are
> > always monotonically increasing even when moving a database to a different
> > FDB cluster. So if you like, this is about allowing requests to be executed
> > as of a certain sequence (provided that sequence is no more than 5 seconds
> > old).
> >
> > I’m refraining from proposing any specific API extensions at this point,
> > partly because that’s an easy bikeshed and partly because I think whatever
> > API we’d add would be a primitive that client libraries would use to
> > construct richer semantics around. I’m also biting my tongue and avoiding
> > any detailed discussion of the transactional capabilities that CouchDB
> > could offer by surfacing these versions to clients — but that’s definitely
> > an interesting topic in its own right!
> >
> > Curious to hear what you all think. Thanks, Adam
> >
> > [*]: I don’t want to come off as alarmist; when I say this operation is
> > “expensive” I mean it might take a couple of milliseconds depending on FDB
> > configuration, and FDB can execute 10s of thousands of these per second
> > without much tuning. But it’s always good to be looking for the next
> > bottleneck :)
> >
> >
>
