I think it’s fine to start a focused discussion here as it might help inform 
some of the broader debate over in that thread.

As a reminder, today CouchDB writes the entire body of each document revision 
on disk as a separate blob. Edit conflicts that have common fields between them 
do not share any storage on disk. The revision tree is encoded into a compact 
format and a copy of it is stored directly in both the by_id tree and the 
by_seq tree. Each leaf entry in the revision tree contain a pointer to the 
position of the associated doc revision on disk.

As a further reminder, CouchDB 2.x clusters can generate edit conflict 
revisions just from multiple clients concurrently updating the same document in 
a single cluster. This won’t happen when FoundationDB is running under the 
hood, but users who deploy multiple CouchDB or PouchDB servers and replicate 
between them can of course still produce conflicts just like they could in 
CouchDB 1.x, so we need a solution.

Let’s consider the two sub-topics separately: 1) storage of edit conflict 
bodies and 2) revision trees

## Edit Conflict Storage

The simplest possible solution would be to store each document revision 
separately, like we do today. We could store document bodies with (“docid”, 
“revid”) as the key prefix, and each transaction could clear the key range 
associated with the base revision against which the edit is being attempted. 
This would work, but I think we can try to be a bit more clever and save on 
storage space given that we’re splitting JSON documents into multiple KV pairs.

One thought I’d had is to introduce a special enum Value which indicates that 
the subtree “beneath” the given Key is in conflict. For example, consider the 
documents

{
    “_id”: “foo”,
    “_rev”: “1-abc”,
    “owner”: “alice”,
    “active”: true
}

and 

{
    “_id”: “foo”,
    “_rev”: “1-def”,
    “owner”: “bob”,
    “active”: true
}

We could represent these using the following set of KVs:

(“foo”, “active”) = true
(“foo”, “owner”) = kCONFLICT
(“foo”, “owner”, “1-abc”) = “alice”
(“foo”, “owner”, “1-def”) = “bob”

This approach also extends to conflicts where the two versions have different 
data types. Consider a more complicated example where bob dropped the “active” 
field and changed the “owner” field to an object:

{
  “_id”: “foo”,
  “_rev”: “1-def”,
  “owner”: {
    “name”: “bob”,
    “email”: “b...@example.com"
  }
}

Now the set of KVs for “foo” looks like this (note that a missing field needs 
to be handled explicitly):

(“foo”, “active”) = kCONFLICT
(“foo”, “active”, “1-abc”) = true
(“foo”, “active”, “1-def”) = kMISSING
(“foo”, “owner”) = kCONFLICT
(“foo”, “owner”, “1-abc”) = “alice”
(“foo”, “owner”, “1-def”, “name”) = “bob”
(“foo”, “owner”, “1-def”, “email”) = “b...@example.com”

I like this approach for the common case where documents share most of their 
data in common but have a conflict in a very specific field or set of fields. 

I’ve encountered one important downside, though: an edit that replicates in and 
conflicts with the entire document can cause a bit of a data explosion. 
Consider a case where I have 10 conflicting versions of a 100KB document, but 
the conflicts are all related to a single scalar value. Now I replicate in an 
empty document, and suddenly I have a kCONFLICT at the root. In this model I 
now need to list out every path of every one of the 10 existing revisions and I 
end up with a 1MB update. Yuck. That’s technically no worse in the end state 
than the “zero sharing” case above, but one could easily imagine overrunning 
the transaction size limit this way.

I suspect there’s a smart path out of this. Maybe the system detects a 
“default” value for each field and uses that instead of writing out the value 
for every revision in a conflicted subtree. Worth some discussion.

## Revision Trees

In CouchDB we currently represent revisions as a hash history tree; each 
revision identifier is derived from the content of the revision including the 
revision identifier of its parent. Individual edit branches are bounded in 
*length* (I believe the default is 1000 entries), but the number of edit 
branches is technically unbounded.

The size limits in FoundationDB preclude us from storing the entire key tree as 
a single value; in pathological situations the tree could exceed 100KB. Rather, 
I think it would make sense to store each edit *branch* as a separate KV. We 
stem the branch long before it hits the value size limit, and in the happy case 
of no edit conflicts this means we store the edit history metadata in a single 
KV. It also means that we can apply an interactive edit without retrieving the 
entire conflicted revision tree; we need only retrieve and modify the single 
branch against which the edit is being applied. The downside is that we 
duplicate historical revision identifiers shared by multiple edit branches, but 
I think this is a worthwhile tradeoff.

I would furthermore try to structure the keys so that it is possible to 
retrieve the “winning” revision in a single limit=1 range query. Ideally I’d 
like to proide the following properties:

1) a document read does not need to retrieve the revision tree at all, just the 
winning revision identifier (which would be stored with the rest of the doc)
2) a document update only needs to read the edit branch of the revision tree 
against which the update is being applied, and it can read that branch 
immediately knowing only the content of the edit that is being attempted (i.e., 
it does not need to read the current version of the document itself).

So, I’d propose a separate subspace (maybe “_meta”?) for the revision trees, 
with keys and values that look like

(“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, GrandparentRev, 
…]

The inclusion of IsDeleted, RevPosition and RevHash in the key should be 
sufficient (with the right encoding) to create a range query that automatically 
selects the “winner” according to CouchDB’s arcane rules, which are something 
like

1) deleted=false beats deleted=true
2) longer paths (i.e. higher RevPosition) beat shorter ones
3) RevHashes with larger binary values beat ones with smaller values

===========

OK, that’s all on this topic from me for now. I think this is a particularly 
exciting area where we start to see the dividends of splitting up data into 
multiple KV pairs in FoundationDB :) Cheers,

Adam


> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> wrote:
> 
> This one is quite tightly coupled to the other thread on data model, should 
> we start much conversation here before that one gets closer to a solution?
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
>> This is a beginning of a discussion thread about storage of edit 
>> conflicts and everything which relates to revisions.
>> 
>> 

Reply via email to