I overlooked two important details when I sent this earlier email. I try not to 
be in the habit of replying to my own emails but here we go ...

First, in order to update a document while reading only the winning edit branch 
entry (the second property that I said I wanted to provide) we also need to 
include the versionstamp associated with the current leaf in the edit branch 
KV, so we know which entry to clear from the “by_seq” subspace. So a more 
complete example would be

(“_meta”, DocID, IsDeleted, RevPosition, RevHash) = (VersionstampForRev, 
[ParentRev, GrandparentRev, …])

Second ... that property isn’t really compatible with the compact edit conflict 
storage model I proposed, is it? If you need to merge in a new edit to an 
existing set of KVs you can’t very well just blindly clear the whole range.

This is an interesting balancing act. Perhaps there’s a good way for us to 
preserve the “fast path” which is applicable in 99% of circumstances and then 
only fallback to the “read the doc in order to update it” mode when we know 
there are multiple edit branches.

Adam

> On Feb 4, 2019, at 6:22 PM, Adam Kocoloski <kocol...@apache.org> wrote:
> 
> I think it’s fine to start a focused discussion here as it might help inform 
> some of the broader debate over in that thread.
> 
> As a reminder, today CouchDB writes the entire body of each document revision 
> on disk as a separate blob. Edit conflicts that have common fields between 
> them do not share any storage on disk. The revision tree is encoded into a 
> compact format and a copy of it is stored directly in both the by_id tree and 
> the by_seq tree. Each leaf entry in the revision tree contain a pointer to 
> the position of the associated doc revision on disk.
> 
> As a further reminder, CouchDB 2.x clusters can generate edit conflict 
> revisions just from multiple clients concurrently updating the same document 
> in a single cluster. This won’t happen when FoundationDB is running under the 
> hood, but users who deploy multiple CouchDB or PouchDB servers and replicate 
> between them can of course still produce conflicts just like they could in 
> CouchDB 1.x, so we need a solution.
> 
> Let’s consider the two sub-topics separately: 1) storage of edit conflict 
> bodies and 2) revision trees
> 
> ## Edit Conflict Storage
> 
> The simplest possible solution would be to store each document revision 
> separately, like we do today. We could store document bodies with (“docid”, 
> “revid”) as the key prefix, and each transaction could clear the key range 
> associated with the base revision against which the edit is being attempted. 
> This would work, but I think we can try to be a bit more clever and save on 
> storage space given that we’re splitting JSON documents into multiple KV 
> pairs.
> 
> One thought I’d had is to introduce a special enum Value which indicates that 
> the subtree “beneath” the given Key is in conflict. For example, consider the 
> documents
> 
> {
>    “_id”: “foo”,
>    “_rev”: “1-abc”,
>    “owner”: “alice”,
>    “active”: true
> }
> 
> and 
> 
> {
>    “_id”: “foo”,
>    “_rev”: “1-def”,
>    “owner”: “bob”,
>    “active”: true
> }
> 
> We could represent these using the following set of KVs:
> 
> (“foo”, “active”) = true
> (“foo”, “owner”) = kCONFLICT
> (“foo”, “owner”, “1-abc”) = “alice”
> (“foo”, “owner”, “1-def”) = “bob”
> 
> This approach also extends to conflicts where the two versions have different 
> data types. Consider a more complicated example where bob dropped the 
> “active” field and changed the “owner” field to an object:
> 
> {
>  “_id”: “foo”,
>  “_rev”: “1-def”,
>  “owner”: {
>    “name”: “bob”,
>    “email”: “b...@example.com"
>  }
> }
> 
> Now the set of KVs for “foo” looks like this (note that a missing field needs 
> to be handled explicitly):
> 
> (“foo”, “active”) = kCONFLICT
> (“foo”, “active”, “1-abc”) = true
> (“foo”, “active”, “1-def”) = kMISSING
> (“foo”, “owner”) = kCONFLICT
> (“foo”, “owner”, “1-abc”) = “alice”
> (“foo”, “owner”, “1-def”, “name”) = “bob”
> (“foo”, “owner”, “1-def”, “email”) = “b...@example.com”
> 
> I like this approach for the common case where documents share most of their 
> data in common but have a conflict in a very specific field or set of fields. 
> 
> I’ve encountered one important downside, though: an edit that replicates in 
> and conflicts with the entire document can cause a bit of a data explosion. 
> Consider a case where I have 10 conflicting versions of a 100KB document, but 
> the conflicts are all related to a single scalar value. Now I replicate in an 
> empty document, and suddenly I have a kCONFLICT at the root. In this model I 
> now need to list out every path of every one of the 10 existing revisions and 
> I end up with a 1MB update. Yuck. That’s technically no worse in the end 
> state than the “zero sharing” case above, but one could easily imagine 
> overrunning the transaction size limit this way.
> 
> I suspect there’s a smart path out of this. Maybe the system detects a 
> “default” value for each field and uses that instead of writing out the value 
> for every revision in a conflicted subtree. Worth some discussion.
> 
> ## Revision Trees
> 
> In CouchDB we currently represent revisions as a hash history tree; each 
> revision identifier is derived from the content of the revision including the 
> revision identifier of its parent. Individual edit branches are bounded in 
> *length* (I believe the default is 1000 entries), but the number of edit 
> branches is technically unbounded.
> 
> The size limits in FoundationDB preclude us from storing the entire key tree 
> as a single value; in pathological situations the tree could exceed 100KB. 
> Rather, I think it would make sense to store each edit *branch* as a separate 
> KV. We stem the branch long before it hits the value size limit, and in the 
> happy case of no edit conflicts this means we store the edit history metadata 
> in a single KV. It also means that we can apply an interactive edit without 
> retrieving the entire conflicted revision tree; we need only retrieve and 
> modify the single branch against which the edit is being applied. The 
> downside is that we duplicate historical revision identifiers shared by 
> multiple edit branches, but I think this is a worthwhile tradeoff.
> 
> I would furthermore try to structure the keys so that it is possible to 
> retrieve the “winning” revision in a single limit=1 range query. Ideally I’d 
> like to proide the following properties:
> 
> 1) a document read does not need to retrieve the revision tree at all, just 
> the winning revision identifier (which would be stored with the rest of the 
> doc)
> 2) a document update only needs to read the edit branch of the revision tree 
> against which the update is being applied, and it can read that branch 
> immediately knowing only the content of the edit that is being attempted 
> (i.e., it does not need to read the current version of the document itself).
> 
> So, I’d propose a separate subspace (maybe “_meta”?) for the revision trees, 
> with keys and values that look like
> 
> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, 
> GrandparentRev, …]
> 
> The inclusion of IsDeleted, RevPosition and RevHash in the key should be 
> sufficient (with the right encoding) to create a range query that 
> automatically selects the “winner” according to CouchDB’s arcane rules, which 
> are something like
> 
> 1) deleted=false beats deleted=true
> 2) longer paths (i.e. higher RevPosition) beat shorter ones
> 3) RevHashes with larger binary values beat ones with smaller values
> 
> ===========
> 
> OK, that’s all on this topic from me for now. I think this is a particularly 
> exciting area where we start to see the dividends of splitting up data into 
> multiple KV pairs in FoundationDB :) Cheers,
> 
> Adam
> 
> 
>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> wrote:
>> 
>> This one is quite tightly coupled to the other thread on data model, should 
>> we start much conversation here before that one gets closer to a solution?
>> 
>> -- 
>> Robert Samuel Newson
>> rnew...@apache.org
>> 
>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
>>> This is a beginning of a discussion thread about storage of edit 
>>> conflicts and everything which relates to revisions.
>>> 
>>> 
> 

Reply via email to