Hi,

With the Redwood storage engine under development and with prefix elision part 
of its design, I don’t think we should get too hung up on adding complications 
and indirections in the key space just yet. We haven’t written a line of code 
or run any tests, this is premature optimisation.

I’d like to focus on the simplest solution that yields all required properties. 
We can embellish later (if warranted).

I am intrigued by all the ideas that might allow us cheaper inserts and updates 
than the current code where there are multiple edit branches in the stored 
document.

B.

> On 6 Feb 2019, at 02:18, Ilya Khlopotov <iil...@apache.org> wrote:
> 
> While reading Adam's proposal I came to realize that: we don't have to 
> calculate winning revision at read time.
> Since FDB's transactions are atomic we can calculate it when we write. This 
> means we can just write latest values into separate range. This makes lookup 
> of latest version fast.
> Another realization is if we want to share values for some json paths we 
> would have to introduce a level of indirection. 
> Bellow is the data model inspired by Adam's idea to share json_paths. In this 
> model the json_path is stored in the revision where it was first added (we 
> call that revision an owner of a json_path). The values for json_path key can 
> be scalar values, parts of scalar values or pointers to owner location.
> The below snippets are sketches of transactions. 
> The transactions will include updates to other keys as needed 
> (`external_size`, `by_seq` and so on).  The revision tree management is not 
> covered yet.
> The `rev -> vsn` indirection is not strictly required. It is added because it 
> saves some space since `rev` is a long string and `vsn` is FDB versionstamp 
> of fixed size.
> 
> - `{NS} / {docid} / _by_rev / {rev} = vsn` 
> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part | 
> {another_vsn}`
> 
> ```
> write(txn, doc_id, prev_rev, json):
>   txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
>   rev = generate_new_rev()
>   txn["{NS} / {docid} / _by_rev / {rev}"] = vsn 
>   for every json_path in flattened json
>     - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
>     if rev is HEAD:
>       # this range contains values for all json paths for the latest revision 
> (read optimization)
>       - {NS} / {docid} / _data / {json_path} = latest_value | part
>     - {NS} / {docid} / {vsn} / _data / {json_path} = value | part | 
> {another_vsn}
>   txn["{NS} / {doc_id} / _rev"] = rev
> 
> get_current(txn, doc_id):
>  # there is no sharing of json_paths in this range (read optimization)
>  txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data / 
> 0xFF" )
> 
> get_revision(txn, doc_id, rev):
>  vsn = txn["{NS} / {docid} / _by_rev / {rev}"] 
>  json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", "{NS} / 
> {vsn} / {docid} / _data / 0xFF" )
>  for every json_path in json_paths:
>    if value has type vsn:
>       another_vsn = value
>          value = txn["{NS} / {docid} / {another_vsn} / _data / {json_path}"]
>    result[json_path] = value
> 
> delete_revision(txn, doc_id, rev):
>  vsn = txn["{NS} / {docid} / _by_rev / {rev}"] 
>  json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", "{NS} / 
> {vsn} / {docid} / _data / 0xFF" )
>  for every json_path in json_paths:
>    if value has type vsn:
>      # remove reference to deleted revision from the owner
>       del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}]
>    # check if deleted revision of json_path is not used by anything else
>    if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}", 
> limit=1) == []:
>       del txn["{NS} / {docid} / {vsn} / _data / {json_path}"]
>    if vsn is HEAD:
>       copy range for winning revision into "{NS} / {docid} / _data / 
> {json_path}"
> ```
> 
> best regards,
> iilyak
> 
> On 2019/02/04 23:22:09, Adam Kocoloski <kocol...@apache.org> wrote: 
>> I think it’s fine to start a focused discussion here as it might help inform 
>> some of the broader debate over in that thread.
>> 
>> As a reminder, today CouchDB writes the entire body of each document 
>> revision on disk as a separate blob. Edit conflicts that have common fields 
>> between them do not share any storage on disk. The revision tree is encoded 
>> into a compact format and a copy of it is stored directly in both the by_id 
>> tree and the by_seq tree. Each leaf entry in the revision tree contain a 
>> pointer to the position of the associated doc revision on disk.
>> 
>> As a further reminder, CouchDB 2.x clusters can generate edit conflict 
>> revisions just from multiple clients concurrently updating the same document 
>> in a single cluster. This won’t happen when FoundationDB is running under 
>> the hood, but users who deploy multiple CouchDB or PouchDB servers and 
>> replicate between them can of course still produce conflicts just like they 
>> could in CouchDB 1.x, so we need a solution.
>> 
>> Let’s consider the two sub-topics separately: 1) storage of edit conflict 
>> bodies and 2) revision trees
>> 
>> ## Edit Conflict Storage
>> 
>> The simplest possible solution would be to store each document revision 
>> separately, like we do today. We could store document bodies with (“docid”, 
>> “revid”) as the key prefix, and each transaction could clear the key range 
>> associated with the base revision against which the edit is being attempted. 
>> This would work, but I think we can try to be a bit more clever and save on 
>> storage space given that we’re splitting JSON documents into multiple KV 
>> pairs.
>> 
>> One thought I’d had is to introduce a special enum Value which indicates 
>> that the subtree “beneath” the given Key is in conflict. For example, 
>> consider the documents
>> 
>> {
>>    “_id”: “foo”,
>>    “_rev”: “1-abc”,
>>    “owner”: “alice”,
>>    “active”: true
>> }
>> 
>> and 
>> 
>> {
>>    “_id”: “foo”,
>>    “_rev”: “1-def”,
>>    “owner”: “bob”,
>>    “active”: true
>> }
>> 
>> We could represent these using the following set of KVs:
>> 
>> (“foo”, “active”) = true
>> (“foo”, “owner”) = kCONFLICT
>> (“foo”, “owner”, “1-abc”) = “alice”
>> (“foo”, “owner”, “1-def”) = “bob”
>> 
>> This approach also extends to conflicts where the two versions have 
>> different data types. Consider a more complicated example where bob dropped 
>> the “active” field and changed the “owner” field to an object:
>> 
>> {
>>  “_id”: “foo”,
>>  “_rev”: “1-def”,
>>  “owner”: {
>>    “name”: “bob”,
>>    “email”: “b...@example.com"
>>  }
>> }
>> 
>> Now the set of KVs for “foo” looks like this (note that a missing field 
>> needs to be handled explicitly):
>> 
>> (“foo”, “active”) = kCONFLICT
>> (“foo”, “active”, “1-abc”) = true
>> (“foo”, “active”, “1-def”) = kMISSING
>> (“foo”, “owner”) = kCONFLICT
>> (“foo”, “owner”, “1-abc”) = “alice”
>> (“foo”, “owner”, “1-def”, “name”) = “bob”
>> (“foo”, “owner”, “1-def”, “email”) = “b...@example.com”
>> 
>> I like this approach for the common case where documents share most of their 
>> data in common but have a conflict in a very specific field or set of 
>> fields. 
>> 
>> I’ve encountered one important downside, though: an edit that replicates in 
>> and conflicts with the entire document can cause a bit of a data explosion. 
>> Consider a case where I have 10 conflicting versions of a 100KB document, 
>> but the conflicts are all related to a single scalar value. Now I replicate 
>> in an empty document, and suddenly I have a kCONFLICT at the root. In this 
>> model I now need to list out every path of every one of the 10 existing 
>> revisions and I end up with a 1MB update. Yuck. That’s technically no worse 
>> in the end state than the “zero sharing” case above, but one could easily 
>> imagine overrunning the transaction size limit this way.
>> 
>> I suspect there’s a smart path out of this. Maybe the system detects a 
>> “default” value for each field and uses that instead of writing out the 
>> value for every revision in a conflicted subtree. Worth some discussion.
>> 
>> ## Revision Trees
>> 
>> In CouchDB we currently represent revisions as a hash history tree; each 
>> revision identifier is derived from the content of the revision including 
>> the revision identifier of its parent. Individual edit branches are bounded 
>> in *length* (I believe the default is 1000 entries), but the number of edit 
>> branches is technically unbounded.
>> 
>> The size limits in FoundationDB preclude us from storing the entire key tree 
>> as a single value; in pathological situations the tree could exceed 100KB. 
>> Rather, I think it would make sense to store each edit *branch* as a 
>> separate KV. We stem the branch long before it hits the value size limit, 
>> and in the happy case of no edit conflicts this means we store the edit 
>> history metadata in a single KV. It also means that we can apply an 
>> interactive edit without retrieving the entire conflicted revision tree; we 
>> need only retrieve and modify the single branch against which the edit is 
>> being applied. The downside is that we duplicate historical revision 
>> identifiers shared by multiple edit branches, but I think this is a 
>> worthwhile tradeoff.
>> 
>> I would furthermore try to structure the keys so that it is possible to 
>> retrieve the “winning” revision in a single limit=1 range query. Ideally I’d 
>> like to proide the following properties:
>> 
>> 1) a document read does not need to retrieve the revision tree at all, just 
>> the winning revision identifier (which would be stored with the rest of the 
>> doc)
>> 2) a document update only needs to read the edit branch of the revision tree 
>> against which the update is being applied, and it can read that branch 
>> immediately knowing only the content of the edit that is being attempted 
>> (i.e., it does not need to read the current version of the document itself).
>> 
>> So, I’d propose a separate subspace (maybe “_meta”?) for the revision trees, 
>> with keys and values that look like
>> 
>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, 
>> GrandparentRev, …]
>> 
>> The inclusion of IsDeleted, RevPosition and RevHash in the key should be 
>> sufficient (with the right encoding) to create a range query that 
>> automatically selects the “winner” according to CouchDB’s arcane rules, 
>> which are something like
>> 
>> 1) deleted=false beats deleted=true
>> 2) longer paths (i.e. higher RevPosition) beat shorter ones
>> 3) RevHashes with larger binary values beat ones with smaller values
>> 
>> ===========
>> 
>> OK, that’s all on this topic from me for now. I think this is a particularly 
>> exciting area where we start to see the dividends of splitting up data into 
>> multiple KV pairs in FoundationDB :) Cheers,
>> 
>> Adam
>> 
>> 
>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> wrote:
>>> 
>>> This one is quite tightly coupled to the other thread on data model, should 
>>> we start much conversation here before that one gets closer to a solution?
>>> 
>>> -- 
>>> Robert Samuel Newson
>>> rnew...@apache.org
>>> 
>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
>>>> This is a beginning of a discussion thread about storage of edit 
>>>> conflicts and everything which relates to revisions.
>>>> 
>>>> 
>> 
>> 

Reply via email to