I think we should choose simple. We can then see if performance is too low or storage overhead too high and then see what we can do about it.
B. -- Robert Samuel Newson rnew...@apache.org On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote: > We cannot do simple thing if we want to support sharing of JSON terms. I > think if we want the simplest path we should move sharing out of the > scope. The problem with sharing is we need to know the location of > shared terms when we do write. This means that we have to read full > document on every write. There might be tricks to replace full document > read with some sort of hierarchical signature or sketch of a document. > However these tricks do not fall into simplest solution category. We > need to choose the design goals: > - simple > - performance > - reduced storage overhead > > best regards, > iilyak > > On 2019/02/07 12:45:34, Garren Smith <gar...@apache.org> wrote: > > I’m also in favor of keeping it really simple and then testing and > > measuring it. > > > > What is the best way to measure that we have something that works? I’m not > > sure just relying on our current tests will prove that? Should we define > > and build some more complex situations e.g docs with lots of conflicts or > > docs with wide revisions and make sure we can solve for those? > > > > On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt <j...@apache.org> wrote: > > > > > I’m also very much in favour with starting with the simplest thing that > > > can possibly work and doesn’t go against the advertised best practices of > > > FoundationDB. Let’s get that going and get a feel for how it all works > > > together, before trying to optimise things we can’t measure yet. > > > > > > Best > > > Jan > > > — > > > > > > > On 6. Feb 2019, at 16:58, Robert Samuel Newson <rnew...@apache.org> > > > wrote: > > > > > > > > Hi, > > > > > > > > With the Redwood storage engine under development and with prefix > > > elision part of its design, I don’t think we should get too hung up on > > > adding complications and indirections in the key space just yet. We > > > haven’t > > > written a line of code or run any tests, this is premature optimisation. > > > > > > > > I’d like to focus on the simplest solution that yields all required > > > properties. We can embellish later (if warranted). > > > > > > > > I am intrigued by all the ideas that might allow us cheaper inserts and > > > updates than the current code where there are multiple edit branches in > > > the > > > stored document. > > > > > > > > B. > > > > > > > >> On 6 Feb 2019, at 02:18, Ilya Khlopotov <iil...@apache.org> wrote: > > > >> > > > >> While reading Adam's proposal I came to realize that: we don't have to > > > calculate winning revision at read time. > > > >> Since FDB's transactions are atomic we can calculate it when we write. > > > This means we can just write latest values into separate range. This makes > > > lookup of latest version fast. > > > >> Another realization is if we want to share values for some json paths > > > we would have to introduce a level of indirection. > > > >> Bellow is the data model inspired by Adam's idea to share json_paths. > > > In this model the json_path is stored in the revision where it was first > > > added (we call that revision an owner of a json_path). The values for > > > json_path key can be scalar values, parts of scalar values or pointers to > > > owner location. > > > >> The below snippets are sketches of transactions. > > > >> The transactions will include updates to other keys as needed > > > (`external_size`, `by_seq` and so on). The revision tree management is > > > not > > > covered yet. > > > >> The `rev -> vsn` indirection is not strictly required. It is added > > > because it saves some space since `rev` is a long string and `vsn` is FDB > > > versionstamp of fixed size. > > > >> > > > >> - `{NS} / {docid} / _by_rev / {rev} = vsn` > > > >> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL` > > > >> - `{NS} / {docid} / _data / {json_path} = latest_value | part` > > > >> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part | > > > {another_vsn}` > > > >> > > > >> ``` > > > >> write(txn, doc_id, prev_rev, json): > > > >> txn.add_write_conflict_key("{NS} / {doc_id} / _rev") > > > >> rev = generate_new_rev() > > > >> txn["{NS} / {docid} / _by_rev / {rev}"] = vsn > > > >> for every json_path in flattened json > > > >> - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL > > > >> if rev is HEAD: > > > >> # this range contains values for all json paths for the latest > > > revision (read optimization) > > > >> - {NS} / {docid} / _data / {json_path} = latest_value | part > > > >> - {NS} / {docid} / {vsn} / _data / {json_path} = value | part | > > > {another_vsn} > > > >> txn["{NS} / {doc_id} / _rev"] = rev > > > >> > > > >> get_current(txn, doc_id): > > > >> # there is no sharing of json_paths in this range (read optimization) > > > >> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data > > > / 0xFF" ) > > > >> > > > >> get_revision(txn, doc_id, rev): > > > >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] > > > >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", > > > "{NS} / {vsn} / {docid} / _data / 0xFF" ) > > > >> for every json_path in json_paths: > > > >> if value has type vsn: > > > >> another_vsn = value > > > >> value = txn["{NS} / {docid} / {another_vsn} / _data / > > > {json_path}"] > > > >> result[json_path] = value > > > >> > > > >> delete_revision(txn, doc_id, rev): > > > >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] > > > >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", > > > "{NS} / {vsn} / {docid} / _data / 0xFF" ) > > > >> for every json_path in json_paths: > > > >> if value has type vsn: > > > >> # remove reference to deleted revision from the owner > > > >> del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}] > > > >> # check if deleted revision of json_path is not used by anything else > > > >> if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}", > > > limit=1) == []: > > > >> del txn["{NS} / {docid} / {vsn} / _data / {json_path}"] > > > >> if vsn is HEAD: > > > >> copy range for winning revision into "{NS} / {docid} / _data / > > > {json_path}" > > > >> ``` > > > >> > > > >> best regards, > > > >> iilyak > > > >> > > > >> On 2019/02/04 23:22:09, Adam Kocoloski <kocol...@apache.org> wrote: > > > >>> I think it’s fine to start a focused discussion here as it might help > > > inform some of the broader debate over in that thread. > > > >>> > > > >>> As a reminder, today CouchDB writes the entire body of each document > > > revision on disk as a separate blob. Edit conflicts that have common > > > fields > > > between them do not share any storage on disk. The revision tree is > > > encoded > > > into a compact format and a copy of it is stored directly in both the > > > by_id > > > tree and the by_seq tree. Each leaf entry in the revision tree contain a > > > pointer to the position of the associated doc revision on disk. > > > >>> > > > >>> As a further reminder, CouchDB 2.x clusters can generate edit conflict > > > revisions just from multiple clients concurrently updating the same > > > document in a single cluster. This won’t happen when FoundationDB is > > > running under the hood, but users who deploy multiple CouchDB or PouchDB > > > servers and replicate between them can of course still produce conflicts > > > just like they could in CouchDB 1.x, so we need a solution. > > > >>> > > > >>> Let’s consider the two sub-topics separately: 1) storage of edit > > > conflict bodies and 2) revision trees > > > >>> > > > >>> ## Edit Conflict Storage > > > >>> > > > >>> The simplest possible solution would be to store each document > > > revision separately, like we do today. We could store document bodies with > > > (“docid”, “revid”) as the key prefix, and each transaction could clear the > > > key range associated with the base revision against which the edit is > > > being > > > attempted. This would work, but I think we can try to be a bit more clever > > > and save on storage space given that we’re splitting JSON documents into > > > multiple KV pairs. > > > >>> > > > >>> One thought I’d had is to introduce a special enum Value which > > > indicates that the subtree “beneath” the given Key is in conflict. For > > > example, consider the documents > > > >>> > > > >>> { > > > >>> “_id”: “foo”, > > > >>> “_rev”: “1-abc”, > > > >>> “owner”: “alice”, > > > >>> “active”: true > > > >>> } > > > >>> > > > >>> and > > > >>> > > > >>> { > > > >>> “_id”: “foo”, > > > >>> “_rev”: “1-def”, > > > >>> “owner”: “bob”, > > > >>> “active”: true > > > >>> } > > > >>> > > > >>> We could represent these using the following set of KVs: > > > >>> > > > >>> (“foo”, “active”) = true > > > >>> (“foo”, “owner”) = kCONFLICT > > > >>> (“foo”, “owner”, “1-abc”) = “alice” > > > >>> (“foo”, “owner”, “1-def”) = “bob” > > > >>> > > > >>> This approach also extends to conflicts where the two versions have > > > different data types. Consider a more complicated example where bob > > > dropped > > > the “active” field and changed the “owner” field to an object: > > > >>> > > > >>> { > > > >>> “_id”: “foo”, > > > >>> “_rev”: “1-def”, > > > >>> “owner”: { > > > >>> “name”: “bob”, > > > >>> “email”: “b...@example.com" > > > >>> } > > > >>> } > > > >>> > > > >>> Now the set of KVs for “foo” looks like this (note that a missing > > > field needs to be handled explicitly): > > > >>> > > > >>> (“foo”, “active”) = kCONFLICT > > > >>> (“foo”, “active”, “1-abc”) = true > > > >>> (“foo”, “active”, “1-def”) = kMISSING > > > >>> (“foo”, “owner”) = kCONFLICT > > > >>> (“foo”, “owner”, “1-abc”) = “alice” > > > >>> (“foo”, “owner”, “1-def”, “name”) = “bob” > > > >>> (“foo”, “owner”, “1-def”, “email”) = “b...@example.com” > > > >>> > > > >>> I like this approach for the common case where documents share most of > > > their data in common but have a conflict in a very specific field or set > > > of > > > fields. > > > >>> > > > >>> I’ve encountered one important downside, though: an edit that > > > replicates in and conflicts with the entire document can cause a bit of a > > > data explosion. Consider a case where I have 10 conflicting versions of a > > > 100KB document, but the conflicts are all related to a single scalar > > > value. > > > Now I replicate in an empty document, and suddenly I have a kCONFLICT at > > > the root. In this model I now need to list out every path of every one of > > > the 10 existing revisions and I end up with a 1MB update. Yuck. That’s > > > technically no worse in the end state than the “zero sharing” case above, > > > but one could easily imagine overrunning the transaction size limit this > > > way. > > > >>> > > > >>> I suspect there’s a smart path out of this. Maybe the system detects a > > > “default” value for each field and uses that instead of writing out the > > > value for every revision in a conflicted subtree. Worth some discussion. > > > >>> > > > >>> ## Revision Trees > > > >>> > > > >>> In CouchDB we currently represent revisions as a hash history tree; > > > each revision identifier is derived from the content of the revision > > > including the revision identifier of its parent. Individual edit branches > > > are bounded in *length* (I believe the default is 1000 entries), but the > > > number of edit branches is technically unbounded. > > > >>> > > > >>> The size limits in FoundationDB preclude us from storing the entire > > > key tree as a single value; in pathological situations the tree could > > > exceed 100KB. Rather, I think it would make sense to store each edit > > > *branch* as a separate KV. We stem the branch long before it hits the > > > value > > > size limit, and in the happy case of no edit conflicts this means we store > > > the edit history metadata in a single KV. It also means that we can apply > > > an interactive edit without retrieving the entire conflicted revision > > > tree; > > > we need only retrieve and modify the single branch against which the edit > > > is being applied. The downside is that we duplicate historical revision > > > identifiers shared by multiple edit branches, but I think this is a > > > worthwhile tradeoff. > > > >>> > > > >>> I would furthermore try to structure the keys so that it is possible > > > to retrieve the “winning” revision in a single limit=1 range query. > > > Ideally > > > I’d like to proide the following properties: > > > >>> > > > >>> 1) a document read does not need to retrieve the revision tree at all, > > > just the winning revision identifier (which would be stored with the rest > > > of the doc) > > > >>> 2) a document update only needs to read the edit branch of the > > > revision tree against which the update is being applied, and it can read > > > that branch immediately knowing only the content of the edit that is being > > > attempted (i.e., it does not need to read the current version of the > > > document itself). > > > >>> > > > >>> So, I’d propose a separate subspace (maybe “_meta”?) for the revision > > > trees, with keys and values that look like > > > >>> > > > >>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev, > > > GrandparentRev, …] > > > >>> > > > >>> The inclusion of IsDeleted, RevPosition and RevHash in the key should > > > be sufficient (with the right encoding) to create a range query that > > > automatically selects the “winner” according to CouchDB’s arcane rules, > > > which are something like > > > >>> > > > >>> 1) deleted=false beats deleted=true > > > >>> 2) longer paths (i.e. higher RevPosition) beat shorter ones > > > >>> 3) RevHashes with larger binary values beat ones with smaller values > > > >>> > > > >>> =========== > > > >>> > > > >>> OK, that’s all on this topic from me for now. I think this is a > > > particularly exciting area where we start to see the dividends of > > > splitting > > > up data into multiple KV pairs in FoundationDB :) Cheers, > > > >>> > > > >>> Adam > > > >>> > > > >>> > > > >>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnew...@apache.org> wrote: > > > >>>> > > > >>>> This one is quite tightly coupled to the other thread on data model, > > > should we start much conversation here before that one gets closer to a > > > solution? > > > >>>> > > > >>>> -- > > > >>>> Robert Samuel Newson > > > >>>> rnew...@apache.org > > > >>>> > > > >>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote: > > > >>>>> This is a beginning of a discussion thread about storage of edit > > > >>>>> conflicts and everything which relates to revisions. > > > >>>>> > > > >>>>> > > > >>> > > > >>> > > > > > > > > > > -- > > > Professional Support for Apache CouchDB: > > > https://neighbourhood.ie/couchdb-support/ > > > > > > > >