Good points on revtree; I agree we should store it intelligently to gain the
benefits you mentioned.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> I do not think we should store the revtree as a blob. The design where 
> each edit branch is its own KV should save on network IO and CPU cycles 
> for normal updates. We’ve performed too many heroics to keep 
> couch_key_tree from stalling entire databases when trying to update a 
> single document with a wide revision tree, I would much prefer to ignore 
> other edit branches entirely when all we’re doing is extending one of 
> them.
> 
> I also do not think we should store JSON documents as blobs, but it’s a 
> closer call. Some of my reasoning for preferring the exploded path 
> design:
> 
> - it lends itself nicely to sub-document operations, for which Jan 
> crafted an RFC last year: https://github.com/apache/couchdb/issues/1559
> - it optimizes the creation of Mango indexes on existing databases since 
> we only need to retrieve the value(s) we want to index
> - it optimizes Mango queries that use field selectors
> - anyone who wanted to try their hand at GraphQL will find it very 
> handy: https://github.com/apache/couchdb/issues/1499
> - looking further ahead, it lets us play with smarter leaf value types 
> like Counters (yes I’m still on the CRDT bandwagon, sorry)
> 
> A few comments on the thread:
> 
> >>> * Most document bodies are probably going to be smaller than 100k. So in
> >>> the majority of cases it would be one write / one read to update and fetch
> >>> the document body.
> 
> We should test, but I expect reading 50KB of data in a range query is 
> almost as efficient as reading a single 50 KB value. Similarly, writes 
> to a contiguous set of keys should be quite efficient.
> 
> I am concerned about the overhead of the repeated field paths in the 
> keys with the exploded path option in the absence of key prefix 
> compression. That would be my main reason to acquiesce and throw away 
> all the document structure.
> 
> Adam
> 
> > On Feb 19, 2019, at 12:04 PM, Robert Newson <rnew...@apache.org> wrote:
> > 
> > I like the idea that we'd reuse the same pattern (but perhaps not the same 
> > _code_) for doc bodies, revtree and attachments.
> > 
> > I hope we still get to delete couch_key_tree.erl, though.
> > 
> > -- 
> >  Robert Samuel Newson
> >  rnew...@apache.org
> > 
> > On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
> >> I like the idea from a “trying a simple thing first” perspective, but 
> >> Nick’s points below are especially convincing reasons to go with this for now.
> >> 
> >> Best
> >> Jan
> >> —
> >> 
> >>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Sorry for jumping in so late; I was mostly following from the sidelines. A
> >>> lot of good discussion is happening and I am excited about the possibilities
> >>> here.
> >>> 
> >>> I do like the simpler "chunking" approach for a few reasons:
> >>> 
> >>> * Most document bodies are probably going to be smaller than 100k. So in
> >>> the majority of cases it would be one write / one read to update and fetch
> >>> the document body.
> >>> 
> >>> * We could reuse the chunking code for attachment handling and possibly
> >>> revision key trees. So it's the general pattern of uploading chunks to some
> >>> prefix, and when finished flipping an atomic toggle to make it current (see
> >>> the sketch after this list).
> >>> 
> >>> * Do the same thing with revision trees and we could re-use the revision
> >>> tree manipulation logic. That is, the key tree in most cases would be small
> >>> enough to fit in 100k, but if it gets huge, it would get chunked. This would
> >>> allow us to reuse all the battle-tested couch_key_tree code mostly as is.
> >>> We even have property tests for it
> >>> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> >>> 
> >>> * It removes the need to explain the max exploded path length limitation 
> >>> to
> >>> customers.
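> >>> 
> >>> To make the chunk-and-flip idea above concrete, here is a rough Python
> >>> sketch against the fdb bindings. The subspace layout and names are
> >>> illustrative assumptions, not a proposal:
> >>> 
> >>>     import fdb
> >>> 
> >>>     fdb.api_version(600)
> >>>     db = fdb.open()
> >>>     docs = fdb.Subspace(('couchdb', 'mydb', 'by_id'))
> >>> 
> >>>     CHUNK = 90 * 1024  # stay safely under FDB's 100KB value limit
> >>> 
> >>>     @fdb.transactional
> >>>     def write_doc(tr, doc_id, new_rev, body_bin):
> >>>         # upload the chunks under a prefix that includes the new revision
> >>>         for i in range(0, len(body_bin), CHUNK):
> >>>             tr[docs.pack((doc_id, new_rev, i // CHUNK))] = body_bin[i:i + CHUNK]
> >>>         # then flip the "current" pointer to make this revision visible
> >>>         # (clearing the previous revision's chunks is elided here)
> >>>         tr[docs.pack((doc_id, 'current'))] = fdb.tuple.pack((new_rev,))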
> >>> 
> >>> Cheers,
> >>> -Nick
> >>> 
> >>> 
> >>> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson <rnew...@apache.org> wrote:
> >>> 
> >>>> Hi,
> >>>> 
> >>>> An alternative storage model that we should seriously consider is to
> >>>> follow our current approach in couch_file et al. Specifically, that the
> >>>> document _body_ is stored as an uninterpreted binary value. This would be
> >>>> much like the obvious plan for attachment storage: a key prefix that
> >>>> identifies the database and document, with the final item of that key tuple
> >>>> being an incrementing integer. Each of those keys has a binary value of up
> >>>> to 100k. Fetching all values with that key prefix, in fdb's natural
> >>>> ordering, will yield the full document body, which can be JSON decoded for
> >>>> further processing.
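> >>>> 
> >>>> As a sketch of what the read side of that model could look like with the
> >>>> Python bindings (purely illustrative; the subspace name and layout are
> >>>> assumed, not decided):
> >>>> 
> >>>>     import fdb, json
> >>>> 
> >>>>     fdb.api_version(600)
> >>>>     db = fdb.open()
> >>>>     body_space = fdb.Subspace(('couchdb', 'mydb', 'doc_bodies'))
> >>>> 
> >>>>     @fdb.transactional
> >>>>     def read_doc(tr, doc_id):
> >>>>         # one range read over (doc_id, 0), (doc_id, 1), ... in fdb's
> >>>>         # natural ordering, then reassemble and JSON-decode the body
> >>>>         chunks = [v for k, v in tr[body_space.range((doc_id,))]]
> >>>>         return json.loads(b''.join(chunks))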
> >>>> 
> >>>> I like this idea, and I like Adam's original proposal to explode documents
> >>>> into property paths. I have a slight preference for the simplicity of the
> >>>> idea in the previous paragraph, not least because it's close to what we do
> >>>> today. I also think it will be possible to migrate to alternative storage
> >>>> models in future, and foundationdb's transaction support means we can do
> >>>> this migration seamlessly should we come to it.
> >>>> 
> >>>> I'm very interested in knowing if anyone else is interested in going this
> >>>> simple, or considers it a wasted opportunity relative to the 'exploded'
> >>>> path.
> >>>> 
> >>>> B.
> >>>> 
> >>>> --
> >>>> Robert Samuel Newson
> >>>> rnew...@apache.org
> >>>> 
> >>>> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
> >>>>> I've been remiss here in not posting the data model ideas that IBM
> >>>>> worked up while we were thinking about using FoundationDB so I'm posting
> >>>>> it now. This is Adam Kocoloski's original work, I am just transcribing
> >>>>> it, and this is the context that the folks from the IBM side came in
> >>>>> with, for full disclosure.
> >>>>> 
> >>>>> Basics
> >>>>> 
> >>>>> 1. All CouchDB databases are inside a Directory
> >>>>> 2. Each CouchDB database is a Directory within that Directory
> >>>>> 3. It's possible to list all subdirectories of a Directory, so
> >>>>> `_all_dbs` is the list of directories from 1.
> >>>>> 4. Each Directory representing a CouchDB database has several Subspaces;
> >>>>> 4a. by_id/ doc subspace: actual document contents
> >>>>> 4b. by_seq/versionstamp subspace: for the _changes feed
> >>>>> 4c. index_definitions, indexes, ...
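> >>>>> 
> >>>>> A sketch of those basics with the Python directory layer (database and
> >>>>> subspace names are illustrative only):
> >>>>> 
> >>>>>     import fdb
> >>>>> 
> >>>>>     fdb.api_version(600)
> >>>>>     db = fdb.open()
> >>>>> 
> >>>>>     couch = fdb.directory.create_or_open(db, ('couchdb',))  # 1. root Directory
> >>>>>     mydb = couch.create_or_open(db, ('mydb',))              # 2. one CouchDB database
> >>>>>     all_dbs = couch.list(db)                                # 3. _all_dbs
> >>>>>     by_id, by_seq = mydb['by_id'], mydb['by_seq']           # 4. per-db subspaces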
> >>>>> 
> >>>>> JSON Mapping
> >>>>> 
> >>>>> A hierarchical JSON object naturally maps to multiple KV pairs in FDB:
> >>>>> 
> >>>>> {
> >>>>>   “_id”: “foo”,
> >>>>>   “owner”: “bob”,
> >>>>>   “mylist”: [1,3,5],
> >>>>>   “mymap”: {
> >>>>>       “blue”: “#0000FF”,
> >>>>>       “red”: “#FF0000”
> >>>>>   }
> >>>>> }
> >>>>> 
> >>>>> maps to
> >>>>> 
> >>>>> (“foo”, “owner”) = “bob”
> >>>>> (“foo”, “mylist”, 0) = 1
> >>>>> (“foo”, “mylist”, 1) = 3
> >>>>> (“foo”, “mylist”, 2) = 5
> >>>>> (“foo”, “mymap”, “blue”) = “#0000FF”
> >>>>> (“foo”, “mymap”, “red”) = “#FF0000”
> >>>>> 
> >>>>> NB: this means that the 100KB limit applies to individual leaves in the
> >>>>> JSON object, not the entire doc
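> >>>>> 
> >>>>> A sketch of how such an exploded representation could be produced
> >>>>> (illustrative Python only; value encoding and handling of _id/_rev are
> >>>>> glossed over):
> >>>>> 
> >>>>>     def explode(doc_id, node, path=()):
> >>>>>         """Yield (key_tuple, scalar) for every leaf of a JSON document."""
> >>>>>         if isinstance(node, dict):
> >>>>>             for k, v in node.items():
> >>>>>                 yield from explode(doc_id, v, path + (k,))
> >>>>>         elif isinstance(node, list):
> >>>>>             for i, v in enumerate(node):
> >>>>>                 yield from explode(doc_id, v, path + (i,))
> >>>>>         else:
> >>>>>             yield (doc_id,) + path, node
> >>>>> 
> >>>>>     # explode("foo", {"owner": "bob", "mylist": [1, 3, 5]}) yields
> >>>>>     # ("foo", "owner") = "bob", ("foo", "mylist", 0) = 1, and so on.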
> >>>>> 
> >>>>> Edit Conflicts
> >>>>> 
> >>>>> We need to account for the presence of conflicts in various levels of
> >>>>> the doc due to replication.
> >>>>> 
> >>>>> Proposal is to create a special value indicating that the subtree below
> >>>>> our current cursor position is in an unresolvable conflict. Then add
> >>>>> additional KV pairs below to describe the conflicting entries.
> >>>>> 
> >>>>> KV data model allows us to store these efficiently and minimize
> >>>>> duplication of data:
> >>>>> 
> >>>>> A document with these two conflicts:
> >>>>> 
> >>>>> {
> >>>>>   “_id”: “foo”,
> >>>>>   “_rev”: “1-abc”,
> >>>>>   “owner”: “alice”,
> >>>>>   “active”: true
> >>>>> }
> >>>>> {
> >>>>>   “_id”: “foo”,
> >>>>>   “_rev”: “1-def”,
> >>>>>   “owner”: “bob”,
> >>>>>   “active”: true
> >>>>> }
> >>>>> 
> >>>>> could be stored thus:
> >>>>> 
> >>>>> (“foo”, “active”) = true
> >>>>> (“foo”, “owner”) = kCONFLICT
> >>>>> (“foo”, “owner”, “1-abc”) = “alice”
> >>>>> (“foo”, “owner”, “1-def”) = “bob”
> >>>>> 
> >>>>> So long as `kCONFLICT` is set at the top of the conflicting subtree this
> >>>>> representation can handle conflicts of different data types as well.
> >>>>> 
> >>>>> Missing fields need to be handled explicitly:
> >>>>> 
> >>>>> {
> >>>>>    “_id”: “foo”,
> >>>>>    “_rev”: “1-abc”,
> >>>>>    “owner”: “alice”,
> >>>>>    “active”: true
> >>>>> }
> >>>>> 
> >>>>> {
> >>>>>    “_id”: “foo”,
> >>>>>    “_rev”: “1-def”,
> >>>>>    “owner”: {
> >>>>>        “name”: “bob”,
> >>>>>        “email”: “b...@example.com”
> >>>>>    }
> >>>>> }
> >>>>> 
> >>>>> could be stored thus:
> >>>>> 
> >>>>> (“foo”, “active”) = kCONFLICT
> >>>>> (“foo”, “active”, “1-abc”) = true
> >>>>> (“foo”, “active”, “1-def”) = kMISSING
> >>>>> (“foo”, “owner”) = kCONFLICT
> >>>>> (“foo”, “owner”, “1-abc”) = “alice”
> >>>>> (“foo”, “owner”, “1-def”, “name”) = “bob”
> >>>>> (“foo”, “owner”, “1-def”, “email”) = ...
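> >>>>> 
> >>>>> A sketch of how a reader could collapse those markers back into a single
> >>>>> revision's view (illustrative Python; kCONFLICT/kMISSING are stand-in
> >>>>> sentinel values whose actual encoding is not decided here):
> >>>>> 
> >>>>>     kCONFLICT = object()   # placeholder sentinels for this sketch
> >>>>>     kMISSING = object()
> >>>>> 
> >>>>>     def resolve(kvs, rev, all_revs):
> >>>>>         """kvs: {path_tuple: value} for one doc; return the fields of one rev."""
> >>>>>         out = {}
> >>>>>         for path, value in kvs.items():
> >>>>>             if value is kCONFLICT or any(r in path for r in all_revs if r != rev):
> >>>>>                 continue                     # marker key, or another branch's value
> >>>>>             if rev in path:
> >>>>>                 if value is kMISSING:
> >>>>>                     continue                 # this branch lacks the field entirely
> >>>>>                 i = path.index(rev)
> >>>>>                 path = path[:i] + path[i + 1:]   # strip the rev component
> >>>>>             out[path] = value
> >>>>>         return out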
> >>>>> 
> >>>>> Revision Metadata
> >>>>> 
> >>>>> * CouchDB uses a hash history for revisions
> >>>>> ** Each edit is identified by the hash of the content of the edit
> >>>>> including the base revision against which it was applied
> >>>>> ** Individual edit branches are bounded in length but the number of
> >>>>> branches is potentially unbounded
> >>>>> 
> >>>>> * Size limits preclude us from storing the entire key tree as a single
> >>>>> value; in pathological situations
> >>>>> the tree could exceed 100KB (each entry is > 16 bytes)
> >>>>> 
> >>>>> * Store each edit branch as a separate KV including deleted status in a
> >>>>> special subspace
> >>>>> 
> >>>>> * Structure key representation so that “winning” revision can be
> >>>>> automatically retrieved in a limit=1
> >>>>> key range operation
> >>>>> 
> >>>>> (“foo”, “_meta”, “deleted=false”, 1, “def”) = []
> >>>>> (“foo”, “_meta”, “deleted=false”, 4, “bif”) = [“3-baz”, ”2-bar”, ”1-foo”]  <-- winner
> >>>>> (“foo”, “_meta”, “deleted=true”, 3, “abc”) = [“2-bar”, “1-foo”]
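> >>>>> 
> >>>>> A sketch of that limit=1 lookup with the Python bindings. It assumes the
> >>>>> deleted flag is encoded so that live edit branches sort after deleted
> >>>>> ones (so the winner is the last key in the range); names are illustrative:
> >>>>> 
> >>>>>     import fdb
> >>>>> 
> >>>>>     fdb.api_version(600)
> >>>>>     db = fdb.open()
> >>>>>     by_id = fdb.Subspace(('couchdb', 'mydb', 'by_id'))
> >>>>> 
> >>>>>     @fdb.transactional
> >>>>>     def winning_rev(tr, doc_id):
> >>>>>         # one reverse range read with limit=1 over the doc's _meta keys
> >>>>>         r = by_id.range((doc_id, '_meta'))
> >>>>>         for k, v in tr.get_range(r.start, r.stop, limit=1, reverse=True):
> >>>>>             deleted, pos, rev_hash = by_id.unpack(k)[2:]
> >>>>>             return pos, rev_hash
> >>>>>         return None   # no such document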
> >>>>> 
> >>>>> Changes Feed
> >>>>> 
> >>>>> * FDB supports a concept called a versionstamp — a 10 byte, unique,
> >>>>> monotonically (but not sequentially) increasing value for each committed
> >>>>> transaction. The first 8 bytes are the committed version of the
> >>>>> database. The last 2 bytes are monotonic in the serialization order for
> >>>>> transactions.
> >>>>> 
> >>>>> * A transaction can specify a particular index into a key where the
> >>>>> following 10 bytes will be overwritten by the versionstamp at commit
> >>>>> time
> >>>>> 
> >>>>> * A subspace keyed on versionstamp naturally yields a _changes feed
> >>>>> 
> >>>>> by_seq subspace
> >>>>> (“versionstamp1”) = (“foo”, “1-abc”)
> >>>>> (“versionstamp4”) = (“bar”, “4-def”)
> >>>>> 
> >>>>> by_id subspace
> >>>>> (“bar”, “_vsn”) = “versionstamp4”
> >>>>> ...
> >>>>> (“foo”, “_vsn”) = “versionstamp1”
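> >>>>> 
> >>>>> A sketch of the versionstamped write in the Python bindings (needs API
> >>>>> version 520 or later for pack_with_versionstamp; subspace names are
> >>>>> illustrative):
> >>>>> 
> >>>>>     import fdb
> >>>>> 
> >>>>>     fdb.api_version(600)
> >>>>>     db = fdb.open()
> >>>>> 
> >>>>>     @fdb.transactional
> >>>>>     def record_change(tr, doc_id, rev):
> >>>>>         # the incomplete Versionstamp placeholder is filled in by FDB at
> >>>>>         # commit time, giving a monotonically increasing by_seq key
> >>>>>         key = fdb.tuple.pack_with_versionstamp(
> >>>>>             ('couchdb', 'mydb', 'by_seq', fdb.tuple.Versionstamp()))
> >>>>>         tr.set_versionstamped_key(key, fdb.tuple.pack((doc_id, rev)))
> >>>>>         # clearing the doc's previous by_seq entry (via the _vsn
> >>>>>         # back-pointer in by_id) is elided here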
> >>>>> 
> >>>>> JSON Indexes
> >>>>> 
> >>>>> * “Mango” JSON indexes are defined by
> >>>>> ** a list of field names, each of which may be nested,
> >>>>> ** an optional partial_filter_selector which constrains the set of docs
> >>>>> that contribute
> >>>>> ** an optional name defined by the ddoc field (the name is auto-
> >>>>> generated if not supplied)
> >>>>> 
> >>>>> * Store index definitions in a single subspace to aid query planning
> >>>>> ** ((person,name), title, email) = (“name-title-email”, “{“student”:
> >>>>> true}”)
> >>>>> ** Store the values for each index in a dedicated subspace, adding the
> >>>>> document ID as the last element in the tuple
> >>>>> *** (“rosie revere”, “engineer”, “ro...@example.com", “foo”) = null
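> >>>>> 
> >>>>> A sketch of maintaining one such index entry on a document update (field
> >>>>> access, selector handling and subspace names are illustrative only):
> >>>>> 
> >>>>>     import fdb
> >>>>> 
> >>>>>     fdb.api_version(600)
> >>>>>     db = fdb.open()
> >>>>>     idx = fdb.Subspace(('couchdb', 'mydb', 'indexes', 'name-title-email'))
> >>>>> 
> >>>>>     @fdb.transactional
> >>>>>     def index_doc(tr, doc_id, doc):
> >>>>>         # the doc id goes last so that equal index values from different
> >>>>>         # documents coexist; partial_filter_selector checks are elided
> >>>>>         key = idx.pack((doc['person']['name'], doc['title'], doc['email'], doc_id))
> >>>>>         tr[key] = b''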
> >>>>> 
> >>>>> B.
> >>>>> 
> >>>>> --
> >>>>> Robert Samuel Newson
> >>>>> rnew...@apache.org
> >>>>> 
> >>>>> On Mon, 4 Feb 2019, at 19:13, Ilya Khlopotov wrote:
> >>>>>> 
> >>>>>> I want to fix previous mistakes. I made two mistakes in the previous
> >>>>>> calculations:
> >>>>>> - I used 1Kb as the base size for calculating the expansion factor
> >>>>>> (although we don't know the exact size of the original document)
> >>>>>> - The expansion factor calculation included the number of revisions (it
> >>>>>> shouldn't)
> >>>>>> 
> >>>>>> I'll focus on the flattened JSON docs model
> >>>>>> 
> >>>>>> The following formula was used in the previous calculation:
> >>>>>> storage_size_per_document = mapping_table_size*number_of_revisions +
> >>>>>> depth*number_of_paths*number_of_revisions +
> >>>>>> number_of_paths*value_size*number_of_revisions
> >>>>>> 
> >>>>>> To clarify things a little bit, I want to calculate the space requirement
> >>>>>> for a single revision this time.
> >>>>>> mapping_table_size = number_of_field_names*(field_name_length + 4 (integer
> >>>>>> size)) = 100 * (20 + 4) = 2400 bytes
> >>>>>> storage_size_per_document_per_revision_per_replica = mapping_table_size +
> >>>>>> depth*number_of_paths + value_size*number_of_paths =
> >>>>>> 2400 bytes + 10*1000 + 1000*100 = 112400 bytes ~= 110 Kb
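> >>>>>> 
> >>>>>> The same arithmetic spelled out as a quick sanity check in Python (the
> >>>>>> roughly one byte of key overhead per path level is taken from the formula
> >>>>>> above):
> >>>>>> 
> >>>>>>     mapping_table = 100 * (20 + 4)   # 100 field names, 20 bytes each + 4-byte id
> >>>>>>     key_overhead  = 10 * 1000        # depth 10 per path, 1000 paths
> >>>>>>     values        = 100 * 1000       # 100-byte scalar per path
> >>>>>>     print(mapping_table + key_overhead + values)   # 112400 bytes, ~110 Kb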
> >>>>>> 
> >>>>>> We can definitely reduce the mapping table requirement by adopting
> >>>>>> rnewson's idea of a schema.
> >>>>>> 
> >>>>>> On 2019/02/04 11:08:16, Ilya Khlopotov <iil...@apache.org> wrote:
> >>>>>>> Hi Michael,
> >>>>>>> 
> >>>>>>>> For example, here's a crazy thought:
> >>>>>>>> Map every distinct occurrence of a key/value instance through a crypto
> >>>>>>>> hash function to get a set of hashes.
> >>>>>>>> 
> >>>>>>>> These can be precomputed by Couch without any lookups in FDB. These
> >>>>>>>> will be spread all over kingdom come in FDB and not lend
> >>>> themselves to
> >>>>>>>> range search well.
> >>>>>>>> 
> >>>>>>>> So what you do is index them for frequency of occurring in the
> >>>> same set.
> >>>>>>>> In essence, you 'bucket them' statistically, and that bucket id
> >>>> becomes a
> >>>>>>>> key prefix. A crypto hash value can be copied into more than one
> >>>> bucket.
> >>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
> >>>>>>> 
> >>>>>>>> When writing a document, Couch submits the list/array of
> >>>> cryptohash values
> >>>>>>>> it computed to FDB and gets back the corresponding  {val_id} (the
> >>>> id with
> >>>>>>>> the bucket prefixed).  This can get somewhat expensive if there's
> >>>> always a
> >>>>>>>> lot of app local cache misses.
> >>>>>>>> 
> >>>>>>>> A document's value is then a series of {val_id} arrays up to 100k
> >>>> per
> >>>>>>>> segment.
> >>>>>>>> 
> >>>>>>>> When retrieving a document, you get the val_ids, find the distinct
> >>>> buckets
> >>>>>>>> and min/max entries for this doc, and then parallel query each
> >>>> bucket while
> >>>>>>>> reconstructing the document.
> >>>>>>> 
> >>>>>>> Interesting idea. Let's try to think it through to see if we can
> >>>> make it viable.
> >>>>>>> Let's go through a hypothetical example. Input data for the example:
> >>>>>>> - 1M of documents
> >>>>>>> - each document is around 10Kb
> >>>>>>> - each document consists of 1K of unique JSON paths
> >>>>>>> - each document has 100 unique JSON field names
> >>>>>>> - every scalar value is 100 bytes
> >>>>>>> - 10% of unique JSON paths for every document are already stored in the
> >>>>>>> database under a different doc or a different revision of the current one
> >>>>>>> - we assume 3 independent copies for every key-value pair in FDB
> >>>>>>> - our hash key size is 32 bytes
> >>>>>>> - let's assume we can determine if a key is already in storage
> >>>>>>> without doing a query
> >>>>>>> - 1% of paths is in cache (unrealistic value, in real life the
> >>>>>>> percentage is lower)
> >>>>>>> - every JSON field name is 20 bytes
> >>>>>>> - every JSON path is 10 levels deep
> >>>>>>> - document key prefix length is 50
> >>>>>>> - every document has 10 revisions
> >>>>>>> Let's estimate the storage requirements and size of data we need to
> >>>> transmit. The calculations are not exact.
> >>>>>>> 1. storage_size_per_document (we cannot estimate exact numbers since
> >>>>>>> we don't know how FDB stores it)
> >>>>>>> - (10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) * 32 bytes ~= 38Kb per
> >>>>>>> revision; 38Kb * 10 revisions * 3 replicas = 1140 Kb (11x)
> >>>>>>> 2. number of independent keys to retrieve on document read
> >>>> (non-range queries) per document
> >>>>>>> - 1K - (1K * 1%) = 990
> >>>>>>> 3. number of range queries: 0
> >>>>>>> 4. data to transmit on read: (1K - (1K * 1%)) * (100 bytes + 32
> >>>> bytes) = 102 Kb (10x)
> >>>>>>> 5. read latency (we use 2ms per read based on numbers from
> >>>> https://apple.github.io/foundationdb/performance.html)
> >>>>>>>   - sequential: 990*2ms = 1980ms
> >>>>>>>   - range: 0
> >>>>>>> Let's compare these numbers with the initial proposal (flattened JSON
> >>>>>>> docs without a global schema and without cache)
> >>>>>>> 1. storage_size_per_document
> >>>>>>> - mapping table size: 100 * (20 + 4(integer size)) = 2400 bytes
> >>>>>>> - key size: (10 * (4 + 1(delimiter))) + 50 = 100 bytes
> >>>>>>> - storage_size_per_document: 2.4K*10 + 100*1K*10 + 1K*100*10 =
> >>>> 2024K = 1976 Kb * 3 = 5930 Kb (59.3x)
> >>>>>>> 2. number of independent keys to retrieve: 0-2 (depending on index
> >>>> structure)
> >>>>>>> 3. number of range queries: 1 (1001 of keys in result)
> >>>>>>> 4. data to transmit on read: 24K + 1000*100 + 1000*100 = 23.6 Kb
> >>>> (2.4x)
> >>>>>>> 5. read latency (we use 2ms per read based on numbers from
> >>>> https://apple.github.io/foundationdb/performance.html and estimate range
> >>>> read performance based on numbers from
> >>>> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test
> >>>> )
> >>>>>>> - range read performance: Given read performance is about 305,000
> >>>>>>> reads/second and range performance is 3,600,000 keys/second, we estimate
> >>>>>>> range performance to be 11.8x read performance. If read performance
> >>>>>>> is 2ms then range performance is 0.169ms (which is hard to believe).
> >>>>>>> - sequential: 2 * 2 = 4ms
> >>>>>>> - range: 0.169ms
> >>>>>>> 
> >>>>>>> It looks like we are dealing with a tradeoff:
> >>>>>>> - Map every distinct occurrence of a key/value instance through a
> >>>> crypto hash:
> >>>>>>> - 5.39x more disk space efficient
> >>>>>>> - 474x slower
> >>>>>>> - flattened JSON model
> >>>>>>> - 5.39x less efficient in disk space
> >>>>>>> - 474x faster
> >>>>>>> 
> >>>>>>> In any case this unscientific exercise was very helpful, since it
> >>>>>>> uncovered the high cost in terms of disk space. 59.3x the original disk
> >>>>>>> size is too much IMO.
> >>>>>>> 
> >>>>>>> Are there any ways we can make Michael's model more performant?
> >>>>>>> 
> >>>>>>> Also I don't quite understand a few aspects of the global hash table
> >>>>>>> proposal:
> >>>>>>> 
> >>>>>>> 1. > - Map every distinct occurrence of a key/value instance through
> >>>>>>> a crypto hash function to get a set of hashes.
> >>>>>>> I think we are talking only about scalar values here? I.e.
> >>>>>>> `"#/foo.bar.baz": 123`
> >>>>>>> since I don't know how we can make it work for all possible JSON
> >>>>>>> paths in `{"foo": {"bar": {"size": 12, "baz": 123}}}`:
> >>>>>>> - foo
> >>>>>>> - foo.bar
> >>>>>>> - foo.bar.baz
> >>>>>>> 
> >>>>>>> 2. how to delete documents
> >>>>>>> 
> >>>>>>> Best regards,
> >>>>>>> ILYA
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On 2019/01/30 23:33:22, Michael Fair <mich...@daclubhouse.net>
> >>>> wrote:
> >>>>>>>> On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski <kocol...@apache.org
> >>>> wrote:
> >>>>>>>> 
> >>>>>>>>> Hi Michael,
> >>>>>>>>> 
> >>>>>>>>>> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
> >>>>>>>>> 
> >>>>>>>>> Yes that’s definitely one way to address storage of edit
> >>>> conflicts. I
> >>>>>>>>> think there are other, more compact representations that we can
> >>>> explore if
> >>>>>>>>> we have this “exploded” data model where each scalar value maps
> >>>> to an
> >>>>>>>>> individual KV pair.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> I agree. As I mentioned on the original thread, I see a scheme that
> >>>>>>>> handles both conflicts and revisions where you only have to store the
> >>>>>>>> most recent change to a field.  Like you suggested, multiple revisions
> >>>>>>>> can share a key.  In my mind's eye that further invites the
> >>>>>>>> conflicts/revisions discussion along with the working-within-the-limits
> >>>>>>>> discussion, because it seems to me they are all intrinsically related as
> >>>>>>>> a "feature".
> >>>>>>>> 
> >>>>>>>> Saying 'We'll break documents up into roughly 80k segments', then
> >>>> trying to
> >>>>>>>> overlay some kind of field sharing scheme for revisions/conflicts
> >>>> doesn't
> >>>>>>>> seem like it will work.
> >>>>>>>> 
> >>>>>>>> I probably should have left out the trivial fix proposal as I
> >>>> don't think
> >>>>>>>> it's a feasible solution to actually use.
> >>>>>>>> 
> >>>>>>>> The comment is more regarding that I do not see how this thread
> >>>> can escape
> >>>>>>>> including how to store/retrieve conflicts/revisions.
> >>>>>>>> 
> >>>>>>>> For instance, the 'doc as individual fields' proposal lends itself to
> >>>>>>>> value sharing across multiple documents (and I don't just mean revisions
> >>>>>>>> of the same doc, I mean the same key/value instance could be shared for
> >>>>>>>> every document).
> >>>>>>>> However that's not really relevant if we're not considering the
> >>>> amount of
> >>>>>>>> shared information across documents in the storage scheme.
> >>>>>>>> 
> >>>>>>>> Simply storing documents in <100k segments (perhaps in some kind of
> >>>>>>>> compressed binary representation) to deal with that FDB limit
> >>>> seems fine.
> >>>>>>>> The only reason to consider doing something else is because of its
> >>>> impact
> >>>>>>>> to indexing, searches, reduce functions, revisions, on-disk size
> >>>> impact,
> >>>>>>>> etc.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>>> I'm assuming the process will flatten the key paths of the
> >>>> document into
> >>>>>>>>> an array and then request the value of each key as multiple
> >>>> parallel
> >>>>>>>>> queries against FDB at once
> >>>>>>>>> 
> >>>>>>>>> Ah, I think this is not one of Ilya’s assumptions. He’s trying
> >>>> to design a
> >>>>>>>>> model which allows the retrieval of a document with a single
> >>>> range read,
> >>>>>>>>> which is a good goal in my opinion.
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> I am not sure I agree.
> >>>>>>>> 
> >>>>>>>> Think of BitTorrent: a single range read should pull back the structure
> >>>>>>>> of the document (the pieces to fetch), but not necessarily the whole
> >>>>>>>> document.
> >>>>>>>> 
> >>>>>>>> What if you already have a bunch of pieces in common with other
> >>>> documents
> >>>>>>>> locally (a repeated header/footer/ or type for example); and you
> >>>> only need
> >>>>>>>> to get a few pieces of data you don't already have?
> >>>>>>>> 
> >>>>>>>> The real goal I see for Couch is to treat your document set like the
> >>>>>>>> collection of structured information that it is.  In some respects it is
> >>>>>>>> like an extension of your application's heap space for structured
> >>>>>>>> objects, and efficiently querying that collection gets back subsets of
> >>>>>>>> the data.
> >>>>>>>> 
> >>>>>>>> Otherwise it seems more like a slightly upgraded file system plus
> >>>> a fancy
> >>>>>>>> grep/find like feature...
> >>>>>>>> 
> >>>>>>>> The best way I see to unlock more features/power is to move towards a
> >>>>>>>> more granular and efficient way to store and retrieve the scalar
> >>>>>>>> values...
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> For example, here's a crazy thought:
> >>>>>>>> Map every distinct occurrence of a key/value instance through a crypto
> >>>>>>>> hash function to get a set of hashes.
> >>>>>>>> 
> >>>>>>>> These can be precomputed by Couch without any lookups in FDB. These
> >>>>>>>> will be spread all over kingdom come in FDB and not lend
> >>>> themselves to
> >>>>>>>> range search well.
> >>>>>>>> 
> >>>>>>>> So what you do is index them for frequency of occurring in the
> >>>> same set.
> >>>>>>>> In essence, you 'bucket them' statistically, and that bucket id
> >>>> becomes a
> >>>>>>>> key prefix. A crypto hash value can be copied into more than one
> >>>> bucket.
> >>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
> >>>>>>>> 
> >>>>>>>> When writing a document, Couch submits the list/array of
> >>>> cryptohash values
> >>>>>>>> it computed to FDB and gets back the corresponding  {val_id} (the
> >>>> id with
> >>>>>>>> the bucket prefixed).  This can get somewhat expensive if there's
> >>>> always a
> >>>>>>>> lot of app local cache misses.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> A document's value is then a series of {val_id} arrays up to 100k
> >>>> per
> >>>>>>>> segment.
> >>>>>>>> 
> >>>>>>>> When retrieving a document, you get the val_ids, find the distinct
> >>>> buckets
> >>>>>>>> and min/max entries for this doc, and then parallel query each
> >>>> bucket while
> >>>>>>>> reconstructing the document.
> >>>>>>>> 
> >>>>>>>> The values returned from the buckets query are the key/value
> >>>> strings
> >>>>>>>> required to reassemble this document.
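> >>>>>>>> 
> >>>>>>>> A very rough sketch of the hashing half of that idea, in Python; bucket
> >>>>>>>> assignment and the FDB lookups are elided, and the hash and encoding
> >>>>>>>> choices are purely illustrative:
> >>>>>>>> 
> >>>>>>>>     import hashlib, json
> >>>>>>>> 
> >>>>>>>>     def leaf_hashes(flat_doc):
> >>>>>>>>         """flat_doc: {path_string: scalar}; hash every key/value instance."""
> >>>>>>>>         out = []
> >>>>>>>>         for path, value in sorted(flat_doc.items()):
> >>>>>>>>             payload = json.dumps([path, value], sort_keys=True).encode()
> >>>>>>>>             out.append(hashlib.sha256(payload).digest())
> >>>>>>>>         return out
> >>>>>>>> 
> >>>>>>>>     # the stored document body would then be the sequence of returned
> >>>>>>>>     # 32-byte hashes (with bucket prefixes), split into <100k segments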
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> ----------
> >>>>>>>> I put this forward primarily to highlight the idea that trying to match
> >>>>>>>> the storage representation of documents in a straightforward way to FDB
> >>>>>>>> keys to reduce query count might not be the most performance-oriented
> >>>>>>>> approach.
> >>>>>>>> 
> >>>>>>>> I'd much prefer a storage approach that reduced data duplication
> >>>> and
> >>>>>>>> enabled fast sub-document queries.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> This clearly falls in the realm of what people want the 'use case'
> >>>> of Couch
> >>>>>>>> to be/become.  By giving Couch more access to sub-document
> >>>> queries, I could
> >>>>>>>> eventually see queries as complicated as GraphQL submitted to
> >>>> Couch and
> >>>>>>>> pulling back ad-hoc aggregated data across multiple documents in a
> >>>> single
> >>>>>>>> application layer request.
> >>>>>>>> 
> >>>>>>>> Hehe - one way to look at the database of Couch documents is that they
> >>>>>>>> are all conflict revisions of the single root empty document.  What I
> >>>>>>>> mean by this is: consider thinking of the entire document store as one
> >>>>>>>> giant DAG of key/value pairs, and how even separate documents are still
> >>>>>>>> typically related to each other.  For most applications there is a
> >>>>>>>> tremendous amount of data redundancy between docs and especially between
> >>>>>>>> revisions of those docs...
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> And all this is a long way of saying "I think there could be a lot
> >>>> of value
> >>>>>>>> in assuming documents are 'assembled' from multiple queries to
> >>>> FDB, with
> >>>>>>>> local caching, instead of simply retrieved"
> >>>>>>>> 
> >>>>>>>> Thanks, I hope I'm not the only outlier here thinking this way!?
> >>>>>>>> 
> >>>>>>>> Mike :-)
> >>>>>>>> 
> >>>>>>> 
> >>>> 
> >> 
> 
