Ah sure, if we store the *cough* schema per doc, then it's not that easy. An iteration of this proposal could store paths globally with ids that the k/v store then uses for keys, which would enable what I described, but happy to ignore this for the time being. :)
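To make that iteration concrete, here is a minimal sketch, assuming a hypothetical database-wide `FieldTable` that maps field names to integer ids (none of these names come from the proposal, and nested paths are left out for brevity). Stored documents only carry the ids, so a rename touches a single table entry:

```python
class FieldTable:
    """Hypothetical database-wide mapping between field names and integer ids."""

    def __init__(self):
        self.id_by_name = {}
        self.name_by_id = {}

    def intern(self, name):
        """Return the id for a field name, allocating a new one if needed."""
        if name not in self.id_by_name:
            # Ids 0 and 1 are reserved in the proposal, so allocation starts at 2.
            new_id = len(self.name_by_id) + 2
            self.id_by_name[name] = new_id
            self.name_by_id[new_id] = name
        return self.id_by_name[name]

    def rename(self, old, new):
        """Rename a field for every document by updating one table entry."""
        if new in self.id_by_name:
            raise ValueError("target field name already exists")
        field_id = self.id_by_name.pop(old)
        self.id_by_name[new] = field_id
        self.name_by_id[field_id] = new


def encode(table, doc):
    """Replace top-level field names with their global ids."""
    return {table.intern(name): value for name, value in doc.items()}


def decode(table, stored):
    """Map global ids back to the current field names."""
    return {table.name_by_id[field_id]: value for field_id, value in stored.items()}
```

With a per-document mapping, as in the current proposal, the same rename would have to rewrite the mapping key of every document that uses the field.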

Cheers
Jan
--

> On 30. Jan 2019, at 17:58, Adam Kocoloski <kocol...@apache.org> wrote:
>
> Jan, I don’t think it does have that "fun property #2", as the mapping is created separately for each document. In this proposal the field name “foo” could map to 2 in one document and 42 in another.
>
> Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field paths is anything more than a theoretical concern. It’s hard for me to imagine a useful schema that would get anywhere near that deep, but maybe I’m insufficiently creative :) There’s certainly a storage overhead from repeating the upper portion of a path over and over again, but that’s also something the storage engine can optimize away through prefix elision. The current production storage engine in FoundationDB does not do this elision, but the new one in development does.
>
> The value size limit is probably not so theoretical. I think as a project we could choose to impose a 100KB size limit on scalar values - a user who had a string longer than 100KB could chunk it up into an array of strings pretty easily to work around that limit. But let’s say we don’t want to impose that limit. In your design, how do I distinguish {PART_IDX} from the elements of the {JSON_PATH}? I was kind of expecting to see some magic value indicating that the subsequent set of keys with the same prefix are all elements of a “multi-part object”:
>
>     {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} = kMULTIPART
>     {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX} = “First 100 KB …”
>     ...
>
> You might have figured out something more efficient that saves a KV here but I can’t quite grok it.
>
> Cheers, Adam
>
>> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt <j...@apache.org> wrote:
>>
>>> On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
>>>
>>> Thanks Ilya for getting this started!
>>>
>>> Two quick notes on this one:
>>>
>>> 1.
>>>    note that JSON does not guarantee object key order and that CouchDB has never guaranteed it either, and with say emit(doc.foo, doc.bar), if either emit() parameter was an object, the undefined-sort-order of SpiderMonkey would mix things up. While worth bringing up, this is not a BC break.
>>>
>>> 2. This would have the fun property of being able to rename a key inside all docs that have that key.
>>
>> …in one short operation.
>>
>> Best
>> Jan
>> --
>>>
>>> Best
>>> Jan
>>> --
>>>
>>>> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>
>>>> # First proposal
>>>>
>>>> In order to overcome FoundationDB limitations on key size (10 kB) and value size (100 kB) we could use the following approach.
>>>>
>>>> Below, the paths use slashes for illustration purposes only. We can use nested subspaces, tuples, directories or something else.
>>>>
>>>> - Store documents in a subspace or directory (to keep the key prefix short)
>>>> - When we store the document we would enumerate all field names (0 and 1 are reserved) and store the mapping table in a key which looks like:
>>>> ```
>>>> {DB_DOCS_NS} / {DOC_KEY} / 0
>>>> ```
>>>> - Flatten the JSON document (convert it into key-value pairs where the key is `JSON_PATH` and the value is `SCALAR_VALUE`)
>>>> - Replace elements of `JSON_PATH` with integers from the mapping table we constructed earlier
>>>> - When we have an array, use `1 / {array_idx}`
>>>> - Store scalar values in keys which look like the following (we use `JSON_PATH` with integers):
>>>> ```
>>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>>>> ```
>>>> - If the scalar value exceeds 100 kB we would split it and store every part under a key constructed as:
>>>> ```
>>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>>>> ```
>>>>
>>>> Since all parts of the document are stored under a common `{DB_DOCS_NS} / {DOC_KEY}` prefix, they will be stored on the same server most of the time.
>>>> The document can be retrieved by using a range query (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`). We can reconstruct the document since the mapping is returned as well.
>>>>
>>>> The downside of this approach is that we wouldn't be able to ensure the same order of keys in the JSON object. Currently the `jiffy` JSON encoder respects the order of keys.
>>>> ```
>>>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>>>> <<"{\"bbb\":1,\"aaa\":12}">>
>>>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>>>> <<"{\"aaa\":12,\"bbb\":1}">>
>>>> ```
>>>>
>>>> Best regards,
>>>> iilyak
>>>>
>>>>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>> As you might already know, FoundationDB has a number of limitations which influence the way we might store JSON documents. The limitations are:
>>>>>
>>>>> | limitation            | recommended value | recommended max | absolute max |
>>>>> |-----------------------|------------------:|----------------:|-------------:|
>>>>> | transaction duration  |                   |                 |        5 sec |
>>>>> | transaction data size |                   |                 |        10 MB |
>>>>> | key size              |          32 bytes |            1 kB |        10 kB |
>>>>> | value size            |                   |           10 kB |       100 kB |
>>>>>
>>>>> In order to fit the JSON document into 100 kB we would have to partition it in some way. There are three ways of partitioning the document:
>>>>> 1. store multiple binary blobs (parts) in different keys
>>>>> 2. flatten the JSON structure and store every path leading to a scalar value under its own key
>>>>> 3. measure the size of different branches of a tree representing the JSON document (while we parse) and use another key for the branch when we are about to exceed the limit
>>>>>
>>>>> - The first approach is the simplest but it wouldn't allow us to access parts of the document.
>>>>> - The downsides of the second approach are:
>>>>>   - the flattened JSON structure would have long paths, which means longer keys
>>>>>   - a scalar value cannot be more than 100 kB (unless we split it as well)
>>>>> - The third approach falls short when the structure of the document doesn't allow a clean cut of branches:
>>>>>   - complex rules are needed to handle all corner cases
>>>>>
>>>>> The goals of this thread are:
>>>>> - to collect ideas on how to encode and store the JSON document
>>>>> - to comment on the collected ideas
>>>>>
>>>>> Non goals:
>>>>> - the storage of metadata for the document, which would be discussed elsewhere:
>>>>>   - tombstones
>>>>>   - edit conflicts
>>>>>   - revisions
>>>>>
>>>>> Best regards,
>>>>> iilyak
>>>>>
>>>
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
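
For readers following the thread, the flattening steps of the first proposal quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: the function name `flatten`, the tuple-shaped keys, the `"docs"` namespace, the JSON encoding of scalars and the plain `dict` standing in for the key-value store are all hypothetical. Part indexes are appended directly to the path, as in the proposal, so the ambiguity Adam raises between `{PART_IDX}` and `{JSON_PATH}` elements is still present here.

```python
import json

MAPPING_ID = 0       # reserved: key that holds the field-name mapping table
ARRAY_ID = 1         # reserved: marks that the next path element is an array index
MAX_VALUE = 100_000  # assumed per-value size limit in bytes


def flatten(doc_key, doc, ns="docs", max_value=MAX_VALUE):
    """Flatten a JSON document into {key tuple: bytes} rows (hypothetical layout)."""
    ids = {}   # per-document mapping: field name -> integer id
    rows = {}  # stand-in for the key-value store

    def field_id(name):
        if name not in ids:
            ids[name] = len(ids) + 2  # 0 and 1 are reserved
        return ids[name]

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (ARRAY_ID, idx))
        else:
            data = json.dumps(node).encode("utf-8")
            if len(data) <= max_value:
                rows[(ns, doc_key) + path] = data
            else:
                # Oversized scalar: one row per chunk under {JSON_PATH} / {PART_IDX}.
                for part, start in enumerate(range(0, len(data), max_value)):
                    rows[(ns, doc_key) + path + (part,)] = data[start:start + max_value]

    # Note: empty objects and arrays produce no rows in this sketch.
    walk(doc, ())
    rows[(ns, doc_key, MAPPING_ID)] = json.dumps(ids).encode("utf-8")
    return rows
```

For example, `flatten("doc1", {"foo": 1, "bar": {"baz": [10, "x"]}})` stores the mapping `{"foo": 2, "bar": 3, "baz": 4}` under `("docs", "doc1", 0)` and the value `10` under `("docs", "doc1", 3, 4, 1, 0)`.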