>From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1]
>thing where you have multiple JSON keys with the same name, i.e., { "foo": 1,
>"foo": 2 }.
Are the proposals on the table able to continue this support (or am I wrong
about Jiffy)?
[1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an object
SHOULD be unique.", though https://tools.ietf.org/html/rfc7493#section-2.3 does
sensibly close that down.
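
To make the question concrete, here is a minimal Python sketch (standard `json` module, not Jiffy) of what happens to duplicate names. The path-keyed `by_path` store is a hypothetical stand-in for the flattened layout in the proposal; it assumes one key per JSON path, so repeated names collide.

```python
import json

raw = '{"foo": 1, "foo": 2}'

# Default decoding into a dict keeps only the last value for a repeated name.
as_dict = json.loads(raw)

# object_pairs_hook receives every (name, value) pair in document order,
# so duplicates survive when the object is kept as a list of pairs.
as_pairs = json.loads(raw, object_pairs_hook=list)

# Hypothetical path-keyed store: one key per path, so duplicates collide.
by_path = {}
for name, value in as_pairs:
    by_path[(name,)] = value  # the second "foo" overwrites the first

print(as_dict)
print(as_pairs)
print(by_path)
```

Under that assumption, a layout keyed by path would need an extra disambiguator (for example a per-member counter) to keep both values.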
--
Mike.
On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
>
>
> > On 30. Jan 2019, at 14:22, Jan Lehnardt <[email protected]> wrote:
> >
> > Thanks Ilya for getting this started!
> >
> > Two quick notes on this one:
> >
> > 1. note that JSON does not guarantee object key order and that CouchDB has
> > never guaranteed it either, and with say emit(doc.foo, doc.bar), if either
> > emit() parameter was an object, the undefined-sort-order of SpiderMonkey
> > would mix things up. While worth bringing up, this is not a BC break.
> >
> > 2. This would have the fun property of being able to rename a key inside
> > all docs that have that key.
>
> …in one short operation.
>
> Best
> Jan
> —
> >
> > Best
> > Jan
> > —
> >
> >> On 30. Jan 2019, at 14:05, Ilya Khlopotov <[email protected]> wrote:
> >>
> >> # First proposal
> >>
> >> In order to overcome FoundationDB limitations on key size (10 kB) and value
> >> size (100 kB) we could use the following approach.
> >>
> >> Below, the paths use slashes for illustration purposes only. We can
> >> use nested subspaces, tuples, directories or something else.
> >>
> >> - Store documents in a subspace or directory (to keep prefix for a key
> >> short)
> >> - When we store the document we would enumerate all field names (0 and 1
> >> are reserved) and store the mapping table in a key which looks like:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / 0
> >> ```
> >> - Flatten the JSON document (convert it into key value pairs where the key
> >> is `JSON_PATH` and value is `SCALAR_VALUE`)
> >> - Replace elements of JSON_PATH with integers from mapping table we
> >> constructed earlier
> >> - When we have an array, use `1 / {array_idx}`
> >> - Store scalar values in the keys which look like the following (we use
> >> `JSON_PATH` with integers).
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >> ```
> >> - If the scalar value exceeds 100 kB we would split it and store every part
> >> under a key constructed as:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> >> ```
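
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the `flatten` function, the `"docs"` namespace, tuples as keys, field ids starting at 2, and JSON-encoded scalar values are all hypothetical choices. Empty objects and arrays produce no rows here, which a real encoding would have to handle.

```python
import json

MAX_VALUE = 100_000  # value size limit from the proposal (100 kB)
ARRAY = 1            # reserved marker for array elements; 0 holds the mapping


def flatten(doc_key, doc, max_value=MAX_VALUE):
    """Return (key, value) pairs for one JSON document.

    Keys are tuples standing in for `{DB_DOCS_NS} / {DOC_KEY} / ...`.
    """
    names = {}  # field name -> integer id, assigned in first-seen order
    rows = []

    def field_id(name):
        if name not in names:
            names[name] = len(names) + 2  # 0 and 1 are reserved
        return names[name]

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (ARRAY, idx))
        else:
            data = json.dumps(node).encode("utf-8")
            key = ("docs", doc_key) + path
            if len(data) <= max_value:
                rows.append((key, data))
            else:
                # Oversized scalar: one row per part, keyed by PART_IDX.
                starts = range(0, len(data), max_value)
                for part, start in enumerate(starts):
                    rows.append((key + (part,), data[start:start + max_value]))

    walk(doc, ())
    mapping = json.dumps({i: n for n, i in names.items()}).encode("utf-8")
    return [(("docs", doc_key, 0), mapping)] + rows


for key, value in flatten("doc1", {"name": "x", "tags": ["a", "b"]}):
    print(key, value)
```

The mapping row comes first because its key ends in `0`, which sorts before any field id.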
> >>
> >> Since all parts of the documents are stored under a common `{DB_DOCS_NS} /
> >> {DOC_KEY}` they will be stored on the same server most of the time. The
> >> document can be retrieved by using a range query
> >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY}
> >> / 0xFF")`). We can reconstruct the document since the mapping is returned
> >> as well.
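
A rough Python sketch of that reconstruction, working on rows as a range read might return them. The row layout (`(namespace, doc_key, *path)` tuples, JSON-encoded values, mapping table at path `0`) and the `rebuild` and `steps` helpers are assumptions for illustration. It relies on rows arriving in key order, assumes the document root is an object, and does not reassemble split values.

```python
import json

ARRAY = 1  # reserved marker: the next path element is an array index

# Rows in key order, as a range read over one document might return them.
rows = [
    (("docs", "doc1", 0), b'{"2": "name", "3": "tags"}'),  # mapping table
    (("docs", "doc1", 2), b'"x"'),
    (("docs", "doc1", 3, 1, 0), b'"a"'),
    (("docs", "doc1", 3, 1, 1), b'"b"'),
]


def steps(path, names):
    """Translate integer path elements into field names and array indexes."""
    out, it = [], iter(path)
    for part in it:
        # After the ARRAY marker the following element is the index itself.
        out.append(next(it) if part == ARRAY else names[part])
    return out


def rebuild(rows):
    (_, mapping_raw), *scalars = rows
    names = {int(i): n for i, n in json.loads(mapping_raw).items()}
    root = {}
    for key, raw in scalars:
        path = steps(key[2:], names)  # drop namespace and document key
        node = root
        for step, nxt in zip(path, path[1:]):
            child = [] if isinstance(nxt, int) else {}
            if isinstance(node, list):
                if step == len(node):  # first time this index is seen
                    node.append(child)
                node = node[step]
            else:
                node = node.setdefault(step, child)
        leaf = json.loads(raw)
        if isinstance(node, list):
            node.append(leaf)  # indexes arrive in ascending order
        else:
            node[path[-1]] = leaf
    return root


print(rebuild(rows))
```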
> >>
> >> The downside of this approach is we wouldn't be able to ensure the same
> >> order of keys in the JSON object. Currently the `jiffy` JSON encoder
> >> respects the order of keys.
> >> ```
> >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> >> <<"{\"bbb\":1,\"aaa\":12}">>
> >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> >> <<"{\"aaa\":12,\"bbb\":1}">>
> >> ```
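
One way the order can change, sketched in Python under an assumption: field ids are handed out per document in first-seen order, and a range read returns members sorted by those ids. The `ids` table and `assign` helper are hypothetical. A nested object that lists already-seen names in a different order then comes back reordered.

```python
doc = {"x": {"b": 1, "a": 2}, "y": {"a": 1, "b": 2}}

ids = {}  # field name -> integer id (0 and 1 are reserved)


def assign(node):
    """Give every field name an id the first time it is seen."""
    if isinstance(node, dict):
        for name, child in node.items():
            ids.setdefault(name, len(ids) + 2)
            assign(child)


assign(doc)

# "b" was seen before "a" (inside "x"), so it has the smaller id.
original = list(doc["y"])
read_back = sorted(doc["y"], key=lambda name: ids[name])

print(original)
print(read_back)
```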
> >>
> >> Best regards,
> >> iilyak
> >>
> >> On 2019/01/30 13:02:57, Ilya Khlopotov <[email protected]> wrote:
> >>> As you might already know, FoundationDB has a number of limitations
> >>> which influence the way we might store JSON documents. The limitations
> >>> are:
> >>>
> >>> | limitation            | recommended value | recommended max | absolute max |
> >>> |-----------------------|------------------:|----------------:|-------------:|
> >>> | transaction duration  |                   |                 |        5 sec |
> >>> | transaction data size |                   |                 |        10 MB |
> >>> | key size              |          32 bytes |            1 kB |        10 kB |
> >>> | value size            |                   |           10 kB |       100 kB |
> >>>
> >>> In order to fit the JSON document into 100kB we would have to partition
> >>> it in some way. There are three ways of partitioning the document:
> >>> 1. store multiple binary blobs (parts) in different keys
> >>> 2. flatten the JSON structure and store every path leading to a scalar
> >>> value under its own key
> >>> 3. measure the size of different branches of a tree representing the JSON
> >>> document (while we parse) and use another key for the branch when we
> >>> are about to exceed the limit
> >>>
> >>> - The first approach is the simplest but it wouldn't allow us to access
> >>> parts of the document.
> >>> - The downsides of the second approach are:
> >>>   - the flattened JSON structure would have long paths, which means
> >>>     longer keys
> >>>   - a scalar value cannot be more than 100 kB (unless we split it as well)
> >>> - The third approach falls short in cases when the structure of the
> >>> document doesn't allow a clean cut-off of branches:
> >>>   - complex rules to handle all corner cases
> >>>
> >>> The goals of this thread are:
> >>> - to collect ideas on how to encode and store the JSON document
> >>> - to comment on the collected ideas
> >>>
> >>> Non goals:
> >>> - the storage of metadata for the document would be discussed elsewhere
> >>> - tombstones
> >>> - edit conflicts
> >>> - revisions
> >>>
> >>> Best regards,
> >>> iilyak
> >>>
> >
> > --
> > Professional Support for Apache CouchDB:
> > https://neighbourhood.ie/couchdb-support/
> >
>