Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Jan Lehnardt Wed, 30 Jan 2019 06:43:28 -0800

+1

> On 30. Jan 2019, at 15:31, Paul Davis <paul.joseph.da...@gmail.com> wrote:
> 
> Jiffy preserves duplicate keys if it's not decoding into a map (in
> which case the last value for a duplicate key wins). It's very much a
> corner case and not supported by nearly any other JSON library,
> so changing that shouldn't be considered a breaking change in my
> opinion.
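
For comparison, a minimal sketch of the same distinction in Python's standard `json` module (an analogy only, not Jiffy itself): decoding into a map keeps the last value for a duplicate key, while decoding into a list of pairs keeps every occurrence.

```python
import json

raw = '{"foo": 1, "foo": 2}'

# Decoding into a dict keeps only the last value for a repeated key.
as_dict = json.loads(raw)

# Decoding into a list of pairs keeps every occurrence, in document order.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```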
> 
> On Wed, Jan 30, 2019 at 8:21 AM Mike Rhodes <couc...@dx13.co.uk> wrote:
>> 
>> From what I recall Jiffy is able to cope with the valid-but-kinda-silly[1] 
>> thing where you have multiple JSON keys with the same name, i.e., { "foo": 
>> 1, "foo": 2 }.
>> 
>> Are the proposals on the table able to continue this support (or am I wrong 
>> about Jiffy)?
>> 
>> [1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an 
>> object SHOULD be unique.", though 
>> https://tools.ietf.org/html/rfc7493#section-2.3 does sensibly close that 
>> down.
>> 
>> --
>> Mike.
>> 
>> On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
>>> 
>>> 
>>>> On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
>>>> 
>>>> Thanks Ilya for getting this started!
>>>> 
>>>> Two quick notes on this one:
>>>> 
>>>> 1. Note that JSON does not guarantee object key order and that CouchDB has 
>>>> never guaranteed it either. With, say, emit(doc.foo, doc.bar), if either 
>>>> emit() parameter was an object, the undefined sort order of SpiderMonkey 
>>>> would mix things up. While worth bringing up, this is not a BC break.
>>>> 
>>>> 2. This would have the fun property of being able to rename a key inside 
>>>> all docs that have that key.
>>> 
>>> …in one short operation.
>>> 
>>> Best
>>> Jan
>>> —
>>>> 
>>>> Best
>>>> Jan
>>>> —
>>>> 
>>>>> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>> 
>>>>> # First proposal
>>>>> 
>>>>> In order to overcome FoundationDB limitations on key size (10 kB) and 
>>>>> value size (100 kB) we could use the following approach.
>>>>> 
>>>>> Below, the paths use slashes for illustration purposes only. We can 
>>>>> use nested subspaces, tuples, directories or something else.
>>>>> 
>>>>> - Store documents in a subspace or directory (to keep the key prefix 
>>>>> short)
>>>>> - When we store the document we would enumerate all field names (0 and 1 
>>>>> are reserved) and store the mapping table in the key which looks like:
>>>>> ```
>>>>> {DB_DOCS_NS} / {DOC_KEY} / 0
>>>>> ```
>>>>> - Flatten the JSON document (convert it into key value pairs where the 
>>>>> key is `JSON_PATH` and value is `SCALAR_VALUE`)
>>>>> - Replace elements of JSON_PATH with integers from mapping table we 
>>>>> constructed earlier
>>>>> - For an array, use `1 / {array_idx}`
>>>>> - Store scalar values in the keys which look like the following (we use 
>>>>> `JSON_PATH` with integers).
>>>>> ```
>>>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>>>>> ```
>>>>> - If the scalar value exceeds 100kB we would split it and store every 
>>>>> part under key constructed as:
>>>>> ```
>>>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>>>>> ```
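
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`build_mapping`, `flatten`, `to_rows`), the use of Python tuples as keys, the JSON encoding of values, and the 100,000-byte constant are all hypothetical choices, not part of the proposal. Empty objects and arrays produce no rows here, and a split value's part index is not distinguished from a deeper path element; a real encoding would have to handle both.

```python
import json

MAPPING_KEY = 0   # reserved id: the key that holds the mapping table
ARRAY_MARKER = 1  # reserved id: the next path element is an array index
MAX_VALUE_SIZE = 100_000  # assumed value size limit in bytes


def build_mapping(doc):
    """Assign an integer (starting at 2) to every distinct field name."""
    mapping = {}

    def walk(node):
        if isinstance(node, dict):
            for name, child in node.items():
                mapping.setdefault(name, len(mapping) + 2)
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(doc)
    return mapping


def flatten(node, mapping, path=()):
    """Yield (path, scalar) pairs with field names replaced by integers."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, mapping, path + (mapping[name],))
    elif isinstance(node, list):
        for idx, child in enumerate(node):
            yield from flatten(child, mapping, path + (ARRAY_MARKER, idx))
    else:
        yield path, node


def to_rows(doc_key, doc):
    """Return the key/value rows for one document, splitting large values."""
    mapping = build_mapping(doc)
    rows = [((doc_key, MAPPING_KEY), json.dumps(mapping).encode("utf-8"))]
    for path, scalar in flatten(doc, mapping):
        key = (doc_key,) + path
        value = json.dumps(scalar).encode("utf-8")
        if len(value) <= MAX_VALUE_SIZE:
            rows.append((key, value))
            continue
        # Oversized value: store each part under key / {PART_IDX}.
        offsets = range(0, len(value), MAX_VALUE_SIZE)
        for part_idx, start in enumerate(offsets):
            rows.append((key + (part_idx,), value[start:start + MAX_VALUE_SIZE]))
    return rows


doc = {"name": "Ann", "tags": ["a", "b"], "address": {"city": "Oslo"}}
rows = to_rows("doc1", doc)
for key, value in rows:
    print(key, value)
```

With this sample document the field names map to 2 through 5, so `tags[1]` ends up under the path `3 / 1 / 1` and `address.city` under `4 / 5`.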
>>>>> 
>>>>> Since all parts of the documents are stored under a common `{DB_DOCS_NS} 
>>>>> / {DOC_KEY}` they will be stored on the same server most of the time. The 
>>>>> document can be retrieved by using a range query 
>>>>> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
>>>>> / 0xFF")`). We can reconstruct the document since the mapping is returned 
>>>>> as well.
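
A rough sketch of the reconstruction step, assuming the range read has already returned the rows for one document with the `{DB_DOCS_NS} / {DOC_KEY}` prefix stripped. The `rebuild` and `materialize` helpers and the row layout are hypothetical, values are assumed to be JSON-encoded scalars, and split values are not handled.

```python
import json

MAPPING_KEY = 0   # reserved id: the key that holds the mapping table
ARRAY_MARKER = 1  # reserved id: the next path element is an array index


def materialize(node, names):
    """Turn the intermediate tree into plain dicts and lists."""
    if not isinstance(node, dict):
        return node  # scalar leaf
    if set(node) == {ARRAY_MARKER}:
        items = node[ARRAY_MARKER]
        return [materialize(items[idx], names) for idx in sorted(items)]
    return {names[num]: materialize(child, names)
            for num, child in sorted(node.items())}


def rebuild(rows):
    """Rebuild a document from the (path, value) rows of one range read."""
    rows = dict(rows)
    mapping = json.loads(rows.pop((MAPPING_KEY,)))
    names = {number: name for name, number in mapping.items()}
    tree = {}
    for path, raw in rows.items():
        node = tree
        for element in path[:-1]:
            node = node.setdefault(element, {})
        node[path[-1]] = json.loads(raw)
    return materialize(tree, names)


rows = [
    ((0,), b'{"name": 2, "tags": 3, "address": 4, "city": 5}'),
    ((2,), b'"Ann"'),
    ((3, 1, 0), b'"a"'),
    ((3, 1, 1), b'"b"'),
    ((4, 5), b'"Oslo"'),
]
doc = rebuild(rows)
print(doc)
```

In this sketch the object keys come back in the order of their integer ids, which need not match the order in the original document.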
>>>>> 
>>>>> The downside of this approach is that we wouldn't be able to preserve 
>>>>> the original order of keys in the JSON object. Currently the `jiffy` 
>>>>> JSON encoder respects the order of keys.
>>>>> ```
>>>>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>>>>> <<"{\"bbb\":1,\"aaa\":12}">>
>>>>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>>>>> <<"{\"aaa\":12,\"bbb\":1}">>
>>>>> ```
>>>>> 
>>>>> Best regards,
>>>>> iilyak
>>>>> 
>>>>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
>>>>>> As you might already know, FoundationDB has a number of limitations 
>>>>>> which influence the way we might store JSON documents. The limitations 
>>>>>> are:
>>>>>> 
>>>>>> | limitation            | recommended value | recommended max | absolute max |
>>>>>> |-----------------------|------------------:|----------------:|-------------:|
>>>>>> | transaction duration  |                   |                 |        5 sec |
>>>>>> | transaction data size |                   |                 |        10 MB |
>>>>>> | key size              |          32 bytes |            1 kB |        10 kB |
>>>>>> | value size            |                   |           10 kB |       100 kB |
>>>>>> 
>>>>>> In order to fit the JSON document into 100 kB we would have to partition 
>>>>>> it in some way. There are three ways of partitioning the document:
>>>>>> 1. store multiple binary blobs (parts) in different keys
>>>>>> 2. flatten the JSON structure and store every path leading to a scalar 
>>>>>> value under its own key
>>>>>> 3. measure the size of different branches of a tree representing the 
>>>>>> JSON document (while we parse) and use another key for the branch when 
>>>>>> we are about to exceed the limit
>>>>>> 
>>>>>> - The first approach is the simplest, but it wouldn't allow us to access 
>>>>>> parts of the document.
>>>>>> - The downsides of the second approach are:
>>>>>>   - a flattened JSON structure would have long paths, which means longer keys
>>>>>>   - a scalar value cannot be more than 100 kB (unless we split it as well)
>>>>>> - The third approach falls short when the structure of the document 
>>>>>> doesn't allow a clean cut-off of branches:
>>>>>>   - it needs complex rules to handle all corner cases
>>>>>> The goals of this thread are:
>>>>>> - to collect ideas on how to encode and store the JSON document
>>>>>> - to comment on the collected ideas
>>>>>> 
>>>>>> Non-goals:
>>>>>> - the storage of metadata for the document, which will be discussed elsewhere:
>>>>>>   - tombstones
>>>>>>   - edit conflicts
>>>>>>   - revisions
>>>>>> 
>>>>>> Best regards,
>>>>>> iilyak
>>>>>> 
>>>> 
>>>> --
>>>> Professional Support for Apache CouchDB:
>>>> https://neighbourhood.ie/couchdb-support/
>>>> 
>>> 
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
>>>


-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
