# Data model without support for per key revisions
In this model, "per key revisions" support was sacrificed so that we can avoid
reading the previous revision of a document when we write a new version of it.
# Ranges used in the model
- `{NS} / _mapping / _last_field_id`
- `{NS} / _mapping / _by_field / {field_name} = field_id` # we would cache it
in the Layer's memory
- `{NS} / _mapping / _by_field_id / {field_id} = field_name` # we would cache
it in the Layer's memory
- `{NS} / {docid} / _info` = '{"scheme": {scheme_name} / {scheme_revision},
"revision": {revision}}'
- `{NS} / {docid} / _data / {compressed_json_path} = latest_value | part`
- `{NS} / {docid} / {revision} / _info` = '{"scheme": {scheme_name} /
{scheme_revision}}'
- `{NS} / {docid} / {revision} / _data / {compressed_json_path} = value | part`
- `{NS} / {docid} / _index / _revs / {is_deleted} / {rev_pos} / {revision} =
{parent_revision}`
- `{NS} / _index / _by_seq / {seq}` = "{docid} / {revision}" # seq is a FDB
versionstamp
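To make the layout concrete, here is a minimal sketch of these ranges
expressed with the FoundationDB tuple layer in Python. All names here (`ns`,
the helper functions) are illustrative assumptions, not part of the model;
the later sketches in this mail reuse them.
```
import fdb

fdb.api_version(620)
db = fdb.open()

# `ns` stands in for {NS}; a directory layer could be used instead.
ns = fdb.Subspace(('ns',))

mapping = ns['_mapping']
last_field_id_key = mapping['_last_field_id'].key()  # {NS} / _mapping / _last_field_id
by_field = mapping['_by_field']        # {NS} / _mapping / _by_field / {field_name}
by_field_id = mapping['_by_field_id']  # {NS} / _mapping / _by_field_id / {field_id}
by_seq = ns['_index']['_by_seq']       # {NS} / _index / _by_seq / {seq}

def doc_info_key(docid):
    return ns[docid]['_info'].key()    # {NS} / {docid} / _info

def doc_data(docid, revision=None):
    # {NS} / {docid} / _data or {NS} / {docid} / {revision} / _data
    sub = ns[docid] if revision is None else ns[docid][revision]
    return sub['_data']
```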
We would have a few special documents:
- "_schema / {schema_name}" - this doc would contain validation rules for the
schema (not used in the MVP).
- when we start using schemas we would be able to populate the `{NS} /
_mapping / xxx` range when we write the schema document
- the schema document MUST fit into 100K (we don't use the flattened JSON
model for it)
# JSON path compression
- Assign an integer field_id to every unique field_name of a JSON document,
starting from 10.
- We would use the first 10 integers to encode the type of the value:
  - 0 - the value is an array
  - 1 - the value is a big scalar value broken down into multiple parts
  - 2..9 - reserved for future use
- Replace field names in the JSON path with field ids
## Example of compressed JSON
```
{
  "foo": {
    "bar": {
      "baz": [1, 2, 3]
    },
    "langs": {
      "en_US": "English",
      "en_UK": "English (UK)",
      "en_CA": "English (Canada)",
      "zh_CN": "Chinese (China)"
    },
    "translations": {
      "en_US": {
        "license": "200 Kb of text"
      }
    }
  }
}
```
This document would be compressed into:
```
# written in a separate transaction and cached in the Layer
{NS} / _mapping / _by_field / foo = 10
{NS} / _mapping / _by_field / bar = 12
{NS} / _mapping / _by_field / baz = 11
{NS} / _mapping / _by_field / langs = 18
{NS} / _mapping / _by_field / en_US = 13
{NS} / _mapping / _by_field / en_UK = 14
{NS} / _mapping / _by_field / en_CA = 15
{NS} / _mapping / _by_field / zh_CN = 16
{NS} / _mapping / _by_field / translations = 17
{NS} / _mapping / _by_field / license = 19
{NS} / _mapping / _by_field_id / 10 = foo
{NS} / _mapping / _by_field_id / 12 = bar
{NS} / _mapping / _by_field_id / 11 = baz
{NS} / _mapping / _by_field_id / 18 = langs
{NS} / _mapping / _by_field_id / 13 = en_US
{NS} / _mapping / _by_field_id / 14 = en_UK
{NS} / _mapping / _by_field_id / 15 = en_CA
{NS} / _mapping / _by_field_id / 16 = zh_CN
{NS} / _mapping / _by_field_id / 17 = translations
{NS} / _mapping / _by_field_id / 19 = license
# written on document write
{NS} / {docid} / _data / 10 / 12 / 11 / 0 / 0 = 1
{NS} / {docid} / _data / 10 / 12 / 11 / 0 / 1 = 2
{NS} / {docid} / _data / 10 / 12 / 11 / 0 / 2 = 3
{NS} / {docid} / _data / 10 / 18 / 13 = English
{NS} / {docid} / _data / 10 / 18 / 14 = English (UK)
{NS} / {docid} / _data / 10 / 18 / 15 = English (Canada)
{NS} / {docid} / _data / 10 / 18 / 16 = Chinese (China)
{NS} / {docid} / _data / 10 / 17 / 13 / 19 / 1 / 0 = first 100K of license
{NS} / {docid} / _data / 10 / 17 / 13 / 19 / 1 / 1 = second 100K of license
```
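As an illustration of the compression rules, here is a sketch of the
flattening step in Python. `encode_scalar`, the marker constants, and the
exact split size are assumptions derived from the rules above; field ids are
assumed to be allocated already.
```
import json

ARRAY_MARKER = 0          # path element: the value is an array
PART_MARKER = 1           # path element: a big scalar split into parts
MAX_VALUE_SIZE = 100_000  # FDB limits a single value to 100K

def encode_scalar(value):
    # Hypothetical scalar encoding; any stable byte encoding would do.
    return json.dumps(value).encode('utf-8')

def flatten(doc, field_ids, path=()):
    """Yield (compressed_path, value_bytes) pairs for a JSON document.

    `field_ids` maps field_name -> field_id (>= 10), as stored in the
    {NS} / _mapping ranges.
    """
    if isinstance(doc, dict):
        for name, value in doc.items():
            yield from flatten(value, field_ids, path + (field_ids[name],))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            yield from flatten(value, field_ids, path + (ARRAY_MARKER, i))
    else:
        data = encode_scalar(doc)
        if len(data) <= MAX_VALUE_SIZE:
            yield path, data
        else:
            # Split a big scalar into numbered parts under PART_MARKER.
            for part, i in enumerate(range(0, len(data), MAX_VALUE_SIZE)):
                yield path + (PART_MARKER, part), data[i:i + MAX_VALUE_SIZE]
```
For the example above, `flatten(doc, {"foo": 10, "bar": 12, "baz": 11, ...})`
yields `((10, 12, 11, 0, 0), b'1')`, `((10, 12, 11, 0, 1), b'2')`, and so on,
matching the listing.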
# Operations
## Read latest revision
- We do a range read of "{NS} / {docid}" and assemble the document from the
results of the query (sketched below).
- If we cannot find a field_id in the Layer's cache we would read the "{NS} /
_mapping / _by_field_id" range and cache the result.
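A minimal sketch of this read path, reusing the subspaces from the layout
sketch above; `field_cache`, `resolve`, and `insert_path` are illustrative
helpers, and handling of the array/part markers is glossed over for brevity.
```
decode_value = json.loads  # inverse of encode_scalar above

def resolve(path, field_cache):
    # Translate field ids back to names; on a miss the Layer would
    # range-read {NS} / _mapping / _by_field_id and refill the cache.
    return tuple(field_cache.get(p, p) for p in path)

def insert_path(doc, names, value):
    # Rebuild the nested structure; array (0) and part (1) markers would
    # need extra handling, omitted here.
    *parents, leaf = names
    node = doc
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value

@fdb.transactional
def read_latest(tr, docid, field_cache):
    doc = {}
    data = doc_data(docid)
    for k, v in tr[data.range()]:
        insert_path(doc, resolve(data.unpack(k), field_cache), decode_value(v))
    return doc
```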
## Read specified revision
- Do a range read of "{NS} / {docid} / {revision} /" and assemble the document
from the result of the query (sketched below)
- If we cannot find a field_id in the Layer's cache we would read the "{NS} /
_mapping / _by_field_id" range and cache the result.
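Reading a specified revision is the same assembly over the per-revision range
(again a sketch on top of the helpers above):
```
@fdb.transactional
def read_revision(tr, docid, revision, field_cache):
    doc = {}
    data = doc_data(docid, revision)  # {NS} / {docid} / {revision} / _data
    for k, v in tr[data.range()]:
        insert_path(doc, resolve(data.unpack(k), field_cache), decode_value(v))
    return doc
```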
## Write
- flatten JSON
- check whether any fields are missing from the Layer's field cache
- if keys are missing, start a key allocation transaction (sketched after
  this list)
  - read "{NS} / _mapping / _by_field / {field_name}"
    - if it doesn't exist, the key is added to the read conflict range (FDB
      does this by default for reads)
  - `field_idx = txn["{NS} / _mapping / _last_field_id"] + 1`; this read also
    adds the key to the read conflict range (FDB does this by default)
  - write `"{NS} / _mapping / _last_field_id" = field_idx`
  - write `"{NS} / _mapping / _by_field / {field_name}" = field_idx`
  - write `"{NS} / _mapping / _by_field_id / {field_idx}" = field_name`
- read `{NS} / {docid} / _info`, verify that its revision equals the specified
  parent_revision; the read adds the key to the read conflict range
- generate new_revision
- write all fields into two ranges (splitting big values as needed)
  - "{NS} / {docid} / _data / {compressed_json_path}"
  - "{NS} / {docid} / {new_revision} / _data / {compressed_json_path}"
- write the following keys
  - `{NS} / {docid} / _info` = '{"scheme": {scheme_name} / {scheme_revision},
    "revision": {new_revision}}'
  - `{NS} / {docid} / {new_revision} / _info` = '{"scheme": {scheme_name} /
    {scheme_revision}}'
  - `{NS} / {docid} / _index / _revs / {is_deleted} / {rev_pos} /
    {new_revision} = {parent_revision}`
  - `{NS} / _index / _by_seq / {seq}` = "{docid} / {new_revision}" # seq is a
    FDB versionstamp
- update database stats
  - `{NS} / _meta / number_of_docs` += 1
  - `{NS} / _meta / external_size` += external_size
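Here is a sketch of both transactions under the assumptions above. The
revision scheme, the `_info` encoding, and the error type are placeholders;
the point is that the `_last_field_id`, `_by_field`, and `_info` reads create
read conflict ranges, so racing allocators or writers of the same document
conflict and one of them retries.
```
import struct
import uuid

@fdb.transactional
def allocate_field_ids(tr, field_names):
    """Key allocation transaction for field names missing from the cache."""
    allocated = {}
    last = tr[last_field_id_key]             # read conflict range by default
    next_id = fdb.tuple.unpack(last)[0] + 1 if last.present() else 10
    for name in field_names:
        existing = tr[by_field.pack((name,))]  # read conflict range by default
        if existing.present():
            allocated[name] = fdb.tuple.unpack(existing)[0]
            continue
        allocated[name] = next_id
        tr[by_field.pack((name,))] = fdb.tuple.pack((next_id,))
        tr[by_field_id.pack((next_id,))] = fdb.tuple.pack((name,))
        next_id += 1
    tr[last_field_id_key] = fdb.tuple.pack((next_id - 1,))
    return allocated

@fdb.transactional
def write_doc(tr, docid, doc, parent_revision, field_ids,
              is_deleted=False, rev_pos=1, external_size=0):
    info = tr[doc_info_key(docid)]           # read conflict range by default
    if info.present() and fdb.tuple.unpack(info)[0] != parent_revision:
        raise ValueError('conflict: stale parent_revision for %s' % docid)
    new_revision = uuid.uuid4().hex          # placeholder revision scheme
    # Write every field into the latest range and the per-revision range.
    for path, value in flatten(doc, field_ids):
        tr[doc_data(docid).pack(path)] = value
        tr[doc_data(docid, new_revision).pack(path)] = value
    # _info keys; placeholder encoding, the model stores a JSON blob with
    # the scheme name/revision as well.
    tr[doc_info_key(docid)] = fdb.tuple.pack((new_revision,))
    tr[ns[docid][new_revision]['_info'].key()] = fdb.tuple.pack(())
    # Revision tree entry.
    revs = ns[docid]['_index']['_revs']
    tr[revs.pack((is_deleted, rev_pos, new_revision))] = \
        fdb.tuple.pack((parent_revision,))
    # _by_seq entry keyed by a versionstamp filled in at commit time.
    seq_key = by_seq.pack_with_versionstamp((fdb.tuple.Versionstamp(),))
    tr.set_versionstamped_key(seq_key, fdb.tuple.pack((docid, new_revision)))
    # Database stats via atomic adds (no read, so no extra conflicts).
    tr.add(ns['_meta']['number_of_docs'].key(), struct.pack('<q', 1))
    tr.add(ns['_meta']['external_size'].key(), struct.pack('<q', external_size))
```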
## Get list of all known revisions for the document
- range query `{NS} / {docid} / _index / _revs /`
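Under the same assumptions, this is a single range read:
```
@fdb.transactional
def list_revisions(tr, docid):
    revs = ns[docid]['_index']['_revs']
    # Each key unpacks to (is_deleted, rev_pos, revision); the value holds
    # the parent revision.
    return [revs.unpack(k) + fdb.tuple.unpack(v) for k, v in tr[revs.range()]]
```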
## Changes feed
- we would set a watch on the `{NS} / _meta / external_size` key
- when the watch fires we would do a range query starting from "{NS} / _index /
_by_seq / {since_seq}"
- remember the last key returned by the range query to use as the new value of
since_seq (see the sketch below)
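A sketch of the feed loop under these assumptions. The watch is created in
the same transaction as the range scan so no update can slip between them,
and versionstamp keys sort in commit order, so resuming from the last seen
key is safe.
```
external_size_key = ns['_meta']['external_size'].key()

@fdb.transactional
def changes_since(tr, since_key):
    changes = []
    last_key = since_key
    for kv in tr.get_range(fdb.KeySelector.first_greater_than(since_key),
                           by_seq.range().stop):
        docid, revision = fdb.tuple.unpack(kv.value)
        changes.append((docid, revision))
        last_key = kv.key
    return changes, last_key, tr.watch(external_size_key)

def follow_changes(db, handler):
    since_key = by_seq.range().start  # beginning of {NS} / _index / _by_seq
    while True:
        changes, since_key, watch = changes_since(db, since_key)
        for docid, revision in changes:
            handler(docid, revision)
        watch.wait()  # blocks until a write bumps external_size
```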
best regards,
iilyak
On 2019/02/04 19:25:13, Ilya Khlopotov <[email protected]> wrote:
> This is a beginning of a discussion thread about storage of edit conflicts
> and everything which relates to revisions.