Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Garren Smith
Hi Adam,

Thanks for the detailed email. In terms of the data model, that makes a lot
of sense.

I’m still playing a bit of catchup on understanding how fdb works, so I
can’t comment on the best way to retrieve a document.

From my side, I would like to see our decisions also driven by testing and
validating that our data model works. I find the way that fdb was tested
and built really impressive. I would love to see us apply some of that to
the way we build our CouchDB layer.

Cheers
Garren

On Fri, Feb 8, 2019 at 5:35 AM Adam Kocoloski  wrote:

> Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a
> bit “simplistic” to exclusively choose simplicity over performance and
> storage density. We’re (re)building a database here, one that has some
> users with pretty demanding performance and scalability requirements. And
> yes, we should certainly be testing and measuring. Kyle and team are
> setting up infrastructure in IBM land to help with that now, but I also
> believe we can design the data model and architecture with a basic
> performance model of FoundationDB in mind:
>
> - reads cost 1ms
> - short range reads are the same cost as a single lookup
> - reads of independent parts of the keyspace can be parallelized for cheap
> - writes are zero-cost until commit time
>
> We ought to be able to use these assumptions to drive some decisions about
> data models ahead of any end-to-end performance test.
>
> If there are specific elements of the edit conflicts management where you
> think greater simplicity is warranted, let’s get those called out. Ilya
> noted (correctly, in my opinion) that the term sharing stuff is one of
> those items. It’s relatively complex, potentially a performance hit, and
> only saves on storage density in the corner case of lots of edit conflicts.
> That’s a good one to drop.
>
> I’m relatively happy with the revision history data model at this point.
> Hopefully folks find it easy to grok, and it’s efficient for both reads and
> writes. It costs some extra storage for conflict revisions compared to the
> current tree representation (up to 16K per edit branch, with default
> _revs_limit) but knowing what we know about the performance death spiral
> for wide revision trees today I’ll happily make a storage vs. performance
> tradeoff here :)
>
> Setting the shared term approach aside, I’ve still been mulling over the
> key structure for the actual document data:
>
> -  I thought about trying to construct a special _conflicts subspace, but
> I don’t like that approach because the choice of a “winning” revision can
> flip back and forth very quickly with concurrent writers to different edit
> branches. I think we really want to have a way for revisions to naturally
> sort themselves so the winner is the first or last revision in a list.
>
> - Assuming we’re using key paths of the form (docid, revision-ish, path,
> to, field), the goal here is to find an efficient way to get the last key
> with prefix “docid” (assuming winner sorts last), and then all the keys
> that share the same (docid, revision-ish) prefix as that one. I see two
> possible approaches so far, neither perfect:
>
> Option 1: Execute a get_key() operation with a key selector that asks for
> the last key less than “docid\xFF” (again assuming winner sorts last), and
> then do a get_range_startswith() request setting the streaming mode to
> “want_all” and the prefix to the docid plus whatever revision-ish we found
> from the get_key() request. This is two roundtrips instead of one, but it
> always retrieves exactly the right set of keys, and the second step is
> executed as fast as possible.
>
> Option 2: Jump straight to get_range_startswith() request using only
> “docid” as the prefix, then cancel the iteration once we reach a revision
> not equal to the first one we see. We might transfer too much data, or we
> might end up doing multiple roundtrips if the default “iterator” streaming
> mode sends too little data to start (I haven’t checked what the default
> iteration block is there), but in the typical case of zero edit conflicts
> we have a good chance of retrieving the full document in one roundtrip.
>
> I don’t have a good sense of which option wins out here from a performance
> perspective, but they’re both operating on the same data model so easy
> enough to test the alternatives. The important bit is getting the
> revision-ish things to sort correctly. I think we can do that by generating
> something like
>
> revision-ish = NotDeleted/1bit : RevPos : RevHash
>
> with some suitable order-preserving encoding on the RevPos integer.
>
> Apologies for the long email. Happy for any comments, either here or over
> on IRC. Cheers,
>
> Adam
>
> > On Feb 7, 2019, at 4:52 PM, Robert Newson  wrote:
> >
> > I think we should choose simple. We can then see if performance is too
> low or storage overhead too high and then see what we can do about it.
> >
> > B.
> >
> > --
> >  Robert Samuel Newson
> >  rnew.

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Adam Kocoloski
Bob, Garren, Jan - heard you loud and clear, K.I.S.S. I do think it’s a bit 
“simplistic” to exclusively choose simplicity over performance and storage 
density. We’re (re)building a database here, one that has some users with 
pretty demanding performance and scalability requirements. And yes, we should 
certainly be testing and measuring. Kyle and team are setting up infrastructure 
in IBM land to help with that now, but I also believe we can design the data 
model and architecture with a basic performance model of FoundationDB in mind:

- reads cost 1ms
- short range reads are the same cost as a single lookup
- reads of independent parts of the keyspace can be parallelized for cheap
- writes are zero-cost until commit time

We ought to be able to use these assumptions to drive some decisions about data 
models ahead of any end-to-end performance test.
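
As a rough way to sanity-check those numbers before we commit to a data model,
something like the sketch below against a test cluster would do (Python
bindings; the “bench” subspace and iteration counts are arbitrary, and each
call also pays a get-read-version roundtrip, so treat the output as ballpark
only):

```
import time

import fdb

fdb.api_version(600)
db = fdb.open()  # default cluster file; assumes a reachable test cluster

bench = fdb.Subspace(('bench',))

@fdb.transactional
def load(tr, n):
    # seed n small key/value pairs to read back
    for i in range(n):
        tr[bench.pack((i,))] = fdb.tuple.pack((i,))

@fdb.transactional
def point_read(tr, i):
    # .present() forces the read to complete inside the transaction
    return tr[bench.pack((i,))].present()

@fdb.transactional
def short_range_read(tr, n):
    r = bench.range()
    return list(tr.get_range(r.start, r.stop, limit=n))

load(db, 1000)
t0 = time.perf_counter()
for i in range(100):
    point_read(db, i)
t1 = time.perf_counter()
for _ in range(100):
    short_range_read(db, 50)
t2 = time.perf_counter()
print("point read       %.2f ms avg" % ((t1 - t0) * 10.0))
print("short range (50) %.2f ms avg" % ((t2 - t1) * 10.0))
```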

If there are specific elements of the edit conflicts management where you think 
greater simplicity is warranted, let’s get those called out. Ilya noted 
(correctly, in my opinion) that the term sharing stuff is one of those items. 
It’s relatively complex, potentially a performance hit, and only saves on 
storage density in the corner case of lots of edit conflicts. That’s a good one 
to drop.

I’m relatively happy with the revision history data model at this point. 
Hopefully folks find it easy to grok, and it’s efficient for both reads and 
writes. It costs some extra storage for conflict revisions compared to the 
current tree representation (up to 16K per edit branch, with default 
_revs_limit) but knowing what we know about the performance death spiral for 
wide revision trees today I’ll happily make a storage vs. performance tradeoff 
here :) 

Setting the shared term approach aside, I’ve still been mulling over the key 
structure for the actual document data:

- I thought about trying to construct a special _conflicts subspace, but I
don’t like that approach because the choice of a “winning” revision can flip
back and forth very quickly with concurrent writers to different edit branches. 
I think we really want to have a way for revisions to naturally sort themselves 
so the winner is the first or last revision in a list.

- Assuming we’re using key paths of the form (docid, revision-ish, path, to, 
field), the goal here is to find an efficient way to get the last key with 
prefix “docid” (assuming winner sorts last), and then all the keys that share 
the same (docid, revision-ish) prefix as that one. I see two possible 
approaches so far, neither perfect:

Option 1: Execute a get_key() operation with a key selector that asks for the 
last key less than “docid\xFF” (again assuming winner sorts last), and then do 
a get_range_startswith() request setting the streaming mode to “want_all” and 
the prefix to the docid plus whatever revision-ish we found from the get_key() 
request. This is two roundtrips instead of one, but it always retrieves exactly 
the right set of keys, and the second step is executed as fast as possible.
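
For illustration, Option 1 could look roughly like this with the Python
bindings and the Tuple/Subspace layer. The “docs” subspace and the (docid,
revision-ish, path…) layout are assumptions for the sketch, and a reverse
limit-1 range read stands in for the get_key() key selector (either way it is
a single roundtrip that lands on the last key under the docid prefix):

```
import fdb

fdb.api_version(600)
db = fdb.open()

# Assumed layout, purely for illustration:
#   ('docs', docid, revision_ish, *json_path) -> encoded scalar value
docs = fdb.Subspace(('docs',))

@fdb.transactional
def read_winner_option1(tr, docid):
    doc_range = docs.range((docid,))

    # Roundtrip 1: the winner sorts last, so the last key in the docid range
    # tells us the winning revision-ish without fetching the whole document.
    last = list(tr.get_range(doc_range.start, doc_range.stop,
                             limit=1, reverse=True))
    if not last:
        return None  # no such document
    revision_ish = docs.unpack(last[0].key)[1]

    # Roundtrip 2: fetch exactly the winning revision's keys, streamed back
    # as eagerly as possible.
    prefix = docs.pack((docid, revision_ish))
    return [(docs.unpack(kv.key)[2:], kv.value)
            for kv in tr.get_range_startswith(
                prefix, streaming_mode=fdb.StreamingMode.want_all)]
```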

Option 2: Jump straight to a get_range_startswith() request using only “docid” as 
the prefix, then cancel the iteration once we reach a revision not equal to the 
first one we see. We might transfer too much data, or we might end up doing 
multiple roundtrips if the default “iterator” streaming mode sends too little 
data to start (I haven’t checked what the default iteration block is there), 
but in the typical case of zero edit conflicts we have a good chance of 
retrieving the full document in one roundtrip.
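
And a sketch of Option 2 over the same assumed layout, walking the docid
prefix in reverse (winner sorts last here) with the default “iterator”
streaming mode and stopping as soon as the revision-ish changes:

```
import fdb

fdb.api_version(600)
db = fdb.open()

docs = fdb.Subspace(('docs',))  # same assumed layout as the Option 1 sketch

@fdb.transactional
def read_winner_option2(tr, docid):
    winner, fields = None, []
    # One range request over the whole docid prefix; in the zero-conflict
    # case the first batch usually holds the entire document.
    for kv in tr.get_range_startswith(
            docs.pack((docid,)), reverse=True,
            streaming_mode=fdb.StreamingMode.iterator):
        revision_ish = docs.unpack(kv.key)[1]
        if winner is None:
            winner = revision_ish
        elif revision_ish != winner:
            break  # reached an older edit branch; stop pulling batches
        fields.append((docs.unpack(kv.key)[2:], kv.value))
    if winner is None:
        return None  # no such document
    fields.reverse()  # restore key order after the reverse scan
    return winner, fields
```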

I don’t have a good sense of which option wins out here from a performance 
perspective, but they’re both operating on the same data model so easy enough 
to test the alternatives. The important bit is getting the revision-ish things 
to sort correctly. I think we can do that by generating something like

revision-ish = NotDeleted/1bit : RevPos : RevHash

with some suitable order-preserving encoding on the RevPos integer.
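
One possible encoding that sorts this way, assuming RevPos fits in 64 bits and
RevHash is raw bytes (the tuple layer’s integer encoding would be an equally
valid order-preserving choice):

```
import struct

def encode_revision_ish(deleted, rev_pos, rev_hash):
    # Not-deleted leaves sort after deleted ones, then higher RevPos, then
    # higher RevHash, so the revision CouchDB would pick as the winner
    # compares greatest as a plain byte string.
    not_deleted = b'\x00' if deleted else b'\x01'
    # Fixed-width big-endian keeps integer order and byte order aligned.
    return not_deleted + struct.pack('>Q', rev_pos) + rev_hash

# e.g. a live 3-... revision sorts after a live 2-... revision, and any
# deleted branch sorts before both:
# encode_revision_ish(False, 3, bytes.fromhex('deadbeef'))
```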

Apologies for the long email. Happy for any comments, either here or over on 
IRC. Cheers,

Adam

> On Feb 7, 2019, at 4:52 PM, Robert Newson  wrote:
> 
> I think we should choose simple. We can then see if performance is too low or 
> storage overhead too high and then see what we can do about it.
> 
> B.
> 
> -- 
>  Robert Samuel Newson
>  rnew...@apache.org
> 
> On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote:
>> We cannot do simple thing if we want to support sharing of JSON terms. I 
>> think if we want the simplest path we should move sharing out of the 
>> scope. The problem with sharing is we need to know the location of 
>> shared terms when we do write. This means that we have to read full 
>> document on every write. There might be tricks to replace full document 
>> read with some sort of hierarchical signature or sketch of a document. 
>> However these tricks do not fall into simplest solution category. We 
>> need to choose the design goals:
>> - simple
>> - performance
>> - reduced storage overhead

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Robert Newson
I think we should choose simple. We can then see if performance is too low or 
storage overhead too high and then see what we can do about it.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote:
> We cannot do simple thing if we want to support sharing of JSON terms. I 
> think if we want the simplest path we should move sharing out of the 
> scope. The problem with sharing is we need to know the location of 
> shared terms when we do write. This means that we have to read full 
> document on every write. There might be tricks to replace full document 
> read with some sort of hierarchical signature or sketch of a document. 
> However these tricks do not fall into simplest solution category. We 
> need to choose the design goals:
> - simple
> - performance
> - reduced storage overhead
> 
> best regards,
> iilyak
> 
> On 2019/02/07 12:45:34, Garren Smith  wrote: 
> > I’m also in favor of keeping it really simple and then testing and
> > measuring it.
> > 
> > What is the best way to measure that we have something that works? I’m not
> > sure just relying on our current tests will prove that? Should we define
> > and build some more complex situations e.g docs with lots of conflicts or
> > docs with wide revisions and make sure we can solve for those?
> > 
> > On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt  wrote:
> > 
> > > I’m also very much in favour with starting with the simplest thing that
> > > can possibly work and doesn’t go against the advertised best practices of
> > > FoundationDB. Let’s get that going and get a feel for how it all works
> > > together, before trying to optimise things we can’t measure yet.
> > >
> > > Best
> > > Jan
> > > —
> > >
> > > > On 6. Feb 2019, at 16:58, Robert Samuel Newson 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > With the Redwood storage engine under development and with prefix
> > > elision part of its design, I don’t think we should get too hung up on
> > > adding complications and indirections in the key space just yet. We 
> > > haven’t
> > > written a line of code or run any tests, this is premature optimisation.
> > > >
> > > > I’d like to focus on the simplest solution that yields all required
> > > properties. We can embellish later (if warranted).
> > > >
> > > > I am intrigued by all the ideas that might allow us cheaper inserts and
> > > updates than the current code where there are multiple edit branches in 
> > > the
> > > stored document.
> > > >
> > > > B.
> > > >
> > > >> On 6 Feb 2019, at 02:18, Ilya Khlopotov  wrote:
> > > >>
> > > >> While reading Adam's proposal I came to realize that: we don't have to
> > > calculate winning revision at read time.
> > > >> Since FDB's transactions are atomic we can calculate it when we write.
> > > This means we can just write latest values into separate range. This makes
> > > lookup of latest version fast.
> > > >> Another realization is if we want to share values for some json paths
> > > we would have to introduce a level of indirection.
> > >> Below is the data model inspired by Adam's idea to share json_paths.
> > > In this model the json_path is stored in the revision where it was first
> > > added (we call that revision an owner of a json_path). The values for
> > > json_path key can be scalar values, parts of scalar values or pointers to
> > > owner location.
> > > >> The below snippets are sketches of transactions.
> > > >> The transactions will include updates to other keys as needed
> > > (`external_size`, `by_seq` and so on).  The revision tree management is 
> > > not
> > > covered yet.
> > > >> The `rev -> vsn` indirection is not strictly required. It is added
> > > because it saves some space since `rev` is a long string and `vsn` is FDB
> > > versionstamp of fixed size.
> > > >>
> > > >> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> > > >> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> > > >> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> > > >> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > > {another_vsn}`
> > > >>
> > > >> ```
> > > >> write(txn, doc_id, prev_rev, json):
> > > >>  txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> > > >>  rev = generate_new_rev()
> > > >>  txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> > > >>  for every json_path in flattened json
> > > >>- {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
> > > >>if rev is HEAD:
> > > >>  # this range contains values for all json paths for the latest
> > > revision (read optimization)
> > > >>  - {NS} / {docid} / _data / {json_path} = latest_value | part
> > > >>- {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > > {another_vsn}
> > > >>  txn["{NS} / {doc_id} / _rev"] = rev
> > > >>
> > > >> get_current(txn, doc_id):
> > > >> # there is no sharing of json_paths in this range (read optimization)
> > > >> txn.get_range("{NS} / {docid} / _data / 0x00", "{N

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Adam Kocoloski
I am OK not doing term sharing. I would like to keep these two performance 
efficiencies discussed earlier:

1) A client can retrieve the “winning” revision of the document (and only that 
revision) in a single roundtrip knowing only the document ID
2) A client can update an edit branch of a document without retrieving the body 
of the current revision

These are two very common access patterns and we should optimize for them. I 
don’t think we quite have that sorted out yet.
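
For pattern 2, a very rough sketch against the (docid, revision-ish, path, to,
field) layout from the other sub-thread; the conflict check, revision-tree
bookkeeping and by_seq updates a real version needs are elided, and every
write below is blind, so nothing of the current body crosses the wire:

```
import fdb

fdb.api_version(600)
db = fdb.open()

# Assumed layout: ('docs', docid, revision_ish, *json_path) -> encoded value
docs = fdb.Subspace(('docs',))

@fdb.transactional
def extend_branch_blind(tr, docid, prev_revision_ish, new_revision_ish,
                        flat_fields):
    # flat_fields: iterable of (json_path_tuple, encoded_value) pairs
    # Drop the old leaf's body and write the new one; both are blind writes,
    # so the current revision's body never has to be read.
    tr.clear_range_startswith(docs.pack((docid, prev_revision_ish)))
    for path, value in flat_fields:
        tr[docs.pack((docid, new_revision_ish) + tuple(path))] = value
    # A real implementation would also add a conflict check so that two
    # writers extending the same branch cannot both commit, plus the
    # revision-tree and by_seq bookkeeping.
```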

Adam

> On Feb 7, 2019, at 3:36 PM, Ilya Khlopotov  wrote:
> 
> We cannot do simple thing if we want to support sharing of JSON terms. I 
> think if we want the simplest path we should move sharing out of the scope. 
> The problem with sharing is we need to know the location of shared terms when 
> we do write. This means that we have to read full document on every write. 
> There might be tricks to replace full document read with some sort of 
> hierarchical signature or sketch of a document. However these tricks do not 
> fall into simplest solution category. We need to choose the design goals:
> - simple
> - performance
> - reduced storage overhead
> 
> best regards,
> iilyak
> 
> On 2019/02/07 12:45:34, Garren Smith  wrote: 
>> I’m also in favor of keeping it really simple and then testing and
>> measuring it.
>> 
>> What is the best way to measure that we have something that works? I’m not
>> sure just relying on our current tests will prove that? Should we define
>> and build some more complex situations e.g docs with lots of conflicts or
>> docs with wide revisions and make sure we can solve for those?
>> 
>> On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt  wrote:
>> 
>>> I’m also very much in favour with starting with the simplest thing that
>>> can possibly work and doesn’t go against the advertised best practices of
>>> FoundationDB. Let’s get that going and get a feel for how it all works
>>> together, before trying to optimise things we can’t measure yet.
>>> 
>>> Best
>>> Jan
>>> —
>>> 
 On 6. Feb 2019, at 16:58, Robert Samuel Newson 
>>> wrote:
 
 Hi,
 
 With the Redwood storage engine under development and with prefix
>>> elision part of its design, I don’t think we should get too hung up on
>>> adding complications and indirections in the key space just yet. We haven’t
>>> written a line of code or run any tests, this is premature optimisation.
 
 I’d like to focus on the simplest solution that yields all required
>>> properties. We can embellish later (if warranted).
 
 I am intrigued by all the ideas that might allow us cheaper inserts and
>>> updates than the current code where there are multiple edit branches in the
>>> stored document.
 
 B.
 
> On 6 Feb 2019, at 02:18, Ilya Khlopotov  wrote:
> 
> While reading Adam's proposal I came to realize that: we don't have to
>>> calculate winning revision at read time.
> Since FDB's transactions are atomic we can calculate it when we write.
>>> This means we can just write latest values into separate range. This makes
>>> lookup of latest version fast.
> Another realization is if we want to share values for some json paths
>>> we would have to introduce a level of indirection.
> Below is the data model inspired by Adam's idea to share json_paths.
>>> In this model the json_path is stored in the revision where it was first
>>> added (we call that revision an owner of a json_path). The values for
>>> json_path key can be scalar values, parts of scalar values or pointers to
>>> owner location.
> The below snippets are sketches of transactions.
> The transactions will include updates to other keys as needed
>>> (`external_size`, `by_seq` and so on).  The revision tree management is not
>>> covered yet.
> The `rev -> vsn` indirection is not strictly required. It is added
>>> because it saves some space since `rev` is a long string and `vsn` is FDB
>>> versionstamp of fixed size.
> 
> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
>>> {another_vsn}`
> 
> ```
> write(txn, doc_id, prev_rev, json):
> txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> rev = generate_new_rev()
> txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> for every json_path in flattened json
>   - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
>   if rev is HEAD:
> # this range contains values for all json paths for the latest
>>> revision (read optimization)
> - {NS} / {docid} / _data / {json_path} = latest_value | part
>   - {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
>>> {another_vsn}
> txn["{NS} / {doc_id} / _rev"] = rev
> 
> get_current(txn, doc_id):
> # there is no sharing of json_paths in this range (rea

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Ilya Khlopotov
We cannot do the simple thing if we want to support sharing of JSON terms. I
think if we want the simplest path, we should move sharing out of scope. The
problem with sharing is that we need to know the location of shared terms when
we do a write. This means we have to read the full document on every write.
There might be tricks to replace the full document read with some sort of
hierarchical signature or sketch of the document. However, these tricks do not
fall into the simplest-solution category. We need to choose the design goals:
- simple
- performance
- reduced storage overhead

best regards,
iilyak

On 2019/02/07 12:45:34, Garren Smith  wrote: 
> I’m also in favor of keeping it really simple and then testing and
> measuring it.
> 
> What is the best way to measure that we have something that works? I’m not
> sure just relying on our current tests will prove that? Should we define
> and build some more complex situations e.g docs with lots of conflicts or
> docs with wide revisions and make sure we can solve for those?
> 
> On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt  wrote:
> 
> > I’m also very much in favour with starting with the simplest thing that
> > can possibly work and doesn’t go against the advertised best practices of
> > FoundationDB. Let’s get that going and get a feel for how it all works
> > together, before trying to optimise things we can’t measure yet.
> >
> > Best
> > Jan
> > —
> >
> > > On 6. Feb 2019, at 16:58, Robert Samuel Newson 
> > wrote:
> > >
> > > Hi,
> > >
> > > With the Redwood storage engine under development and with prefix
> > elision part of its design, I don’t think we should get too hung up on
> > adding complications and indirections in the key space just yet. We haven’t
> > written a line of code or run any tests, this is premature optimisation.
> > >
> > > I’d like to focus on the simplest solution that yields all required
> > properties. We can embellish later (if warranted).
> > >
> > > I am intrigued by all the ideas that might allow us cheaper inserts and
> > updates than the current code where there are multiple edit branches in the
> > stored document.
> > >
> > > B.
> > >
> > >> On 6 Feb 2019, at 02:18, Ilya Khlopotov  wrote:
> > >>
> > >> While reading Adam's proposal I came to realize that: we don't have to
> > calculate winning revision at read time.
> > >> Since FDB's transactions are atomic we can calculate it when we write.
> > This means we can just write latest values into separate range. This makes
> > lookup of latest version fast.
> > >> Another realization is if we want to share values for some json paths
> > we would have to introduce a level of indirection.
> > >> Below is the data model inspired by Adam's idea to share json_paths.
> > In this model the json_path is stored in the revision where it was first
> > added (we call that revision an owner of a json_path). The values for
> > json_path key can be scalar values, parts of scalar values or pointers to
> > owner location.
> > >> The below snippets are sketches of transactions.
> > >> The transactions will include updates to other keys as needed
> > (`external_size`, `by_seq` and so on).  The revision tree management is not
> > covered yet.
> > >> The `rev -> vsn` indirection is not strictly required. It is added
> > because it saves some space since `rev` is a long string and `vsn` is FDB
> > versionstamp of fixed size.
> > >>
> > >> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> > >> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> > >> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> > >> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > {another_vsn}`
> > >>
> > >> ```
> > >> write(txn, doc_id, prev_rev, json):
> > >>  txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> > >>  rev = generate_new_rev()
> > >>  txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> > >>  for every json_path in flattened json
> > >>- {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
> > >>if rev is HEAD:
> > >>  # this range contains values for all json paths for the latest
> > revision (read optimization)
> > >>  - {NS} / {docid} / _data / {json_path} = latest_value | part
> > >>- {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > {another_vsn}
> > >>  txn["{NS} / {doc_id} / _rev"] = rev
> > >>
> > >> get_current(txn, doc_id):
> > >> # there is no sharing of json_paths in this range (read optimization)
> > >> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data
> > / 0xFF" )
> > >>
> > >> get_revision(txn, doc_id, rev):
> > >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> > >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> > "{NS} / {vsn} / {docid} / _data / 0xFF" )
> > >> for every json_path in json_paths:
> > >>   if value has type vsn:
> > >>  another_vsn = value
> > >> value = txn["{NS} / {docid} / {another_vsn} / _data /
> > {json_path}"]
> > >>   result[json_pa

[DISCUSS] Release 2.3.1

2019-02-07 Thread Jan Lehnardt
Hi everyone,

we’ve had some great fixes come in since we released 2.3.0 in early December.

Some of those fixes should make it into our users’ hands sooner rather than later.

I’ve prepared a branch, rebased against 2.3.x so we can use a PR to look over
the differences.

I went through all commits to master since 2.3.0 and picked all fixes that
I thought should make it in, plus a tiny feature that’s extremely useful
(h/t Joan). I might have missed your favourite fix or feature, so now is
your time to speak up on what else you’d like included:

https://github.com/apache/couchdb/pull/1908

If you want to contest including anything that’s already there, speak up,
too.

The only not-so-nice thing is the dropping of previously supported Erlang
releases in a patch release, but there is no way around that with the
one Mochiweb update we need in there.

I think it is fine if we put it up top in the release announcement.

Best
Jan
-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Garren Smith
I’m also in favor of keeping it really simple and then testing and
measuring it.

What is the best way to measure that we have something that works? I’m not
sure just relying on our current tests will prove that. Should we define
and build some more complex situations, e.g. docs with lots of conflicts or
docs with wide revisions, and make sure we can solve for those?
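
For the “lots of conflicts” case, seeding a test document is cheap with
_bulk_docs and new_edits=false; a rough sketch (URL and credentials are
placeholders):

```
import uuid

import requests

DB = "http://127.0.0.1:5984/conflict_bench"   # placeholder URL
AUTH = ("admin", "password")                  # placeholder credentials

def seed_conflicts(doc_id, n_branches):
    # With new_edits=false CouchDB stores each supplied revision verbatim,
    # so n distinct 1-xxxx revs become n conflicting edit branches.
    docs = [{"_id": doc_id, "_rev": "1-" + uuid.uuid4().hex, "value": i}
            for i in range(n_branches)]
    r = requests.post(DB + "/_bulk_docs",
                      json={"docs": docs, "new_edits": False}, auth=AUTH)
    r.raise_for_status()

requests.put(DB, auth=AUTH)  # create the database; a 412 here just means it exists
seed_conflicts("widely-conflicted", 500)
```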

On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt  wrote:

> I’m also very much in favour with starting with the simplest thing that
> can possibly work and doesn’t go against the advertised best practices of
> FoundationDB. Let’s get that going and get a feel for how it all works
> together, before trying to optimise things we can’t measure yet.
>
> Best
> Jan
> —
>
> > On 6. Feb 2019, at 16:58, Robert Samuel Newson 
> wrote:
> >
> > Hi,
> >
> > With the Redwood storage engine under development and with prefix
> elision part of its design, I don’t think we should get too hung up on
> adding complications and indirections in the key space just yet. We haven’t
> written a line of code or run any tests, this is premature optimisation.
> >
> > I’d like to focus on the simplest solution that yields all required
> properties. We can embellish later (if warranted).
> >
> > I am intrigued by all the ideas that might allow us cheaper inserts and
> updates than the current code where there are multiple edit branches in the
> stored document.
> >
> > B.
> >
> >> On 6 Feb 2019, at 02:18, Ilya Khlopotov  wrote:
> >>
> >> While reading Adam's proposal I came to realize that: we don't have to
> calculate winning revision at read time.
> >> Since FDB's transactions are atomic we can calculate it when we write.
> This means we can just write latest values into separate range. This makes
> lookup of latest version fast.
> >> Another realization is if we want to share values for some json paths
> we would have to introduce a level of indirection.
> >> Below is the data model inspired by Adam's idea to share json_paths.
> In this model the json_path is stored in the revision where it was first
> added (we call that revision an owner of a json_path). The values for
> json_path key can be scalar values, parts of scalar values or pointers to
> owner location.
> >> The below snippets are sketches of transactions.
> >> The transactions will include updates to other keys as needed
> (`external_size`, `by_seq` and so on).  The revision tree management is not
> covered yet.
> >> The `rev -> vsn` indirection is not strictly required. It is added
> because it saves some space since `rev` is a long string and `vsn` is FDB
> versionstamp of fixed size.
> >>
> >> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> >> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> >> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> >> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> {another_vsn}`
> >>
> >> ```
> >> write(txn, doc_id, prev_rev, json):
> >>  txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> >>  rev = generate_new_rev()
> >>  txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> >>  for every json_path in flattened json
> >>- {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
> >>if rev is HEAD:
> >>  # this range contains values for all json paths for the latest
> revision (read optimization)
> >>  - {NS} / {docid} / _data / {json_path} = latest_value | part
> >>- {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> {another_vsn}
> >>  txn["{NS} / {doc_id} / _rev"] = rev
> >>
> >> get_current(txn, doc_id):
> >> # there is no sharing of json_paths in this range (read optimization)
> >> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data
> / 0xFF" )
> >>
> >> get_revision(txn, doc_id, rev):
> >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> "{NS} / {vsn} / {docid} / _data / 0xFF" )
> >> for every json_path in json_paths:
> >>   if value has type vsn:
> >>  another_vsn = value
> >> value = txn["{NS} / {docid} / {another_vsn} / _data /
> {json_path}"]
> >>   result[json_path] = value
> >>
> >> delete_revision(txn, doc_id, rev):
> >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> "{NS} / {vsn} / {docid} / _data / 0xFF" )
> >> for every json_path in json_paths:
> >>   if value has type vsn:
> >> # remove reference to deleted revision from the owner
> >>  del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}]
> >>   # check if deleted revision of json_path is not used by anything else
> >>   if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}",
> limit=1) == []:
> >>  del txn["{NS} / {docid} / {vsn} / _data / {json_path}"]
> >>   if vsn is HEAD:
> >>  copy range for winning revision into "{NS} / {docid} / _data /
> {json_path}"
> >> ```
> >>
> >> best regards,
> >> iilyak
> >>
> >> On 2019/02/0

Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts

2019-02-07 Thread Jan Lehnardt
I’m also very much in favour of starting with the simplest thing that can 
possibly work and doesn’t go against the advertised best practices of 
FoundationDB. Let’s get that going and get a feel for how it all works 
together, before trying to optimise things we can’t measure yet.

Best
Jan
—

> On 6. Feb 2019, at 16:58, Robert Samuel Newson  wrote:
> 
> Hi,
> 
> With the Redwood storage engine under development and with prefix elision 
> part of its design, I don’t think we should get too hung up on adding 
> complications and indirections in the key space just yet. We haven’t written 
> a line of code or run any tests, this is premature optimisation.
> 
> I’d like to focus on the simplest solution that yields all required 
> properties. We can embellish later (if warranted).
> 
> I am intrigued by all the ideas that might allow us cheaper inserts and 
> updates than the current code where there are multiple edit branches in the 
> stored document.
> 
> B.
> 
>> On 6 Feb 2019, at 02:18, Ilya Khlopotov  wrote:
>> 
>> While reading Adam's proposal I came to realize that: we don't have to 
>> calculate winning revision at read time.
>> Since FDB's transactions are atomic we can calculate it when we write. This 
>> means we can just write latest values into separate range. This makes lookup 
>> of latest version fast.
>> Another realization is if we want to share values for some json paths we 
>> would have to introduce a level of indirection. 
>> Below is the data model inspired by Adam's idea to share json_paths. In 
>> this model the json_path is stored in the revision where it was first added 
>> (we call that revision an owner of a json_path). The values for json_path 
>> key can be scalar values, parts of scalar values or pointers to owner 
>> location.
>> The below snippets are sketches of transactions. 
>> The transactions will include updates to other keys as needed 
>> (`external_size`, `by_seq` and so on).  The revision tree management is not 
>> covered yet.
>> The `rev -> vsn` indirection is not strictly required. It is added because 
>> it saves some space since `rev` is a long string and `vsn` is FDB 
>> versionstamp of fixed size.
>> 
>> - `{NS} / {docid} / _by_rev / {rev} = vsn` 
>> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
>> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
>> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part | 
>> {another_vsn}`
>> 
>> ```
>> write(txn, doc_id, prev_rev, json):
>>  txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
>>  rev = generate_new_rev()
>>  txn["{NS} / {docid} / _by_rev / {rev}"] = vsn 
>>  for every json_path in flattened json
>>- {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
>>if rev is HEAD:
>>  # this range contains values for all json paths for the latest revision 
>> (read optimization)
>>  - {NS} / {docid} / _data / {json_path} = latest_value | part
>>- {NS} / {docid} / {vsn} / _data / {json_path} = value | part | 
>> {another_vsn}
>>  txn["{NS} / {doc_id} / _rev"] = rev
>> 
>> get_current(txn, doc_id):
>> # there is no sharing of json_paths in this range (read optimization)
>> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data / 
>> 0xFF" )
>> 
>> get_revision(txn, doc_id, rev):
>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] 
>> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", "{NS} / 
>> {vsn} / {docid} / _data / 0xFF" )
>> for every json_path in json_paths:
>>   if value has type vsn:
>>  another_vsn = value
>> value = txn["{NS} / {docid} / {another_vsn} / _data / {json_path}"]
>>   result[json_path] = value
>> 
>> delete_revision(txn, doc_id, rev):
>> vsn = txn["{NS} / {docid} / _by_rev / {rev}"] 
>> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00", "{NS} / 
>> {vsn} / {docid} / _data / 0xFF" )
>> for every json_path in json_paths:
>>   if value has type vsn:
>> # remove reference to deleted revision from the owner
>>  del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}]
>>   # check if deleted revision of json_path is not used by anything else
>>   if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}", 
>> limit=1) == []:
>>  del txn["{NS} / {docid} / {vsn} / _data / {json_path}"]
>>   if vsn is HEAD:
>>  copy range for winning revision into "{NS} / {docid} / _data / 
>> {json_path}"
>> ```
>> 
>> best regards,
>> iilyak
>> 
>> On 2019/02/04 23:22:09, Adam Kocoloski  wrote: 
>>> I think it’s fine to start a focused discussion here as it might help 
>>> inform some of the broader debate over in that thread.
>>> 
>>> As a reminder, today CouchDB writes the entire body of each document 
>>> revision on disk as a separate blob. Edit conflicts that have common fields 
>>> between them do not share any storage on disk. The revision tree is encoded 
>>> into a compact format and a copy of it is stored directly in both the by_id 
>>> tree and the by_seq

Re: [DISCUSSION] Proposed new RFC process

2019-02-07 Thread Jan Lehnardt
I’m in favour of all that’s outlined here. Thanks for getting this written up, 
Joan!

Best
Jan
—

> On 5. Feb 2019, at 19:27, Joan Touzet  wrote:
> 
> Hi everyone,
> 
> One thing that has plagued the CouchDB project for a while is the
> introduction of new features without much discussion prior to the
> feature landing.
> 
> To put all the cards on the table: it is good that IBM/Cloudant have
> been benevolent and have contributed significant new functionality with
> every recent release of CouchDB. Some highlights include:
> 
>  2.0: clustering code (née bigcouch)
>  2.1: replication scheduler
>  2.2: pluggable storage engines
>  2.3: clustered purge
> 
> However, most of these features landed as PRs on the CouchDB repository
> already complete, or at the very least, with the design process already
> finished, with only implementation details remaining. In at least a
> couple of the cases, the PRs were so large that any review by
> non-Cloudant people was effectively impossible. And, of course, by the
> time the PR lands, any prototype implementations that would have
> informed the final PR (and helped understand how it works) are not
> visible or available. Further, documentation is often lacking at this
> stage, though increasingly I see excellent README.md files placed at the
> top of each OTP application that are especially welcome.
> 
> This needs to change.
> 
> Fortunately, the Internet has a good precedent for dealing with this
> very issue: the RFC[1]. While we don't need the entire pomp and
> circumstance workflow that follows the RFC, we can certainly steal that
> nice template.
> 
> I've taken a pass at adapting the RFC template to GitHub's and
> CouchDB's needs (PRs welcome):
> 
>
> https://raw.githubusercontent.com/apache/couchdb/master/.github/ISSUE_TEMPLATE/rfc.md
>
> https://github.com/apache/couchdb/issues/new?labels=rfc%2C+discussion&template=rfc.md
> 
> I believe this template is not onerous to fill out, and if it contained
> sufficient detail on the proposal, it would give committers enough knowledge
> to be able to make informed decisions about the proposal - i.e., vote on
> it.
> 
> 
> I propose the PMC REQUIRE this template for any substantial change to
> CouchDB's HTTP API, or its behaviour.
> 
> I propose developers SHOULD use this template for any change to
> CouchDB's HTTP API or behaviour, even small ones.
> 
> I propose developers COULD use this template for any minor change to
> CouchDB, but it's unnecessary if it's something like a debug interface,
> a config file improvement, etc. Documentation and Fauxton changes would
> likely be exempt from this workflow, too.
> 
> 
> I imagine the workflow being like this:
> 
>  * [DISCUSS]ions like the ones happening now for FDB happen on the list.
> 
>  * Consensus starts to be reached, or, at the very least, the important
>advantages and disadvantages of the proposal have been clarified.
> 
>  * A filled-in RFC is filed in GH Issues.
> 
>  * (Optional.) A bot watches for issues of this type and forwards notice
>of them to dev@. This becomes a [PROPOSAL] per our bylaws at that
>point.
> 
>  * Committers formally vote +1/+0.5/0/-0.5/-1 on the proposal in GH.
> 
>  * (Optional.) A bot could close the vote and sum the responses from
>committers, emailing the results to dev@.
> 
>  * Repeat if necessary (because the RFC gets rewritten/improved, etc.)
> 
> Note the only two new steps are 1) ensuring that we actually HAVE the
> discussion on the mailing list about the proposed change; and 2) a new
> template we should use for these major changes, which itself should help
> make understanding the proposal and the voting process smoother.
> 
> For me, the question that remains is: at what point do we REQUIRE the
> proposed RFC process to be followed? When is a change "substantial?" And
> I think that it may be sufficient at this point to leave it grey-area,
> with the PMC weighing in as a group if necessary. (The easiest thing to
> do is to simply say "do the RFC anyway," because if any proposal is
> sufficiently unclear as to not be easily translatable into an RFC, then
> it is probably too weak a proposal to pass muster without one.)
> 
> One other question is whether or not we add a new decision type to our
> decision matrix in the Bylaws for this. Our rules are somewhat unclear
> here, since this is both a technical and a non-technical decision, 
> meaning it probably defaults over to our non-technical decision type.
> That type is lazy majority, with committers being given binding votes,
> but importantly does NOT allow for vetoes (blocking -1 votes). (I have
> one other topic for the Bylaws mailing, which will be separate, so we
> can pick up this discussion in that thread if desired.)
> 
> Thoughts?
> 
> -Joan "evolve or perish!" Touzet
> 
> [1]: https://www.ietf.org/standards/rfcs/

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/