Re: db/_all_conflicts

Jan Lehnardt Wed, 30 Mar 2016 06:27:08 -0700

> On 29 Mar 2016, at 20:14, Adam Kocoloski <kocol...@apache.org> wrote:
> 
> Neat stuff. Years ago I actually committed this feature to the codebase using 
> a table scan and then Damien backed it out because of the scalability 
> concern. Glad to see we’re approaching it in a more considered fashion this 
> time around :)
> 
> One thing we might consider is to maintain a *count* of the number of 
> conflicted documents in the database automatically. If the count is nonzero 
> when you expected it to be zero, build the conflicted documents index and do 
> your inspection. In the happy case where there are no conflicts we just saved 
> you a bunch of effort.
> 
> We don’t really need a separate index to accomplish this; we just need to 
> modify the reducer function supplied to the by_id btree. We’ve played that 
> game before to add things like data size accumulators to the DB info object. 
> There may be a modest hit to the write performance to count the number of 
> non-deleted leafs in the rev tree on document update, but honestly that says 
> as much about the inefficiencies in couch_key_tree as anything else - that 
> quantity ought to be very cheap to uncover.


Bob Newson and I talked about this on IRC some more and I think this is all 
similar if not the same thinking: remember how we optimised `skip` in view 
results? We could keep track of the number of conflicts per b-tree node and 
then easily skip over the subtrees that don’t have any conflicts, so a 
table-scan would be relatively cheap.
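
A rough sketch of that idea, in JavaScript for brevity rather than Erlang. The node layout and the names `leaf`, `inner` and `conflictedIds` are made up for illustration and are not couch_btree's API. Each node carries the number of conflicted documents beneath it, so the root holds the database-wide count Adam describes, and a scan can skip any subtree whose count is zero.

```javascript
// Hypothetical in-memory stand-in for a b-tree; not CouchDB's on-disk layout.
function leaf(docs) {
  // docs: [{ id, conflicts }] where conflicts = number of conflicting revisions
  const conflicted = docs.filter((d) => d.conflicts > 0).length;
  return { docs, conflicted };
}

function inner(children) {
  // The "reduction" stored on an inner node is the sum over its children.
  const conflicted = children.reduce((sum, c) => sum + c.conflicted, 0);
  return { children, conflicted };
}

function conflictedIds(node, stats = { visited: 0 }) {
  stats.visited += 1;
  if (node.conflicted === 0) return []; // nothing below here, skip the subtree
  if (node.docs) {
    return node.docs.filter((d) => d.conflicts > 0).map((d) => d.id);
  }
  return node.children.flatMap((c) => conflictedIds(c, stats));
}

const tree = inner([
  inner([
    leaf([{ id: "a", conflicts: 0 }, { id: "b", conflicts: 0 }]),
    leaf([{ id: "c", conflicts: 0 }]),
  ]),
  inner([
    leaf([{ id: "d", conflicts: 2 }, { id: "e", conflicts: 0 }]),
    leaf([{ id: "f", conflicts: 0 }]),
  ]),
]);

const stats = { visited: 0 };
console.log(tree.conflicted, conflictedIds(tree, stats), stats.visited);
```

In this toy tree the scan touches 5 of the 7 nodes, because the left half is skipped at its parent. A database with no conflicts at all would stop at the root.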

Best
Jan
--


> 
> Adam
> 
>> On Mar 29, 2016, at 1:26 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>> 
>> Hi,
>> 
>> good points!
>> 
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that 
>>> would come with the additional security concern of opening up Erlang views.
>> 
>> The great thing about Mango is that, with an index, it is faster than JS
>> views because it is Erlang-based.
>> 
>> 
>> And Dale is making a good suggestion.
>> 
>> ```
>> {
>>   selector: {
>>     _conflicts: {'$exists': true}
>>   }
>> }
>> ```
>> 
>> With the latest change in Mango, the selector already works without an
>> index; ad-hoc queries no longer strictly require one:
>> https://github.com/apache/couchdb-mango/commit/01252f971bef0c8da1d78bf5a7b506b71926ce1b
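
A minimal sketch of sending that ad-hoc query over HTTP, assuming a runtime with a global `fetch` (a browser or a recent Node.js). The helper names `conflictQuery` and `findConflicts`, the `limit` default and the database URL in the usage note are placeholders for illustration. Depending on the CouchDB version, the request may need additional options before `_conflicts` is visible to the selector; this sketch sends only what the thread shows.

```javascript
// Builds the JSON body for POST $DB/_find using the selector quoted above.
function conflictQuery(limit = 25) {
  return { selector: { _conflicts: { $exists: true } }, limit };
}

// Sends the query and returns the matching documents.
async function findConflicts(dbUrl, limit) {
  const res = await fetch(`${dbUrl}/_find`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(conflictQuery(limit)),
  });
  if (!res.ok) throw new Error(`_find failed with status ${res.status}`);
  const body = await res.json();
  return body.docs;
}
```

Usage would look like `findConflicts("http://localhost:5984/mydb", 10).then(console.log)`, where the URL is a placeholder for your own database.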
>> 
>> Cool so we are already almost done! :)
>> 
>> This is great for development and I wonder if we could reduce the
>> friction for people that would like to use an index for conflicts,
>> e.g. in their production systems. Remember, the mission is to make
>> conflict handling a first class citizen in CouchDB and make it as easy
>> as possible for our users.
>> 
>> Current state:
>> 
>> POST to `$DB/_index`:
>> 
>> ```
>> {
>>   "index": {
>>     "fields": ["_conflicts"]
>>   },
>>   "name": "conflict-index",
>>   "type": "json"
>> }
>> ```
>> 
>> I feel it is hard to type on the terminal, e.g. when I use curl. With
>> a JS HTTP client it is also a lot to type.
>> 
>> 
>> I thought about API sugar. I feel unsure about API-sugar which could
>> abstract this somehow, as I don't want to pollute the API. At the same
>> time I would also like to make it as easy as possible for users to
>> handle their conflicts.
>> 
>> Rough idea:
>> 
>> POST to `$DB/_index`:
>> 
>> ```
>> { "type" : "conflicts" }
>> ```
>> 
>> Hmmm....
>> 
>> What do you think?
>> 
>> On Mon, Mar 14, 2016 at 4:54 PM, Jan Lehnardt <j...@apache.org> wrote:
>>> 
>>>> On 14 Mar 2016, at 16:22, Dale Harvey <d...@arandomurl.com> wrote:
>>>> 
>>>> I would really like to give users better abilities to handle conflict
>>>> resolution, I am however extremely worried about introducing
>>>> another API endpoint. We have like 6/7 read APIs, each of them having their
>>>> own idiosyncrasies, and it's extremely confusing for users to know which to
>>>> use when.
>>>> 
>>>> If we could extend our existing APIs to cater for this use case it seems
>>>> hugely preferable, ie something like mango / pouchdb find
>>>> 
>>>> db.find({
>>>>   selector: {
>>>>     _conflicts: {'$exists': true}
>>>>   }
>>>> }).then(function (result) {
>>>>   ...
>>>> });
>>> 
>>> Great input Dale!
>>> 
>>> Let’s split this into two issues then:
>>> 
>>> A. how do we present it to users.
>>> B. how do we get the information.
>>> 
>>> 
>>> As for B., the thought process went like this:
>>> 
>>> 1. _all_docs + Erlang filter.
>>> 
>>> As Robert pointed out, that’s a no-go for large databases.
>>> 
>>> 
>>> 2. Add another index to the main database file like by-seq/by-id 
>>> (_changes/_all_docs)
>>> 
>>> I pointed out that this will make all write operations slower, for 
>>> everyone, not just for the people who want this. (A scenario where I 
>>> wouldn’t want this is where CouchDB is the cloud-counterpart for one or 
>>> more PouchDB instances, and conflict resolution only ever happens in 
>>> PouchDB).
>>> 
>>> So I’d say this is a soft-no on adding this to the main database file, also 
>>> given that we had similar discussions about adding another index to view 
>>> files before.
>>> 
>>> 
>>> 3. A view: Fauxton could hide creating a ddoc behind a button, and users 
>>> could opt into this easily, while understanding the trade-offs.
>>> 
>>> Robert feels like tying this to Fauxton as opposed to CouchDB makes this 
>>> approach useful for fewer people than it could (props for not being 
>>> focussed on your own project there ;)
>>> 
>>> 
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that 
>>> would come with the additional security concern of opening up Erlang views.
>>> 
>>> 
>>> 4. Given all of the above, how about this: a new CouchDB module 
>>> (couch_conflicts) that is essentially an Erlang view for conflicts that is 
>>> disabled by default, but when enabled uses the native query server to build 
>>> an index that can give the list of conflicting documents (and the 
>>> conflicting revisions?) *without* having to enable the native query server 
>>> for everyone. The module can be enabled in the config (or admin PUT to the 
>>> endpoint as other things in 2.0). We’d also build a basic 
>>> keep-view-indexes-up-to-date that would trigger an update after, say, 1000 
>>> doc updates (we’d make that configurable of course), something which we’d 
>>> want for other views as well anyway.
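
A small sketch of the "update after N doc updates" trigger described in 4. The class name `AutoWarmer`, its `refresh` callback and the in-process counter are all assumptions for illustration. A real implementation would live in Erlang and would more likely track update sequences than count calls.

```javascript
// Calls refresh() once every `threshold` document updates.
class AutoWarmer {
  constructor(refresh, threshold = 1000) {
    this.refresh = refresh;     // hypothetical callback that rebuilds the index
    this.threshold = threshold; // configurable, 1000 as suggested in the thread
    this.pending = 0;           // updates seen since the last refresh
  }

  onDocUpdate() {
    this.pending += 1;
    if (this.pending >= this.threshold) {
      this.pending = 0;
      this.refresh();
    }
  }
}
```

Writes only pay for a counter increment. The index work is batched and happens outside the individual write.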
>>> 
>>> * * *
>>> 
>>> As for A., how we present this to the user I have no strong feelings about. 
>>> We could make this part of Mango, like Dale suggested, or a new 
>>> /db/_all_conflicts with its own idiosyncrasies or something else ;)
>>> 
>>> 
>>> I just want to make sure we make the right trade-offs on the storage/indexing 
>>> level, and, while not making everyone pay for the overhead, make it really 
>>> easy to opt into this feature. (Unless we all agree that the performance 
>>> hit for 2. is worth it :)
>>> 
>>> 
>>> Best
>>> Jan
>>> --
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> On 14 March 2016 at 14:07, Sebastian Rothbucher <sebastianrothbuc...@googlemail.com> wrote:
>>>> 
>>>>> Hi Robert,
>>>>> 
>>>>> this looks awesome already! I don't want to be the spoiler in this, but
>>>>> wouldn't conflicts occur recently, e.g. using _changes (descending) might
>>>>> do the trick of limit-ing? (Still you'd discard docs that simply don't
>>>>> have conflicts, but probably not that many)
>>>>> 
>>>>> If that doesn't do the trick: just forget what I just said ;-)
>>>>> 
>>>>> Best
>>>>>  Sebastian
>>>>> 
>>>>> On Mon, Mar 14, 2016 at 2:58 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>>>>> 
>>>>>> Hi folks,
>>>>>> 
>>>>>> it is hackweek for the Fauxton team and I am lucky enough to be able
>>>>>> to work on whatever I want :)
>>>>>> 
>>>>>> Conflicts are an integral part of CouchDB. Right now I dream of making
>>>>>> conflict-resolution a first class citizen in Couch. Conflict
>>>>>> resolution requires a lot of manual steps. The idea is to give the
>>>>>> user all the tools they need to easily solve conflicts, and also to
>>>>>> help them to avoid conflicts in the future.
>>>>>> 
>>>>>> To empower every user to detect and solve conflicts easily on their
>>>>>> own, instead of writing some custom bash/js scripts and custom view
>>>>>> hackery I would like to have a list of conflicts in Fauxton for every
>>>>>> database.
>>>>>> 
>>>>>> The list, provided by Couch, shows which documents have conflicts. I
>>>>>> can then click on the conflicting doc and get a nice diffing editor
>>>>>> which helps me to solve the conflict. Here's an early draft: [1]
>>>>>> 
>>>>>> Discussing the matter in couchdb-dev we thought about serverside
>>>>>> filtering of _all_docs - which is a problem for large databases.
>>>>>> 
>>>>>> Another option is a new endpoint, e.g. /db/_all_conflicts. Behind this
>>>>>> endpoint is an index which is listing the conflicting documents.
>>>>>> 
>>>>>> Jan and Alex suggested the index could be opt-in. They suggested an
>>>>>> "auto-warmer" - it would update the index every 1000 doc updates or
>>>>>> so. This way not every doc write would get slower. In a later iteration
>>>>>> we could even expose the "auto-warming" feature to other views.
>>>>>> 
>>>>>> Do you want to join me on my quest to provide the best conflict
>>>>>> resolution tools and education?
>>>>>> What do you think about it?
>>>>>> 
>>>>>> Best,
>>>>>> Robert :)
>>>>>> 
>>>>>> [1] https://cloud.githubusercontent.com/assets/298166/13741539/c4ecf6d0-e9ce-11e5-84c5-502b0989c290.png
>>>>>> 
>>>>> 
>>> 
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/
>>> 
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
