Neat stuff. Years ago I actually committed this feature to the codebase using a 
table scan and then Damien backed it out because of the scalability concern. 
Glad to see we’re approaching it in a more considered fashion this time around 
:)

One thing we might consider is to maintain a *count* of the number of 
conflicted documents in the database automatically. If the count is nonzero 
when you expected it to be zero, build the conflicted documents index and do 
your inspection. In the happy case where there are no conflicts we just saved 
you a bunch of effort.
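
To make that workflow concrete, here is a minimal client-side sketch. It assumes a `conflict_count` field in the DB info object, which does not exist today and only stands in for the counter proposed above, plus a hypothetical `client` helper whose `get` and `post` resolve to parsed JSON. The index definition and the selector are the ones from Robert's mail quoted below.

```javascript
// Sketch only. `conflict_count` is a hypothetical field standing in for the
// proposed counter; `client` is a hypothetical HTTP helper whose get/post
// methods resolve to parsed JSON bodies.
async function listConflictedDocs(client, db) {
  const info = await client.get(`/${db}`);
  if (!info.conflict_count) {
    // Happy case: nothing is conflicted, so skip the index build and the query.
    return [];
  }

  // Conflicts exist: build the index from Robert's mail, then query it.
  await client.post(`/${db}/_index`, {
    index: { fields: ["_conflicts"] },
    name: "conflict-index",
    type: "json",
  });
  const result = await client.post(`/${db}/_find`, {
    selector: { _conflicts: { $exists: true } },
  });
  return result.docs;
}
```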

We don’t really need a separate index to accomplish this; we just need to 
modify the reducer function supplied to the by_id btree. We’ve played that game 
before to add things like data size accumulators to the DB info object. There 
may be a modest hit to the write performance to count the number of non-deleted 
leafs in the rev tree on document update, but honestly that says as much about 
the inefficiencies in couch_key_tree as anything else - that quantity ought to 
be very cheap to uncover.
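
A rough model of that reducer change, written in JavaScript for readability rather than the Erlang the real by_id reducer is written in. The record shapes (`leaves`, `deleted`) and the function names are assumptions made up for illustration, not the actual couch_btree or couch_key_tree API.

```javascript
// Toy model: the real reducer is Erlang and sees full doc info records.
// All names and shapes here are hypothetical.

// A doc is conflicted when its rev tree has more than one non-deleted leaf.
function isConflicted(docInfo) {
  return docInfo.leaves.filter((leaf) => !leaf.deleted).length > 1;
}

// reduce: summarize the doc infos stored in one leaf node of the btree.
function reduce(docInfos) {
  const acc = { docCount: 0, conflictCount: 0 };
  for (const info of docInfos) {
    acc.docCount += 1;
    if (isConflicted(info)) acc.conflictCount += 1;
  }
  return acc;
}

// rereduce: combine the summaries of child nodes into the parent's summary.
function rereduce(reductions) {
  return reductions.reduce(
    (acc, r) => ({
      docCount: acc.docCount + r.docCount,
      conflictCount: acc.conflictCount + r.conflictCount,
    }),
    { docCount: 0, conflictCount: 0 }
  );
}
```

In this model the reduction at the root is the database-wide count, and a document update only has to recompute the summaries along one path from leaf to root.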

Adam

> On Mar 29, 2016, at 1:26 PM, Robert Kowalski <r...@kowalski.gd> wrote:
> 
> Hi,
> 
> good points!
> 
>> 3.1. An optimisation of 3. would be making this an Erlang view, but that 
>> would come with the additional security concern of opening up Erlang views.
> 
> The great thing about Mango is, with an index Mango is faster than JS
> views as it is Erlang based.
> 
> 
> And Dale is making a good suggestion.
> 
> ```
> {
>  selector: {
>    _conflicts: {'$exists': true}
>  }
> }
> ```
> 
> The selector already works without an index with the latest change in
> Mango, it doesn't strictly require an index for ad-hoc queries any
> more: 
> https://github.com/apache/couchdb-mango/commit/01252f971bef0c8da1d78bf5a7b506b71926ce1b
> 
> Cool so we are already almost done! :)
> 
> This is great for development and I wonder if we could reduce the
> friction for people that would like to use an index for conflicts,
> e.g. in their production systems. Remember, the mission is to make
> conflict handling a first class citizen in CouchDB and make it as easy
> as possible for our users.
> 
> Current state:
> 
> POST to `$DB/_index`:
> 
> ```
> 
> {
>    "index": {
>        "fields": ["_conflicts"]
>    },
>    "name" : "conflict-index",
>    "type" : "json"
> }
> 
> ```
> 
> I feel it is hard to type on the terminal, e.g. when I use curl. With
> a JS HTTP client it is also a lot to type.
> 
> 
> I thought about API sugar. I feel unsure about API-sugar which could
> abstract this somehow, as I don't want to pollute the API. At the same
> time I would also like to make it as easy as possible for users to
> handle their conflicts.
> 
> Rough idea:
> 
> POST to `$DB/_index`:
> 
> ```
> { "type" : "conflicts" }
> ```
> 
> Hmmm....
> 
> What do you think?
> 
> On Mon, Mar 14, 2016 at 4:54 PM, Jan Lehnardt <j...@apache.org> wrote:
>> 
>>> On 14 Mar 2016, at 16:22, Dale Harvey <d...@arandomurl.com> wrote:
>>> 
>>> I would really like to give users better abilities to handle conflict
>>> resolution. I am, however, extremely worried about introducing
>>> another API endpoint. We have something like 6 or 7 read APIs, each with
>>> its own idiosyncrasies, and it's extremely confusing for users to know
>>> which to use when.
>>> 
>>> If we could extend our existing APIs to cater for this use case, that seems
>>> hugely preferable, i.e. something like Mango / PouchDB find:
>>> 
>>> db.find({
>>> selector: {
>>>   _conflicts: {'$exists': true}
>>> }
>>> }).then(function (result) {
>>> ...
>>> });
>> 
>> Great input Dale!
>> 
>> Let’s split this into two issues then:
>> 
>> A. how do we present it to users.
>> B. how do we get the information.
>> 
>> 
>> As for B., the thought process went like this:
>> 
>> 1. _all_docs + Erlang filter.
>> 
>> As Robert pointed out, that’s a no-go for large databases.
>> 
>> 
>> 2. Add another index to the main database file like by-seq/by-id 
>> (_changes/_all_docs)
>> 
>> I pointed out that this will make all write operations slower, for everyone, 
>> not just for the people who want this. (A scenario where I wouldn’t want 
>> this is where CouchDB is the cloud-counterpart for one or more PouchDB 
>> instances, and conflict resolution only ever happens in PouchDB).
>> 
>> So I’d say this is a soft-no on adding this to the main database file, also 
>> given that we had similar discussions about adding another index to view 
>> files before.
>> 
>> 
>> 3. A view: Fauxton could hide creating a ddoc behind a button, and users 
>> could opt into this easily, while understanding the trade-offs.
>> 
>> Robert feels like tying this to Fauxton as opposed to CouchDB makes this 
>> approach useful for fewer people than it could (props for not being focussed 
>> on your own project there ;)
>> 
>> 
>> 3.1. An optimisation of 3. would be making this an Erlang view, but that 
>> would come with the additional security concern of opening up Erlang views.
>> 
>> 
>> 4. Given all of the above, how about this: a new CouchDB module 
>> (couch_conflicts) that is essentially an Erlang view for conflicts that is 
>> disabled by default, but when enabled uses the native query server to build 
>> an index that can give the list of conflicting documents (and the 
>> conflicting revisions?) *without* having to enable the native query server 
>> for everyone. The module can be enabled in the config (or admin PUT to the 
>> endpoint as other things in 2.0). We’d also build a basic 
>> keep-view-indexes-up-to-date mechanism that would trigger an update after, 
>> say, 1000 doc updates (we’d make that configurable of course), something 
>> which we’d want for other views as well anyway.
>> 
>> * * *
>> 
>> As for A., how we present this to the user I have no strong feelings about. 
>> We could make this part of Mango, like Dale suggested, or a new 
>> /db/_all_conflicts with its own idiosyncrasies or something else ;)
>> 
>> 
>> I just want to make sure we make the right trade-offs on the storage/indexing 
>> level, and, while not making everyone pay for the overhead, make it really 
>> easy to opt into this feature. (Unless we all agree that the performance hit 
>> for 2. is worth it :)
>> 
>> 
>> Best
>> Jan
>> --
>> 
>> 
>> 
>> 
>>> 
>>> 
>>> On 14 March 2016 at 14:07, Sebastian Rothbucher <
>>> sebastianrothbuc...@googlemail.com> wrote:
>>> 
>>>> Hi Robert,
>>>> 
>>>> this looks awesome already! I don't want to be the spoiler in this, but
>>>> wouldn't conflicts mostly be recent? If so, reading _changes (descending)
>>>> with a limit might do the trick. (You'd still have to discard docs that
>>>> simply don't have conflicts, but probably not that many.)
>>>> 
>>>> If that doesn't do the trick: just forget what I just said ;-)
>>>> 
>>>> Best
>>>>   Sebastian
>>>> 
>>>> On Mon, Mar 14, 2016 at 2:58 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> it is hackweek for the Fauxton team and I am lucky enough to be able
>>>>> to work on whatever I want :)
>>>>> 
>>>>> Conflicts are an integral part of CouchDB. Right now I dream of making
>>>>> conflict-resolution a first class citizen in Couch. Conflict
>>>>> resolution requires a lot of manual steps. The idea is to give the
>>>>> user all the tools they need to easily solve conflicts, and also to
>>>>> help them to avoid conflicts in the future.
>>>>> 
>>>>> To empower every user to detect and solve conflicts easily on their
>>>>> own, instead of writing some custom bash/js scripts and custom view
>>>>> hackery I would like to have a list of conflicts in Fauxton for every
>>>>> database.
>>>>> 
>>>>> The list, provided by Couch, shows which documents have conflicts. I
>>>>> can then click on the conflicting doc and get a nice diffing editor
>>>>> which helps me to solve the conflict. Here's an early draft: [1]
>>>>> 
>>>>> Discussing the matter in couchdb-dev we thought about serverside
>>>>> filtering of _all_docs - which is a problem for large databases.
>>>>> 
>>>>> Another option is a new endpoint, e.g. /db/_all_conflicts. Behind this
>>>>> endpoint is an index which is listing the conflicting documents.
>>>>> 
>>>>> Jan and Alex suggested the index could be opt-in. They suggested an
>>>>> "auto-warmer" - it would update the index every 1000 doc updates or
>>>>> so. This way not every doc write would get slower. In a later iteration
>>>>> we could even expose the "auto-warming" feature to other views.
>>>>> 
>>>>> Do you want to join me on my quest to provide the best conflict
>>>>> resolution tools and education?
>>>>> What do you think about it?
>>>>> 
>>>>> Best,
>>>>> Robert :)
>>>>> 
>>>>> [1]
>>>>> https://cloud.githubusercontent.com/assets/298166/13741539/c4ecf6d0-e9ce-11e5-84c5-502b0989c290.png
>>>>> 
>>>> 
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>> 
