> On 29 Mar 2016, at 20:14, Adam Kocoloski <kocol...@apache.org> wrote:
>
> Neat stuff. Years ago I actually committed this feature to the codebase using
> a table scan and then Damien backed it out because of the scalability
> concern. Glad to see we’re approaching it in a more considered fashion this
> time around :)
>
> One thing we might consider is to maintain a *count* of the number of
> conflicted documents in the database automatically. If the count is nonzero
> when you expected it to be zero, build the conflicted documents index and do
> your inspection. In the happy case where there are no conflicts we just saved
> you a bunch of effort.
>
> We don’t really need a separate index to accomplish this; we just need to
> modify the reducer function supplied to the by_id btree. We’ve played that
> game before to add things like data size accumulators to the DB info object.
> There may be a modest hit to the write performance to count the number of
> non-deleted leafs in the rev tree on document update, but honestly that says
> as much about the inefficiencies in couch_key_tree as anything else - that
> quantity ought to be very cheap to uncover.
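
For illustration only (this is not code from the thread): a minimal JavaScript sketch of the counting idea quoted above. It assumes a made-up in-memory shape for by_id entries; the names `countLiveLeafs`, `isConflicted`, `reduce`, `rereduce` and the `leafRevs` field are hypothetical, and CouchDB's real reducer is written in Erlang and will look different.

```javascript
// Hypothetical sketch: none of these names exist in CouchDB.
// Each entry stands for one document in the by_id tree; `leafRevs`
// lists the leaf revisions of its rev tree.

// Number of leaf revisions that are not deletions.
function countLiveLeafs(doc) {
  return doc.leafRevs.filter((rev) => !rev.deleted).length;
}

// Assumption: a document counts as conflicted when more than one
// non-deleted leaf revision remains.
function isConflicted(doc) {
  return countLiveLeafs(doc) > 1;
}

// Reduce over the documents of one leaf node: how many are conflicted.
function reduce(docs) {
  return docs.filter(isConflicted).length;
}

// Rereduce over child nodes: sum the counts already computed per child.
function rereduce(childCounts) {
  return childCounts.reduce((sum, n) => sum + n, 0);
}

// Two toy leaf nodes with three documents in total.
const nodeA = [
  { id: "a", leafRevs: [{ rev: "1-x", deleted: false }] },
  { id: "b", leafRevs: [{ rev: "2-y", deleted: false }, { rev: "2-z", deleted: false }] },
];
const nodeB = [
  { id: "c", leafRevs: [{ rev: "3-p", deleted: false }, { rev: "3-q", deleted: true }] },
];

// Only "b" has two live leafs, so the database-wide count is 1.
const conflictCount = rereduce([reduce(nodeA), reduce(nodeB)]);
console.log(conflictCount);
```

If the summed value is kept on each inner node, as with the other reductions mentioned above, the database-wide count can be read from the root, so the "are there any conflicts at all?" check would not need a scan.
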

Bob Newson and I talked about this on IRC some more and I think this is all
similar, if not the same, thinking: remember how we optimised `skip` in view
results? We could keep track of the number of conflicts per b-tree node and
then easily skip over the subtrees that don’t have any conflicts, so a table
scan would be relatively cheap.

Best
Jan
--

>
> Adam
>
>> On Mar 29, 2016, at 1:26 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>>
>> Hi,
>>
>> good points!
>>
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that
>>> would come with the additional security concern of opening up Erlang
>>> views.
>>
>> The great thing about Mango is that, with an index, it is faster than JS
>> views because it is Erlang-based.
>>
>> And Dale is making a good suggestion.
>>
>> ```
>> {
>>   selector: {
>>     _conflicts: {'$exists': true}
>>   }
>> }
>> ```
>>
>> The selector already works without an index with the latest change in
>> Mango; it no longer strictly requires an index for ad-hoc queries:
>> https://github.com/apache/couchdb-mango/commit/01252f971bef0c8da1d78bf5a7b506b71926ce1b
>>
>> Cool, so we are already almost done! :)
>>
>> This is great for development, and I wonder if we could reduce the
>> friction for people who would like to use an index for conflicts,
>> e.g. in their production systems. Remember, the mission is to make
>> conflict handling a first-class citizen in CouchDB and make it as easy
>> as possible for our users.
>>
>> Current state:
>>
>> POST to `$DB/_index`:
>>
>> ```
>> {
>>   "index": {
>>     "fields": ["_conflicts"]
>>   },
>>   "name" : "conflict-index",
>>   "type" : "json"
>> }
>> ```
>>
>> I feel it is hard to type on the terminal, e.g. when I use curl. With
>> a JS HTTP client it is also a lot to type.
>>
>> I thought about API sugar. I feel unsure about API sugar which could
>> abstract this somehow, as I don't want to pollute the API.
>> At the same time I would also like to make it as easy as possible for
>> users to handle their conflicts.
>>
>> Rough idea:
>>
>> POST to `$DB/_index`:
>>
>> ```
>> { "type" : "conflicts" }
>> ```
>>
>> Hmmm....
>>
>> What do you think?
>>
>> On Mon, Mar 14, 2016 at 4:54 PM, Jan Lehnardt <j...@apache.org> wrote:
>>>
>>>> On 14 Mar 2016, at 16:22, Dale Harvey <d...@arandomurl.com> wrote:
>>>>
>>>> I would really like to give users better abilities to handle conflict
>>>> resolution. I am, however, extremely worried about introducing
>>>> another API endpoint. We have like 6/7 read APIs, each of them having
>>>> its own idiosyncrasies, and it's extremely confusing for users to know
>>>> which to use when.
>>>>
>>>> If we could extend our existing APIs to cater for this use case it seems
>>>> hugely preferable, i.e. something like mango / pouchdb find:
>>>>
>>>> db.find({
>>>>   selector: {
>>>>     _conflicts: {'$exists': true}
>>>>   }
>>>> }).then(function (result) {
>>>>   ...
>>>> });
>>>
>>> Great input Dale!
>>>
>>> Let’s split this into two issues then:
>>>
>>> A. how do we get the information.
>>> B. how do we present it to users.
>>>
>>>
>>> As for A., the thought process went like this:
>>>
>>> 1. _all_docs + Erlang filter.
>>>
>>> As Robert pointed out, that’s a no-go for large databases.
>>>
>>>
>>> 2. Add another index to the main database file like by-seq/by-id
>>> (_changes/_all_docs)
>>>
>>> I pointed out that this will make all write operations slower, for
>>> everyone, not just for the people who want this. (A scenario where I
>>> wouldn’t want this is where CouchDB is the cloud counterpart for one or
>>> more PouchDB instances, and conflict resolution only ever happens in
>>> PouchDB.)
>>>
>>> So I’d say this is a soft no on adding this to the main database file, also
>>> given that we had similar discussions about adding another index to view
>>> files before.
>>>
>>>
>>> 3. A view: Fauxton could hide creating a ddoc behind a button, and users
>>> could opt into this easily, while understanding the trade-offs.
>>>
>>> Robert feels like tying this to Fauxton as opposed to CouchDB makes this
>>> approach useful for fewer people than it could be (props for not being
>>> focussed on your own project there ;)
>>>
>>>
>>> 3.1. An optimisation of 3. would be making this an Erlang view, but that
>>> would come with the additional security concern of opening up Erlang views.
>>>
>>>
>>> 4. Given all of the above, how about this: a new CouchDB module
>>> (couch_conflicts) that is essentially an Erlang view for conflicts that is
>>> disabled by default, but when enabled uses the native query server to build
>>> an index that can give the list of conflicting documents (and the
>>> conflicting revisions?) *without* having to enable the native query server
>>> for everyone. The module can be enabled in the config (or via an admin PUT
>>> to the endpoint, like other things in 2.0). We’d also build a basic
>>> keep-view-indexes-up-to-date mechanism that would trigger an update after,
>>> say, 1000 doc updates (we’d make that configurable of course), something
>>> which we’d want for other views as well anyway.
>>>
>>> * * *
>>>
>>> As for B., how we present this to the user, I have no strong feelings.
>>> We could make this part of Mango, like Dale suggested, or a new
>>> /db/_all_conflicts with its own idiosyncrasies or something else ;)
>>>
>>>
>>> I just want to make sure we make the right trade-offs on the
>>> storage/indexing level, and, while not making everyone pay for the
>>> overhead, make it really easy to opt into this feature. (Unless we all
>>> agree that the performance hit for 2. is worth it :)
>>>
>>>
>>> Best
>>> Jan
>>> --
>>>
>>>>
>>>> On 14 March 2016 at 14:07, Sebastian Rothbucher <
>>>> sebastianrothbuc...@googlemail.com> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> this looks awesome already!
>>>>> I don't want to be the spoiler in this, but wouldn't conflicts tend to
>>>>> be recent? E.g. using _changes (descending) might do the trick of
>>>>> limiting? (You'd still discard docs that simply don't have conflicts,
>>>>> but probably not that many.)
>>>>>
>>>>> If that doesn't do the trick: just forget what I just said ;-)
>>>>>
>>>>> Best
>>>>> Sebastian
>>>>>
>>>>> On Mon, Mar 14, 2016 at 2:58 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> it is hackweek for the Fauxton team and I am lucky enough to be able
>>>>>> to work on whatever I want :)
>>>>>>
>>>>>> Conflicts are an integral part of CouchDB. Right now I dream of making
>>>>>> conflict resolution a first-class citizen in Couch. Conflict
>>>>>> resolution requires a lot of manual steps. The idea is to give the
>>>>>> user all the tools they need to easily solve conflicts, and also to
>>>>>> help them to avoid conflicts in the future.
>>>>>>
>>>>>> To empower every user to detect and solve conflicts easily on their
>>>>>> own, instead of writing some custom bash/js scripts and custom view
>>>>>> hackery, I would like to have a list of conflicts in Fauxton for every
>>>>>> database.
>>>>>>
>>>>>> The list, provided by Couch, shows which documents have conflicts. I
>>>>>> can then click on the conflicting doc and get a nice diffing editor
>>>>>> which helps me to solve the conflict. Here's an early draft: [1]
>>>>>>
>>>>>> Discussing the matter in couchdb-dev, we thought about server-side
>>>>>> filtering of _all_docs - which is a problem for large databases.
>>>>>>
>>>>>> Another option is a new endpoint, e.g. /db/_all_conflicts. Behind this
>>>>>> endpoint is an index which lists the conflicting documents.
>>>>>>
>>>>>> Jan and Alex suggested the index could be opt-in. They suggested an
>>>>>> "auto-warmer" - it would update the index every 1000 doc updates or
>>>>>> so. This way not every doc write would get slower.
>>>>>> In a later iteration we could even expose the "auto-warming" feature
>>>>>> to other views.
>>>>>>
>>>>>> Do you want to join me on my quest to provide the best conflict
>>>>>> resolution tools and education?
>>>>>> What do you think about it?
>>>>>>
>>>>>> Best,
>>>>>> Robert :)
>>>>>>
>>>>>> [1] https://cloud.githubusercontent.com/assets/298166/13741539/c4ecf6d0-e9ce-11e5-84c5-502b0989c290.png
>>>
>>> --
>>> Professional Support for Apache CouchDB:
>>> https://neighbourhood.ie/couchdb-support/