On Mar 30, 2016 11:03 AM, "Jan Lehnardt" <j...@apache.org> wrote:
>
> Heya Michael,
>
> thanks for taking the time to write this up!

No problem; Thanks for reading! :)

> Could the same* be achieved by taking _revs out of the _rev calculation?

It wouldn't make a difference, unfortunately.  It's about different revId
algorithms coexisting in the same replication ecosystem.  When the
underlying calculation isn't the same between the systems, how they differ
is mostly moot.

> My question would be: how often does the “same doc, but different _revs
> history”-scenario happen as opposed to other conflicts?

In the case of third party system replication I expect it to happen every
time they replicate with each other for any updated doc id that the two
systems share.  They are using different revid algorithms and so when one
system loads its revision ids into the other system they won't match and
they'll generate a conflict doc as if it was the same document with two
histories.
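
To make that concrete, here is a rough sketch in Python.  Both revid
functions are made up for illustration (neither is CouchDB's actual
algorithm, nor any particular third-party one); the only point is that
identical content at the same generation gets two different revision ids.

```python
import hashlib
import json


def revid_system_a(generation, body):
    # Hypothetical algorithm A: MD5 over a canonical JSON encoding of the body.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.md5(payload).hexdigest()}"


def revid_system_b(generation, body):
    # Hypothetical algorithm B: SHA-1 over the same bytes, cut to 32 hex chars.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.sha1(payload).hexdigest()[:32]}"


doc_body = {"name": "example", "count": 2}

rev_a = revid_system_a(2, doc_body)
rev_b = revid_system_b(2, doc_body)

# Same generation and same content, but the ids differ, so after replication
# each side holds two leaf revisions for what is really one document state.
looks_like_conflict = rev_a != rev_b
```

Neither side can tell from the revids alone that the two leaves carry the
same content.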

> I’m thinking, since content conflicts (_digest/_signature mismatch) still
> have to be handled outside of CouchDB, and while writing that logic, doing
> a content-equivalence check as a first shortcut in a conflict resolution
> function, isn’t the added overhead (more _fields, more keeping track of
> stuff, more entries in indexes etc) maybe not worth it, if clients have
> an easy way of doing their own autoresolve?

This is one of those cases where it feels like a lot of people would do less
work if the server did it for us already.  For every document conflict that
ever arises, some piece of code on some CPU somewhere should do the "is the
content actually different" test, and ideally this would only need to be
done once.  Given how universally that check has to run, having the server
remove duplicate content at the time it detects a revision conflict seems
reasonable.  Conflicts are fairly uncommon, and the server is likely already
parsing the document anyway so it can store it, so this only adds the step
of comparing the calculated md5s between just the currently active
branches.
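
A minimal sketch of that comparison, in Python rather than anything
resembling the server's real code.  The names content_digest and
distinct_leaves are hypothetical, and it assumes the bodies passed in have
already had fields like _id and _rev stripped, with an MD5 over canonical
JSON standing in for whatever digest the server would actually compute.

```python
import hashlib
import json


def content_digest(body):
    # Assumption: canonical JSON (sorted keys, compact separators) is a good
    # enough stand-in for "the same content" in this sketch.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def distinct_leaves(leaves):
    """Keep one revid per distinct body among the active leaf revisions.

    `leaves` maps revid -> document body.  Leaves whose bodies hash to the
    same digest are treated as duplicates; the lowest revid in sort order is
    the one kept, which is an arbitrary choice made only for this example.
    """
    kept_by_digest = {}
    for revid in sorted(leaves):
        kept_by_digest.setdefault(content_digest(leaves[revid]), revid)
    return {revid: leaves[revid] for revid in kept_by_digest.values()}


leaves = {
    "2-aaa": {"name": "example", "count": 2},
    "2-bbb": {"count": 2, "name": "example"},  # same content, other revid
    "2-ccc": {"name": "example", "count": 3},  # genuinely different content
}
remaining = distinct_leaves(leaves)
```

Only leaves that survive this step would need to be surfaced to the client
as real conflicts.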

The _signatures field isn't required, but it does speed things up if
revids don't match (like from a different revid algorithm).  In fact Couch
could change the revid to its own algorithm to make this even easier if it
wants.

This way, 1) if a conflict is present, the client would already have
reasonable assurances the content is actually different, and 2) when
replicating between systems with different revid algos, they won't create a
conflict.

Mike
