Re: rev hash stability

Brian Mitchell Fri, 17 Oct 2014 14:23:45 -0700

I don’t want to distract the conversation too much, but the semantics revs
carry are supposed to be pretty opaque. The lack of deterministic revs
doesn’t matter for an embedded database (at least in its current form). As
far as a user can see, a rev is a token that represents some place in a
document’s history.


Giving revs meaning outside of this scope is likely to bring up more
meta-discussion about the CouchDB data model and a long history of
undocumented choices that only manifest in the particular
implementation we have today.

Beyond that, I’d like to hear what the specific use case is for standardized
rev content.

As Dale says, asynchronously multicasting writes to multiple databases that
are only integrated via replication is a very likely place to find conflicts.
Even when revs are deterministic, replication lag can cause the same problem
anyway. Relying on deterministic revs just doesn’t make sense in an
uncoordinated system.

Forcing revs to be computed exactly the same way everywhere seems about
as complete a solution as throwing 409s on updates, which is basically
never enough to prevent conflicts and is probably one of the worst features
to rely on in the era of clustering, or in any case where replication is used.
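
To make the two styles concrete, here is a minimal Python sketch for
illustration only. The "N-suffix" shape follows the generation-prefixed revs
Couch-style databases use, but the digest scheme (MD5 over the parent rev plus
a json.dumps(..., sort_keys=True) encoding of the body) is an assumption, not
what CouchDB, PouchDB or Couchbase Lite actually compute, and the function
names are hypothetical.

```python
import hashlib
import json
import uuid


def next_generation(parent_rev):
    """Generation number for a child of parent_rev (assumed "N-suffix" form)."""
    if parent_rev is None:
        return 1
    return int(parent_rev.split("-", 1)[0]) + 1


def deterministic_rev(parent_rev, body):
    """Hypothetical deterministic rev: digest of the parent rev plus a rough
    canonical JSON encoding of the body (sorted keys, no extra whitespace).
    Two nodes making the identical edit to the same parent get the same rev."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    payload = (parent_rev or "") + "\n" + canonical
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return f"{next_generation(parent_rev)}-{digest}"


def random_rev(parent_rev):
    """Hypothetical random rev: unique, but the same edit made twice
    produces two different revs."""
    return f"{next_generation(parent_rev)}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    body = {"type": "note", "text": "hello"}
    reordered = {"text": "hello", "type": "note"}
    # Same parent and same content (key order ignored) -> same rev.
    print(deterministic_rev("1-abc", body) == deterministic_rev("1-abc", reordered))
    # Random revs differ even for an identical edit.
    print(random_rev("1-abc") == random_rev("1-abc"))
```

Even this toy shows where agreement gets hard: every implementation would
have to agree on the exact JSON normalization and digest, and determinism
only helps when two sides make the identical edit. It does nothing about
replication lag or genuinely divergent edits.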

Brian.

> On Oct 17, 2014, at 4:47 PM, Chris Anderson <[email protected]> wrote:
> 
> I would never suggest that a random rev or another style of rev shouldn't
> be functional/expected. It's just that if you do want to generate the same
> revs as somebody else right now, it's hard. Making it less hard would be
> good for everyone.
> 
> Chris
> 
> On Friday, October 17, 2014, Brian Mitchell <[email protected]>
> wrote:
> 
>> 
>>> On Oct 17, 2014, at 3:41 PM, Jens Alfke <[email protected]> wrote:
>>> 
>>> 
>>>> On Oct 17, 2014, at 12:15 PM, Brian Mitchell <[email protected]> wrote:
>>>> 
>>>> Simply put: if and only if the revs match we should assume some
>>>> optimism just like we do with things like atts_since. There’s already
>>>> a lot of trust between two nodes for replication and we should assume
>>>> that matching revs were either unique (or random) or based on some
>>>> deterministic property that isn’t likely to collide unless it was an
>>>> equivalent operation.
>>> 
>>> I'm sorry, I've read this a few times and I can't figure out exactly
>>> what your meaning is. Could you elaborate? Particularly, what does "if
>>> the revs match" mean, exactly?
>>> 
>>> Also, I don't think your statement "there’s already a lot of trust
>>> between two nodes for replication" is accurate in all cases. You seem to
>>> be thinking of a server cluster (a la BigCouch) but CouchDB-style
>>> replication is often used in a more distributed way. Both PouchDB and
>>> Couchbase Lite use replication between servers and clients. A client can
>>> be trusted to be acting on behalf of a user, but not beyond that.
>>> 
>>> —Jens
>> 
>> No problem. I probably kept the message too short.
>> 
>> The issue is that requiring revs to match assumes a bit much about the
>> context different implementations are designed to operate in. The
>> optimization makes a lot of sense in some cases (clustering for
>> availability being the most obvious).
>> 
>> This implies there is a contract for how any implementation should treat
>> revisions:
>> 
>> 1. Any revs that match between two documents should be assumed to be the
>>    same revision of the document. This is important beyond optimization
>>    (N-way replication, for example).
>>
>> 2. Each implementation must be trusted to generate unique revisions.
>>
>> 3. Optionally, revisions can be generated deterministically to allow
>>    idempotent operations. This is really important for clusters
>>    (non-optional in practice) but matters very little for PouchDB.
>> 
>> I’d urge implementations to document what guarantees their revs have,
>> but I would stop short of exposing the implementation (like the digest or
>> RNG function used), as that is out of scope for the _rev contract between
>> compatible implementations.
>> 
>> There are several reasons to settle at this level of detail, backwards
>> compatibility being the most important. Another is that it could allow
>> other sorts of rev encoding in the future for some implementations
>> (cheaper tree merges being one thing worth revisiting).
>> 
>> So PouchDB should generate revs that make sense for PouchDB’s
>> implementation. The contract for how these revs are interpreted shouldn’t
>> constrain it to implementing the same JSON normalization and digest that
>> others do. The same goes for the other Couches.
>> 
>> Brian.
>> 
>> 
> 
> -- 
> Chris Anderson  @jchris
> http://www.couchbase.com
