Re: rev hash stability

Brian Mitchell Sun, 19 Oct 2014 11:46:53 -0700

> On Oct 19, 2014, at 2:22 PM, Jan Lehnardt <[email protected]> wrote:
> 
> 
>> On 19 Oct 2014, at 20:15 , Brian Mitchell <[email protected]> wrote:
>> 
>> 
>>> On Oct 19, 2014, at 1:49 PM, Jan Lehnardt <[email protected]> wrote:
>>> 
>>> 
>>>> On 18 Oct 2014, at 01:17 , Jens Alfke <[email protected]> wrote:
>>>> 
>>>> 
>>>>> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <[email protected]> wrote:
>>>>> 
>>>>> Giving revs meaning outside of this scope is likely to bring up more meta
>>>>> discussion about the CouchDB data model and a long history of
>>>>> undocumented choices which only manifest in the particular
>>>>> implementation we have today.
>>>> 
>>>> That does appear to be a danger. I'm not interested in bike-shedding; if 
>>>> the Apache CouchDB community can't make progress on this issue then we can 
>>>> discuss it elsewhere to come up with solutions. I can't speak for Chris, 
>>>> but I'm here as a courtesy and because I believe interoperability is 
>>>> important. But I believe making progress is more important.
>>> 
>>> +1000. I think so far we’ve had a brief chatter about this and we are ready 
>>> to move on.
>>> 
>>> How does moving this to a strawperson proposal sound? E.g. have a ticket, 
>>> or pad, or gist somewhere where we can hammer out the details of this and 
>>> what the various trade-offs of open decisions are?
>>> 
>>> JIRA obviously preferred, but happy to start this elsewhere if it provides 
>>> less friction.
>> 
>> My primary point is that interoperation does *not* require the rev hashes be 
>> done the same. Clustering does but I can’t see why we’d encourage people to 
>> write the same thing to two slightly different systems simultaneously. Doing 
>> that, I can guarantee that rev problems will not be the only thing to fix.
>> 
>> If we want to define rev interoperation in terms of the minimal and the 
>> stronger case, that might work just fine but defining interoperation as the 
>> latter excludes a variety of strategies that implementations can have and 
>> will likely mean different versions of CouchDB don’t “interoperate” under 
>> this very definition, which is simply not a useful way to describe the 
>> situation.
> 
> I can’t parse this, can you rephrase? :)


I’m basically saying that revs don’t need to be generated the same way for 
implementations to be defined as interoperable. There are a few invariants 
required and a specific digest algorithm isn’t one of them. Creating a bogus 
rev 1-abcfoobaz using new_edits=false shows exactly how this works. The 
foundation for interoperation should only assume some definition of “match”, 
by which I mean, intuitively, that 
1-abcfoobaz = 1-abcfoobaz, 2-abcfoobaz /= 1-abcfoobaz, 1-xyz /= 1-abc.
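
A minimal sketch of that notion of “match”, assuming a rev is just a 
generation number plus an opaque suffix. The helper names and the document 
id are hypothetical; the payload follows the _bulk_docs request shape with 
new_edits set to false, and is only built here, not sent anywhere.

```python
import json


def parse_rev(rev):
    """Split a rev such as '1-abcfoobaz' into (generation, opaque suffix)."""
    generation, _, suffix = rev.partition("-")
    return int(generation), suffix


def revs_match(a, b):
    """Two revs match only if both the generation and the suffix are equal.

    Nothing here knows (or cares) how the suffix was generated.
    """
    return parse_rev(a) == parse_rev(b)


# Request body that asks the server to store a doc under a caller-chosen rev
# instead of generating a new one (hypothetical doc id and contents).
payload = json.dumps({
    "new_edits": False,
    "docs": [{"_id": "example-doc", "_rev": "1-abcfoobaz", "value": 42}],
})
```

The comparison treats the suffix as an opaque string, which is the only 
property the minimal form of interoperation relies on.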

The need for a stronger set of rules is specific to how the implementation is 
*intended* to be used. In an eventually consistent cluster, it’s quite useful 
to have idempotent writes, either to repair via replication or to duplicate 
writes to redundant nodes which replicate between one another. I don’t see a 
problem with defining rules to make this work well but it’s a very specific and 
demanding kind of interoperability.
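
As a toy illustration of that stronger case: if the rev is a deterministic 
function of the parent rev and the body, the same write sent to two nodes 
collapses into one revision once they replicate. Everything below is 
hypothetical (next_rev, Node, and the MD5-over-canonical-JSON digest are 
assumptions for illustration, not CouchDB’s actual rev algorithm).

```python
import hashlib
import json


def next_rev(parent_rev, body):
    """Hypothetical deterministic rev: generation + digest of (parent, body)."""
    generation = int(parent_rev.split("-", 1)[0]) + 1 if parent_rev else 1
    canonical = json.dumps([parent_rev, body], sort_keys=True,
                           separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"


class Node:
    """Toy node that stores bodies keyed by (doc id, rev)."""

    def __init__(self):
        self.revs = {}

    def write(self, doc_id, parent_rev, body):
        rev = next_rev(parent_rev, body)
        self.revs[(doc_id, rev)] = body
        return rev

    def replicate_to(self, other):
        # Identical (doc id, rev) keys merge into a single entry.
        other.revs.update(self.revs)


a, b = Node(), Node()
rev_a = a.write("doc", None, {"n": 1})  # same write sent to both nodes
rev_b = b.write("doc", None, {"n": 1})
a.replicate_to(b)
b.replicate_to(a)
```

Both nodes end up with a single, identical revision, but only because both 
started from the same (empty) prior state.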

Of course, matching revs are not going to solve cluster coherence between 
implementations on their own. For example, the abstraction still leaks in the 
multi-node replication case if there is replication lag (quite easily achieved, 
at least with how things work now). One can’t simply write to two places and 
hope that an “idempotent operation” works. That makes a huge assumption about 
what was written prior to that, and it relies on at least that minimal 
knowledge having been replicated. It’s just a bad practice to assume that two 
distributed systems will always have the same view of things in relation to a 
third client. Clustering modes go through quite a bit of work to make this 
usable but it’s certainly far from automatic and not something that I’d put on 
the table for the definition of general interoperation. [1]
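
A toy sketch of the replication-lag leak, under the same assumptions as 
before (hypothetical next_rev and Node, and a made-up deterministic digest 
that is not CouchDB’s algorithm). The two nodes see different prior state, so 
the “same” update lands on different parents and produces different revs.

```python
import hashlib
import json


def next_rev(parent_rev, body):
    """Hypothetical deterministic rev: generation + digest of (parent, body)."""
    generation = int(parent_rev.split("-", 1)[0]) + 1 if parent_rev else 1
    canonical = json.dumps([parent_rev, body], sort_keys=True,
                           separators=(",", ":"))
    return f"{generation}-{hashlib.md5(canonical.encode('utf-8')).hexdigest()}"


class Node:
    """Toy node tracking one document's rev tree as rev -> parent rev."""

    def __init__(self):
        self.parents = {}

    def write(self, parent_rev, body):
        rev = next_rev(parent_rev, body)
        self.parents[rev] = parent_rev
        return rev

    def leaves(self):
        # A leaf is a rev that no other rev names as its parent.
        used = set(self.parents.values())
        return sorted(r for r in self.parents if r not in used)

    def replicate_to(self, other):
        other.parents.update(self.parents)


a, b = Node(), Node()
r1 = a.write(None, {"n": 1})
a.replicate_to(b)             # both nodes know r1
r2 = a.write(r1, {"n": 2})    # lag: b has not seen r2 yet

# The client duplicates one update to both nodes, each on that node's leaf.
ra = a.write(a.leaves()[0], {"n": 3})
rb = b.write(b.leaves()[0], {"n": 3})

a.replicate_to(b)
b.replicate_to(a)
```

After replication both nodes hold two leaves for the document, so the 
duplicated write produced a conflict even though the digest is deterministic.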

Thus a middle ground might be allowing two levels of interoperation to be 
defined. I still don’t see the value in focusing on this specific case. It’s my 
opinion that if there is something that breaks between vendors because of this, 
there are likely other assumptions to visit far before this one. I could be 
wrong as I don’t know what others are planning on doing.

>> Finally, if we really want to define a stable digest, I’d suggest that a 
>> reference implementation be created and proposed rather than forced upon the 
>> implementations before it materializes. This could possibly be made an 
>> option in the CouchDB configuration or build allowing it to be an 
>> experimental feature.
> 
> Hence my strawperson proposal that we can work on. I envision all 
> implementors getting a say in what works for them and what doesn’t and that 
> we find a consensus and a solution that we can roll this out harmlessly.

I agree, but the idea that we might not need this at all seems to be getting 
dismissed, with the question treated as just a matter of finding the right 
implementation that fits every use case. [2]

Brian.

[1]: I also alluded to the 409 issue in another email which shows the growing 
problem of how the old revision system isn’t well designed for anything but 
single node systems. I’d vote to remove this in 3.x since conflicts on write 
mean nothing in an eventually consistent system and the 409 actually makes it 
harder to test code in this case. It’s just trivial to poke holes in the setup 
and I don’t see how revs can possibly be the wall people actually hit.

[2]: I think there is a need for better revision control that applications can 
leverage more significantly. There’s a long history of, rightly, discouraging 
people from using the MVCC implementation for application concerns, but that’s 
a limitation of the API, not of the idea. I could easily see revs being a 
richer entity in some systems, which makes this whole digest thing seem so 
specific and low level that we’re really just locking ourselves in rather than 
opening 
the protocol up. It depends on where one might want to go, I guess.
