Re: Unicode normalization (was Re: The 1.0 Thread)

Damien Katz Tue, 30 Jun 2009 08:47:07 -0700


On Jun 30, 2009, at 11:22 AM, Noah Slater wrote:

On Tue, Jun 30, 2009 at 07:12:07AM -0400, Damien Katz wrote:
I'm not sure I understand why we can't just calculate and send the MD5
header for the content range.
We could, but are you not proposing that we use this value for the document revision? If that is the case, when you do range requests, the hash sent back doesn't actually correspond to anything. If I used the hash from the final range request of a document to post an update, it would presumably fail.

To clarify, the point of deterministic rev ids is only to avoid unnecessary conflicts when identical edits are made on 2 different replicas. If the content is identical after editing the same revision, it should not be a conflict. If we had a canonical representation of the document, we could also use the deterministic rev ids for integrity checking, but we don't have a canonical representation, and creating one is very difficult to get right.
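
As a rough illustration of the idea only (this is not CouchDB's actual algorithm), here is a Python sketch where the rev id is a hash of the parent revision plus the edited content. The function name deterministic_rev_id and the use of sorted-key JSON as a stand-in for a stable in-memory encoding are assumptions made for the example.

```python
import hashlib
import json


def deterministic_rev_id(parent_rev: str, doc: dict) -> str:
    """Hypothetical rev id: a hash of the parent revision plus the document body.

    json.dumps with sort_keys=True stands in for a stable in-memory term
    encoding. That is an assumption of this sketch, not a real canonical form.
    """
    encoded = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5((parent_rev + "\n" + encoded).encode("utf-8"))
    return digest.hexdigest()


# Two replicas apply the identical edit on top of the same revision.
parent = "1-abc"
edit_on_replica_a = {"title": "hello", "count": 2}
edit_on_replica_b = {"count": 2, "title": "hello"}

rev_a = deterministic_rev_id(parent, edit_on_replica_a)
rev_b = deterministic_rev_id(parent, edit_on_replica_b)
rev_c = deterministic_rev_id(parent, {"title": "hello", "count": 3})

same_edit_agrees = rev_a == rev_b        # identical edits share a rev id
different_edit_differs = rev_a != rev_c  # a genuinely different edit does not
```

Because both replicas derive the same id from the same parent and content, replication sees one revision rather than two competing ones.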

What I'm proposing is that we only use Content-MD5 for payload integrity checking. It will not be used for security, and it cannot be validated against the rev id because they will always be different. The rev id will be generated based on the Erlang term format of the document, not the UTF-8 JSON string that gets sent to the client.
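
To see why a payload hash can't double as a document identity, here is a small Python sketch (the helper name content_md5 is made up for the example). The same document serialized as two different but equally valid UTF-8 JSON strings hashes to two different values.

```python
import base64
import hashlib
import json


def content_md5(payload: bytes) -> str:
    """Base64 of the raw MD5 digest, the encoding the Content-MD5 header uses."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


doc = {"name": "caf\u00e9", "n": 1}

# Two valid UTF-8 JSON encodings of the same document: one compact with the
# non-ASCII character written literally, one indented with it \u-escaped.
compact = json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
escaped = json.dumps(doc, indent=2, ensure_ascii=True).encode("utf-8")

same_document = json.loads(compact) == json.loads(escaped)
same_payload_hash = content_md5(compact) == content_md5(escaped)
```

Here same_document is true while same_payload_hash is false, so the hash only says something about one particular stream of bytes on the wire.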

So the server will send its responses (perhaps optionally) with an MD5 hash to detect packet corruption. Clients, when they send docs and attachments, can send the payload with a Content-MD5 header, and the server will check it to make sure it's uncorrupted. As it writes the data to disk, the server will compute the MD5 hash for its own integrity checking later.

So, for example, the replicator will check the MD5 sig from the server and send its own MD5 sig when writing data. This prevents network problems from introducing corruption into data as it replicates.
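
A minimal Python sketch of one replication hop under this scheme follows. The names content_md5, verify, replicate and CorruptPayload are all hypothetical, and no real HTTP is involved; the header value is just passed around as a string.

```python
import base64
import hashlib


def content_md5(payload: bytes) -> str:
    """Base64 of the raw MD5 digest, as carried in a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


class CorruptPayload(Exception):
    """Raised when a payload does not match the hash that came with it."""


def verify(payload: bytes, header_value: str) -> bytes:
    # The receiver recomputes the hash and compares it to the header.
    if content_md5(payload) != header_value:
        raise CorruptPayload("Content-MD5 mismatch")
    return payload


def replicate(source_body: bytes, source_md5: str):
    """One hop: check the source's hash, then attach our own for the target."""
    body = verify(source_body, source_md5)
    return body, content_md5(body)


body = b'{"_id":"doc1","value":42}'
out_body, out_md5 = replicate(body, content_md5(body))

# Simulate corruption in transit: the last byte changes but the header does not.
try:
    replicate(body[:-1] + b"]", content_md5(body))
    corruption_detected = False
except CorruptPayload:
    corruption_detected = True
```

The check catches accidental damage on the wire; as noted above, it is not meant as a security measure.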


-Damien


Best,

--
Noah Slater, http://tumbolia.org/nslater
