On Jun 25, 2009, at 6:53 PM, Noah Slater wrote:

On Thu, Jun 25, 2009 at 05:37:21PM -0400, Damien Katz wrote:
Integrity will be preserved by use of Content-MD5

Bike shed: what about the stronger SHA family of hashes?

Content-MD5 is a standard header; I can find no other headers for integrity hashing.
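
For illustration only, here is a minimal sketch of how a Content-MD5 value can be computed and checked. Per RFC 1864 the header carries the base64 encoding of the raw 16-byte MD5 digest of the body. The function names `content_md5` and `verify_content_md5` are hypothetical and not part of CouchDB.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value: base64 of the raw MD5 digest."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 value sent with it."""
    return content_md5(body) == header_value.strip()


if __name__ == "__main__":
    payload = b'{"_id":"example","value":1}'
    header = content_md5(payload)
    print("Content-MD5:", header)
    print("intact:", verify_content_md5(payload, header))
    print("corrupted:", verify_content_md5(payload + b" ", header))
```

This only detects accidental corruption of the bytes in transit; it says nothing about whether two different serializations describe the same document.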

But it is still specific to the version of CouchDB and its dependencies (version of Erlang, version of ICU, etc.). It will usually be the same across
versions, but that is not guaranteed.

If we're doing content hashing, why would this matter?

Because we don't have a formal canonical format, so we aren't even trying. We'll be hashing whatever representation we have in memory, and that could change from version to version.
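
To make the point concrete, here is a small sketch, assuming JSON as the stand-in representation, showing that the same logical document can hash differently depending on how it happens to be serialized. The sample document and the helper `md5_hex` are made up for the example; CouchDB's actual in-memory format is not shown here.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    """Hex MD5 of a byte string."""
    return hashlib.md5(data).hexdigest()


# Three serializations of what is logically the same document.
compact = json.dumps({"name": "example", "count": 1},
                     separators=(",", ":")).encode("utf-8")
spaced = json.dumps({"name": "example", "count": 1},
                    indent=2).encode("utf-8")
reordered = json.dumps({"count": 1, "name": "example"},
                       separators=(",", ":")).encode("utf-8")

# All three parse back to equal objects...
same_document = json.loads(compact) == json.loads(spaced) == json.loads(reordered)

# ...but the bytes differ, so the hashes differ too.
distinct_hashes = len({md5_hex(compact), md5_hex(spaced), md5_hex(reordered)})

if __name__ == "__main__":
    print("same document:", same_document)
    print("distinct hashes:", distinct_hashes)
```

Without an agreed canonical form, a hash is only comparable between parties that produce byte-identical representations.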


Optionally, this will allow that if two clients make byte-identical saves of a document, they get the same revision, and you don't need to return a
conflict error to the second client to save.
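
A rough sketch of that behaviour, under assumptions of my own: the revision is taken to be a generation number plus an MD5 over the parent revision and the body bytes, and `TinyStore` and `ConflictError` are hypothetical names. The real revision format and storage logic may well differ.

```python
import hashlib


class ConflictError(Exception):
    """Raised when a save is based on a stale revision."""


class TinyStore:
    """Toy in-memory store with deterministic, content-derived revisions."""

    def __init__(self):
        self.docs = {}  # doc_id -> (current_rev, body)

    def save(self, doc_id, body, parent_rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None

        # Assumed revision scheme: "<generation>-<md5(parent_rev, body)>".
        generation = int(parent_rev.split("-")[0]) + 1 if parent_rev else 1
        digest = hashlib.md5(
            (parent_rev or "").encode("ascii") + b"\x00" + body
        ).hexdigest()
        new_rev = f"{generation}-{digest}"

        if current_rev == parent_rev:
            # Normal case: the client edited the latest revision.
            self.docs[doc_id] = (new_rev, body)
            return new_rev
        if current_rev == new_rev:
            # Another client already made the byte-identical save.
            return new_rev
        raise ConflictError(doc_id)
```

With this scheme, two clients that both save the same bytes on top of the same parent get the same revision back, while a save of different bytes on a stale parent still raises a conflict.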

Are there any security issues around possible hash collisions?

No, we aren't checking them later.


I think this is the most pragmatic way to do deterministic revs and integrity checking. That is, do as little as possible and let others deal with the problems and implications of canonicalization if they want to do end-to-end
integrity checking.

Seems like a reasonable approach to me.

Best,

--
Noah Slater, http://tumbolia.org/nslater
