On Jun 25, 2009, at 6:53 PM, Noah Slater wrote:
On Thu, Jun 25, 2009 at 05:37:21PM -0400, Damien Katz wrote:
Integrity will be preserved by use of Content-MD5
Bike shed: what about the stronger SHA family of hashes?
Content-MD5 is a standard header; I can find no other headers that do
integrity hashing.
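
For reference, a minimal sketch of how a Content-MD5 value is formed: the
header carries the base64 encoding of the raw 16-byte MD5 digest of the
body, not the hex string. The function names content_md5 and
verify_content_md5 are made up for illustration and are not CouchDB code.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value for the given body bytes."""
    # The header uses the raw 16-byte digest, base64-encoded.
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 value sent with it."""
    return content_md5(body) == header_value.strip()
```

A receiver would recompute the value over the bytes it actually got and
compare it with the header, which catches corruption in transit but says
nothing about who produced the bytes.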
But it still is specific to the version of CouchDB and its dependencies
(version of Erlang, version of ICU, etc.). It will usually be the same
across versions, but that is not guaranteed.
If we're doing content hashing, why would this matter?
Because we don't have a formal canonical format, so we aren't even
trying. We'll be hashing whatever representation we have in-memory,
and that could change version to version.
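
A rough illustration of why the lack of a canonical format matters,
assuming JSON text as a stand-in for the in-memory representation (CouchDB
hashes its own internal form, so this is only an analogy): two byte
sequences can decode to the same document and still hash differently.

```python
import hashlib
import json

doc = {"name": "couch", "tags": ["a", "b"]}

# Two valid serializations of the same document, differing only in whitespace.
compact = json.dumps(doc, separators=(",", ":")).encode("utf-8")
spaced = json.dumps(doc).encode("utf-8")

# Both decode to an equal document...
same_doc = json.loads(compact) == json.loads(spaced)

# ...but the digests differ because the bytes differ.
same_hash = hashlib.md5(compact).hexdigest() == hashlib.md5(spaced).hexdigest()
```

Here same_doc comes out true and same_hash false, which is the sense in
which a hash is tied to one particular representation.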
Optionally, this will allow that if 2 clients make byte-identical saves
for a document, they will get the same revision, and you don't need to
return a conflict error to the second client to save.
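
A toy sketch of that behaviour, under loud assumptions: the store is a
plain dict mapping document id to current revision, the revision is an MD5
over the parent revision plus the body bytes, and save and ConflictError
are hypothetical names. None of this is the actual CouchDB implementation.

```python
import hashlib
from typing import Dict, Optional


class ConflictError(Exception):
    """Raised when a save is based on a stale revision."""


def save(store: Dict[str, str], doc_id: str,
         parent_rev: Optional[str], body: bytes) -> str:
    """Save body on top of parent_rev and return the resulting revision."""
    # Deterministic revision: same parent and same bytes give the same rev.
    new_rev = hashlib.md5((parent_rev or "").encode("ascii") + b"\0" + body).hexdigest()
    current = store.get(doc_id)
    if current == parent_rev:
        # Normal case: the client saw the latest revision.
        store[doc_id] = new_rev
        return new_rev
    if current == new_rev:
        # Another client already made a byte-identical save: no conflict.
        return new_rev
    raise ConflictError(f"{doc_id}: expected {parent_rev!r}, found {current!r}")
```

With random revisions the second of two identical saves would have to be
rejected; with hash-derived revisions it can simply be told it succeeded.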
Are there any security issues around possible hash collisions?
No, we aren't checking them later.
I think this is the most pragmatic way to do deterministic revs and
integrity checking. That is, do as little as possible and let others deal
with the problems and implications of canonicalization if they want to do
end-to-end integrity checking.
Seems like a reasonable approach to me.
Best,
--
Noah Slater, http://tumbolia.org/nslater