On Thu, Jun 25, 2009 at 5:37 PM, Damien Katz <dam...@apache.org> wrote:
> I am now working on an implementation of deterministic revs. After a lot of
> thinking about this, I've decided to not reuse the revision ids for
> integrity checking. The canonicalization problem is unresolved and using a
> CouchDB specific canonicalization means other libs/langs/platforms can't
> play easily with CouchDB replication.
>
> Integrity will be preserved by use of Content-MD5 when
> transferring/replicating documents, and checking the document hashing when
> reading from disk. The replicator http client will check the integrity of
> the network bodies.
>
> If you need end-to-end integrity checking, you can use an application
> specific scheme to sign/hash various fields and attachments, if you can deal
> with the string and floating point canonicalization issues.
>
> My plan is that when generating new rev ids, CouchDB will deterministically
> generate the same revision id when edited with the same data. But it still
> is specific to the version of CouchDB and its dependencies (version of
> Erlang, version of ICU, etc). It will usually be the same across versions,
> but is not guaranteed.
>
> What this will allow is for a single client to send the same edits to 2
> identical Erlang servers and see the same revids generated on both.
> Optionally it will allow that if 2 clients make byte identical saves for a
> document, they will get the same revision, and you don't need to return a
> conflict error to the second client to save. I'm not sure about
> implementing this though.
>
> To implement this CouchDB will store an MD5 hash of all the attachments
> along with the JSON document. When saving a new document, we hash the native
> document and the attachment hashes together to generate the revision id.
>
> CouchDB will also store an MD5 hash of the JSON document itself. This will
> give us disk integrity checking for all documents and their attachments in a
> database. When CouchDB encounters a corrupt document or attachment it will
> stop what it's doing and return an error. The admin can restore from backup
> or recreate by deleting and re-replicating from a peer.
>
> I think this is the most pragmatic way to do deterministic revs and
> integrity checking. That is, do as little as possible and let others deal
> with the problems and implications of canonicalization if they want to do
> end-to-end integrity checking.
>
> Feedback please.

One thing that strikes me as potentially bad is that the signature can't be
recalculated. Not sure if that's important or not.

> -Damien
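
To make the quoted plan concrete, here is a minimal Python sketch of the idea: hash each attachment with MD5, hash the document together with the attachment hashes to get a revision id, and check a stored document hash on read. It is an illustration only, not CouchDB's code. The function names (`serialize`, `new_rev_id`, `verify_on_read`) are hypothetical, and sorted-key JSON is an assumption standing in for the native Erlang document that CouchDB would hash, so the ids produced here will not match CouchDB's.

```python
import hashlib
import json


def md5_hex(data):
    """Return the hex MD5 digest of a bytes value."""
    return hashlib.md5(data).hexdigest()


def serialize(doc):
    # ASSUMPTION: sorted-key, compact JSON stands in for the "native
    # document" mentioned in the quoted message. It is not a general
    # canonicalization (floats and strings can still differ by platform).
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attachment_hashes(attachments):
    # One MD5 per attachment, ordered by name so the result is stable.
    return {name: md5_hex(body) for name, body in sorted(attachments.items())}


def new_rev_id(doc, attachments):
    # Hash the document and the attachment hashes together.
    h = hashlib.md5()
    h.update(serialize(doc))
    for name, digest in attachment_hashes(attachments).items():
        h.update(name.encode("utf-8"))
        h.update(digest.encode("ascii"))
    return h.hexdigest()


def verify_on_read(stored_bytes, stored_md5):
    # Disk integrity check: stop and report an error on a mismatch.
    if md5_hex(stored_bytes) != stored_md5:
        raise ValueError("corrupt document: stored MD5 does not match")


if __name__ == "__main__":
    doc = {"title": "hello", "count": 1}
    atts = {"note.txt": b"some bytes"}
    # The same data yields the same id, regardless of which "server" runs it.
    print(new_rev_id(doc, atts) == new_rev_id(dict(doc), dict(atts)))
    body = serialize(doc)
    verify_on_read(body, md5_hex(body))  # passes silently
```

Since the id is derived only from the content, two processes running this same code compute the same id for the same edit. That determinism stops at the serialization step, which matches the caveat in the quoted message that the result is tied to a particular version of the software and its dependencies.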