On Mon, Jun 22, 2009 at 1:22 PM, Paul Davis<[email protected]> wrote:
> On Mon, Jun 22, 2009 at 3:32 PM, Noah Slater<[email protected]> wrote:
>> On Mon, Jun 22, 2009 at 03:15:24PM -0400, Paul Davis wrote:
>
> Exactly, though I would add a third choice that is
>
> * calculate the document hash from the deterministic binary serialization

I think this is the 90% solution. Unicode normalization may be the
other 10%. I just don't want to see the last 10% block the first 90.

>
> Which would include requirements like serializing document members
> with some defined ordering.
>
> On a side note, I've also contemplated just hashing the incoming
> binary representation as the new revision. Though that comes with its
> own set of issues obviously.
>

I don't have anything against Unicode normalization, I just don't
think it buys us a *whole* lot. I do think recursive sorting and
deterministic float handling are pretty crucial, as even the same Ruby
client will order the keys differently on subsequent PUTs.

Just hashing the PUT body would not be sufficient, I think. It'd be
more like a 50% solution.

Here's some code I wrote a while ago that handles floats and key
sorting in JS:

http://github.com/jchris/canonical-json/blob/83751a8b650c60a5fcf3ed4ad5337e3dd172b521/test.html

The client should not send the string used for hashing as the document
itself. The hash would be made from a string derived from the
document, and that string would be lossy on floats.

In Couch we'd want to do this in Erlang, and not store any effects of
the function except the hash.

Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io
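
To make the sort-and-hash idea above concrete, here is a rough Python
sketch. It is not the linked JS code and not anything Couch does; the
names canonical() and doc_hash(), the choice of SHA-1, and the float
rule (integral floats print as integers, everything else is rounded to
a fixed number of significant digits) are all assumptions made for
illustration.

```python
import hashlib
import json


def canonical(value, float_digits=10):
    """Return a deterministic string for a JSON-compatible value.

    Hypothetical helper: keys are sorted recursively and floats are
    rounded, so the result is lossy and only meant to be hashed.
    """
    if value is None or isinstance(value, bool):
        return json.dumps(value)
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Assumed rule: 1.0 and 1 should hash the same.
        if value.is_integer():
            return str(int(value))
        # Assumed rule: fixed precision, scientific notation (lossy).
        return format(value, ".{}e".format(float_digits))
    if isinstance(value, str):
        # ASCII-escaped output; no Unicode normalization is attempted.
        return json.dumps(value, ensure_ascii=True)
    if isinstance(value, (list, tuple)):
        # Array order is significant, so it is preserved.
        inner = ",".join(canonical(v, float_digits) for v in value)
        return "[" + inner + "]"
    if isinstance(value, dict):
        parts = []
        # Sort members by key so insertion order never matters.
        for key in sorted(value):
            if not isinstance(key, str):
                raise TypeError("object keys must be strings")
            parts.append(json.dumps(key) + ":" + canonical(value[key], float_digits))
        return "{" + ",".join(parts) + "}"
    raise TypeError("unsupported type: " + type(value).__name__)


def doc_hash(doc):
    """Hash the canonical string; only the digest would be kept."""
    return hashlib.sha1(canonical(doc).encode("utf-8")).hexdigest()
```

With something like this, two PUTs of the same document with the keys
in a different order give the same digest, while the canonical string
itself is thrown away and never stored or sent back as the document.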
