On Sun, Jun 21, 2009 at 11:21:00PM -0700, Chris Anderson wrote:
> A normal
> user is not going to understand the first bit of the fact that the
> underlying binary representation of their text could be subtly
> different in a way that would be invisible to them.
...
> Secondly, we're a database, so I find highly suspicious the notion
> that we should auto-normalize user input on-the-quiet.

Then maybe it is worth going the whole hog and just storing the received
JSON directly to disk as a string? This takes out the JSON->Erlang parsing
when storing documents, and the Erlang->JSON serialisation when sending
documents on to the view server or when retrieving them for the client.

Of course, there is metadata which CouchDB adds, like _id, _rev etc. This
could be stored separately alongside the document and then shoehorned in
when you retrieve the document (e.g. as simply as inserting some text after
the initial '{').

This gives some interesting future options: e.g. moving the metadata into
HTTP headers, at which point there is no requirement for the document to be
in JSON form at all. It just has to be in some format that the view server
is happy to parse.

As an aside: I agree that subtly different encodings of the "same" document
(i.e. ones that are identical after NFC normalisation) should have
different revs, because (a) it's unlikely that multiple different client
implementations will be making the same changes to the same documents (i.e.
the clients in a cluster are likely to be homogeneous), and (b) such
conflicts are easy to resolve anyway.

Furthermore:

> "don't mutilate strings you didn't edit" so as long as client software
> doesn't go jiggling forms to other random look-alike codepoints
> without asking, any potential trouble is confined to fields actually
> effected by an update.

It's probably not reasonable to make this requirement. Most client software
will deserialise JSON into some internal form (a Ruby hash, a Python dict,
or whatever), at which point transformations will take place, so turning it
back into JSON may well not give exactly the same serialisation. Ruby 1.8
won't even maintain the member ordering.
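
To make both points concrete, here is a rough sketch in Python (not CouchDB code; the function name splice_metadata and the sample values are made up for illustration). It shows that a parse/re-serialise round trip on the client need not reproduce the original text, and that the "insert some text after the initial '{'" idea leaves the stored text untouched. It assumes the stored document is a JSON object and does not already contain the metadata member names.

```python
import json


def splice_metadata(raw_doc: str, meta: dict) -> str:
    """Insert metadata members right after the opening '{' of stored JSON text.

    Hypothetical helper: the stored text is never parsed or re-serialised,
    so everything after the opening brace comes back exactly as received.
    """
    start = raw_doc.index("{")          # assumes the document is a JSON object
    rest = raw_doc[start + 1:]
    meta_text = json.dumps(meta)[1:-1]  # members only, outer braces stripped
    if not meta_text:
        return raw_doc                  # nothing to add
    # An empty object ("{}" or "{ }") needs no separating comma.
    sep = "" if rest.lstrip(" \t\r\n").startswith("}") else ","
    return raw_doc[:start + 1] + meta_text + sep + rest


# Text exactly as a client might send it: odd spacing, an escaped character.
raw = r'{ "b":1,   "name" : "caf\u00e9", "a":2 }'

# A client-style round trip through a native structure changes the text.
round_tripped = json.dumps(json.loads(raw))
print(round_tripped == raw)

# Splicing metadata into the stored text keeps the client's serialisation.
spliced = splice_metadata(raw, {"_id": "doc1", "_rev": "1-abc"})
print(spliced)
print(spliced.endswith(raw[1:]))
```

The trade-off is that the server would have to guarantee the spliced names never collide with members already in the stored text, which is one reason keeping the metadata out of the body altogether (e.g. in HTTP headers) looks attractive.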