On Tue, Feb 24, 2009 at 10:13 AM, Damien Katz <dam...@apache.org> wrote:
> I'll once again state my objection to the newlines, which is actually kind
> of weak.
>
> If we compute the revids deterministically (hash the canonical doc
> contents), then when we return the document back to the client, we can send
> the same revid as an integrity hash, because it is already pre-computed and
> stored, etc. What it could save us is the CPU cycles of computing the hash.
> I think we also get some nice free caching benefits too, but I'm not sure.
> But if we do, it might even save us the disk reads to get the doc to compute
> the hash. The problem is any standardized canonical representation is
> unlikely to include a newline at the end.
>
> Now I'm not even sure this scheme is workable either way, or only workable
> in very special instances which are too rare to be worth it. But if the
> scheme works, then it can simplify the code and make things more efficient,
> which are 2 very good things. However these benefits may never come, and
> we'll not have the newlines anyway. That would suck.
>
> But the problem is that if we just add the newlines, then later remove them,
> production apps and scripts that rely on them will break, making the change
> very painful. Or impossible.
>
I don't see why we couldn't include the newlines in the input to the hash
function... For something as intimately related to CouchDB as the
calculation of revs, I think we have the freedom to get funny with the JSON
responses.

I think your point about getting the metadata out of the document could also
be accomplished by defining our function from a JSON object to a hashable
string as one that ignores the value of _rev...

Having a special CouchDB hashable-doc function is not the prettiest thing in
the world, but we already have a bunch of other CouchDB-only API stuff (like
rereduce, etc.), so I don't think it crosses an important line.

Chris

--
Chris Anderson
http://jchris.mfdz.com
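
To make the hashable-doc idea concrete, here is a rough Python sketch of a function from a JSON object to a hashable string that ignores _rev, with the trailing newline optionally included in the hash input. Everything in it is an assumption for illustration only: the names hashable_doc and compute_revid are hypothetical, sorted keys with compact separators is just one possible canonical form, and MD5 is an arbitrary choice of hash. None of this is CouchDB's actual implementation.

```python
import hashlib
import json


def hashable_doc(doc, trailing_newline=False):
    """Map a JSON object to a hashable string, ignoring _rev.

    Hypothetical helper: sorted keys and compact separators stand in
    for whatever canonical representation would actually be chosen.
    """
    # Drop the revision metadata so it never feeds into its own hash.
    body = {k: v for k, v in doc.items() if k != "_rev"}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    # Optionally hash the newline that would end the response body.
    return text + "\n" if trailing_newline else text


def compute_revid(doc, trailing_newline=False):
    """Hash the hashable string (MD5 is an assumption, not a given)."""
    data = hashable_doc(doc, trailing_newline).encode("utf-8")
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    stored = {"_id": "a", "x": 1, "_rev": "1-abc"}
    print(hashable_doc(stored))
    print(compute_revid(stored, trailing_newline=True))
```

With a function like this, key order and the current _rev value do not affect the result, while including or omitting the newline does, so both sides would need to agree on that one flag.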