On Tue, Feb 24, 2009 at 10:13 AM, Damien Katz <dam...@apache.org> wrote:
> I'll once again state my objection to the newlines, which is actually kind
> of weak.
>
> If we compute the revids deterministically (hash the canonical doc
> contents), then when we return the document back to the client, we can send
> as an integrity hash the same revid, because it is already pre-computed and
> stored, etc. What it could save us is the CPU cycles of computing the hash.
> I think we also get some nice free caching benefits too, but I'm not sure.
> But if we do, it might even save us the disk reads to get the doc to compute
> the hash. The problem is any standardized canonical representation is
> unlikely to include a newline at the end.
>
> Now I'm not even sure this scheme is workable either way, or only workable
> in very special instances which are too rare to be worth it. But if the
> scheme works, then it can simplify the code and make things more efficient,
> which are 2 very good things. However these benefits may never come, and
> we'll not have the newlines anyway. That would suck.
>
> But the problem is that if we just add the newlines and later remove them,
> production apps and scripts that rely on them will break, making the change
> very painful. Or impossible.
>

I don't see why we couldn't include the newlines in the input to the
hash function... For something as intimately related to CouchDB as the
calculation of revs, I think we have the freedom to get funny with the
JSON responses.

I think your point about getting the metadata out of the document
could also be accomplished by defining our function from a JSON object
to a hashable string, as one that ignores the value of _rev...
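
As a rough sketch of what such a function might look like (the names
hashable_doc and compute_revid, the choice of SHA-1, and the sorted-key
compact JSON form are all assumptions for illustration, not anything
CouchDB defines today):

```python
import hashlib
import json


def hashable_doc(doc: dict, trailing_newline: bool = True) -> bytes:
    """Hypothetical mapping from a JSON object to a hashable byte string.

    Assumptions: _rev is ignored, keys are sorted, separators are compact,
    and a trailing newline is optionally part of the hash input.
    """
    # Ignore the value of _rev so the hash depends only on the doc contents.
    body = {k: v for k, v in doc.items() if k != "_rev"}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    if trailing_newline:
        # The newline is simply part of the input to the hash function.
        text += "\n"
    return text.encode("utf-8")


def compute_revid(doc: dict) -> str:
    """Deterministic revid candidate: a hash of the hashable form (SHA-1 assumed)."""
    return hashlib.sha1(hashable_doc(doc)).hexdigest()
```

With something like this, two docs that differ only in _rev or in key
order hash to the same value, and the newline question is settled by
whatever we decide the function emits.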

Having a special CouchDB hashable-doc function is not the prettiest
thing in the world, but we already have a bunch of other CouchDB-only
API stuff (like rereduce, etc.), so I don't think it crosses an
important line.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com
