On Sun, Jun 21, 2009 at 11:21:00PM -0700, Chris Anderson wrote:
> My gut reaction is that normalizing strings using NFC [1] is not appropriate
> for a database. Here's why we should treat strings as binary and not worry
> about unicode normalization at all:

[...]

> First of all, I'm certain we can't require that all input already be NFC
> normalized.

[...]

> Secondly, we're a database, so I find highly suspicious the notion that we
> should auto-normalize user input on-the-quiet.

[...]

> So we can't require normalized input and we can't auto-normalize.
CouchDB would create a canonicalised copy of the document when computing the
document hash. There is no reason why CouchDB, or the clients, should worry
about canonicalising the actual documents.

> Where does this leave us?

Canonicalisation is only a temporary step, so neither problem arises: the
stored document is never altered.

> > Unicode normalisation is an issue for clients because it requires they have
> > access to a Unicode NFC function.

Why would clients need to worry about this? CouchDB is creating the hashes.

Best,

--
Noah Slater, http://tumbolia.org/nslater
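For illustration, a minimal Python sketch of hashing a canonicalised copy of a document while leaving the stored document untouched. It assumes that canonicalisation means NFC-normalising every string key and value, that the copy is serialised as JSON with sorted keys, and that SHA-1 is the hash. The names `canonicalise` and `document_hash`, the serialisation and the hash algorithm are hypothetical choices, not CouchDB's actual implementation.

```python
import hashlib
import json
import unicodedata


def canonicalise(value):
    """Return a copy of value with every string NFC-normalised (hypothetical helper)."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        # Keys are normalised too, so NFC and NFD spellings of a key agree.
        return {canonicalise(k): canonicalise(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonicalise(v) for v in value]
    return value  # numbers, booleans, None are left as they are


def document_hash(doc):
    """Hash a canonicalised copy of doc; doc itself is never modified."""
    canonical = canonicalise(doc)
    # Sorted keys and compact separators give a stable byte sequence (an assumption).
    encoded = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()  # SHA-1 chosen arbitrarily here


# "é" as one code point (NFC) versus "e" plus a combining acute accent (NFD).
composed = {"name": "caf\u00e9"}
decomposed = {"name": "cafe\u0301"}

assert document_hash(composed) == document_hash(decomposed)
assert decomposed["name"] == "cafe\u0301"  # the stored bytes are unchanged
```

The normalised copy exists only for the duration of `document_hash`, so clients never need an NFC function of their own and the database still stores exactly what it was given.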