Re: Unicode normalization (was Re: The 1.0 Thread)

Noah Slater Mon, 22 Jun 2009 07:37:09 -0700

On Sun, Jun 21, 2009 at 11:21:00PM -0700, Chris Anderson wrote:
> My gut reaction is that normalizing strings using NFC [1] is not appropriate
> for a database. Here's why we should treat strings as binary and not worry
> about unicode normalization at all:
[...]
> First of all, I'm certain we can't require that all input already be NFC
> normalized.
[...]
> Secondly, we're a database, so I find highly suspicious the notion that we
> should auto-normalize user input on-the-quiet.
[...]
> So we can't require normalized input and we can't auto-normalize.


CouchDB would create a canonicalised copy of the document only while computing
the document hash. The stored document keeps whatever the client sent, so there
is no reason why CouchDB, or the clients, should worry about canonicalising the
actual documents.
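
A minimal sketch of the idea, in Python rather than CouchDB's Erlang. The names
`nfc` and `document_hash`, the sorted-key JSON serialisation, and the choice of
SHA-256 are all assumptions made for illustration; none of this is CouchDB's
actual implementation. It only shows that normalisation can happen on a
throwaway copy used for hashing:

```python
import hashlib
import json
import unicodedata


def nfc(value):
    """Recursively return an NFC-normalised copy of a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [nfc(item) for item in value]
    if isinstance(value, dict):
        return {nfc(key): nfc(item) for key, item in value.items()}
    return value  # numbers, booleans and None need no normalisation


def document_hash(doc):
    """Hash a temporary canonical copy; the document passed in is not modified."""
    canonical = json.dumps(
        nfc(doc), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


composed = {"name": "caf\u00e9"}      # precomposed e-acute (U+00E9)
decomposed = {"name": "cafe\u0301"}   # "e" plus combining acute accent (U+0301)

# The two documents differ as stored, yet hash identically.
assert composed != decomposed
assert document_hash(composed) == document_hash(decomposed)
```

Both documents would be stored exactly as submitted; only the hash input is
normalised, and the canonical copy is discarded afterwards.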

> Where does this leave us?

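Canonicalisation is a temporary step used only for hashing, so there are no
problems: input does not have to arrive normalised, and nothing that is stored
gets altered.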

> > Unicode normalisation is an issue for clients because it requires they have
> > access to a Unicode NFC function.

Why would clients need to worry about this? CouchDB is creating the hashes.

Best,

-- 
Noah Slater, http://tumbolia.org/nslater
