Re: API suggestions

Antony Blakey Sun, 28 Dec 2008 20:17:25 -0800


On 29/12/2008, at 2:15 PM, Chris Anderson wrote:

Especially once CouchDB handles Unicode
collation properly.


I wasn't aware there was a problem with CouchDB's unicode collation.
Is there a ticket you can point me to?

No, I haven't raised it. The issue is that collation cannot be specified per db, which IMO it needs to be, and I haven't seen anything in the code that does anything wrt collation, i.e. I suspect it simply relies on the OS locale and icu's default handling. I haven't thought about it enough to know whether persisted strings should be stored in a normalized form, but certainly comparison needs to use both normalisation and a specified collation order.
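To illustrate the normalisation half of that point, here is a minimal sketch in Python using only the standard library. It is not CouchDB code; the helper name normalized_equal is hypothetical, and the choice of NFC is an assumption made for the example. It shows only that two spellings of the same visible character compare unequal until both are normalised. A specified collation order would additionally need a collation library such as ICU, which this sketch does not cover.

```python
import unicodedata


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same Unicode normalisation form.

    Hypothetical helper for illustration; NFC is an assumed default.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# A raw comparison sees two different code point sequences.
print(composed == decomposed)                  # False
# After normalisation the two spellings are identical.
print(normalized_equal(composed, decomposed))  # True
```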

It also affects what end-of-collation-order character one uses for prefix key searching, and would affect the computation of succ(string). That issue alone leads me to think that CouchDB needs to do more in that area because it's quite difficult to fix in the client, whereas CouchDB is already fully unicode with icu. As an example, I think the key boundary testing API could be richer, eliminating the need for the current key hacks, especially the use of a high-numeric-value unicode character for prefix ranges.
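The prefix-range hack mentioned above can be sketched client-side as follows. This is an illustration under stated assumptions, not CouchDB's behaviour: the sentinel U+FFF0, the helper names prefix_range and in_range, and the sample keys are all chosen for the example, and the comparison is Python's plain code point ordering rather than ICU collation. Under a real collation order the results may differ, which is the difficulty being described.

```python
# Assumed sentinel: a high-valued code point appended to the prefix.
HIGH_SENTINEL = "\ufff0"


def prefix_range(prefix: str) -> tuple[str, str]:
    """Return (startkey, endkey) intended to bracket all keys with this prefix."""
    return prefix, prefix + HIGH_SENTINEL


def in_range(key: str, startkey: str, endkey: str) -> bool:
    """Inclusive range test using Python's code point ordering (not ICU)."""
    return startkey <= key <= endkey


keys = ["abb", "abc", "abcd", "abc\u4e2d", "abc\U0001F600", "abd"]
start, end = prefix_range("abc")

matched = [k for k in keys if in_range(k, start, end)]
# Keys that do start with the prefix but fall outside the bracket.
missed = [k for k in keys if k.startswith("abc") and k not in matched]

print(matched)
print(missed)
```

In this sketch the key whose character after the prefix lies above U+FFF0 starts with the prefix yet falls outside the range, so even under simple code point ordering the sentinel is not a true upper bound. That is one reason a richer server-side boundary test seems preferable to computing the bound in the client.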

As I say, I haven't thought enough about it to raise a ticket, but I feel strongly that it needs to be dealt with, and I suspect it's more obvious to me because I'm deploying for an Asian/Arabic-script localised environment.


Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

One should respect public opinion insofar as is necessary to avoid starvation and keep out of prison, but anything that goes beyond this is voluntary submission to an unnecessary tyranny.

  -- Bertrand Russell
