Excerpts from Brian Aker's message of Sat Apr 02 18:13:36 -0700 2011:
> Hi!
>
> For latin1 and swe7 should we accept them as character set specifiers for
> ease of use? I believe they are a subset of UTF-8.
As a sub-concern: utf-8 currently leads to indexes that reserve 3 bytes per character position. I wonder whether it would be easy to create a new index type that only indexes 2-byte characters (in UTF-8, code points up to U+07FF) for situations where that is acceptable. What to do with 3-byte characters would need some thought, but my first inclination is that they would be rejected, or possibly just stripped out (meaning unique indexes and index scans would no longer be useful).

Before people get up in arms about full CJK support: this would be optional, so users who never expect to see 3-byte UTF-8 in their content could optimize for it. The current situation actually favors CJK, which typically carries more information per character and so will likely get more use out of the 3-bytes-per-position index scheme.

Another index type I'd like to see is a hash index. Apologies if it already exists. :)
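A rough Python sketch of the 2-byte index idea, to make the trade-off concrete. This is not Drizzle code: the function names and the "reject"/"strip" policies are hypothetical, and it assumes key space is reserved at the worst-case width per character.

```python
# Hypothetical sketch only: worst-case key width for a fixed-width index,
# plus the two possible policies for characters wider than 2 bytes.

def reserved_key_bytes(char_length: int, bytes_per_char: int) -> int:
    """Worst-case key bytes reserved for a column of char_length characters."""
    return char_length * bytes_per_char

def fits_two_bytes(ch: str) -> bool:
    """True if ch encodes to at most 2 bytes in UTF-8 (code point <= U+07FF)."""
    return len(ch.encode("utf-8")) <= 2

def to_two_byte_key(value: str, policy: str = "reject") -> str:
    """Prepare a value for the hypothetical 2-byte index.

    policy="reject": refuse values containing characters wider than 2 bytes.
    policy="strip":  drop those characters; distinct values can then collide,
                     so the index can no longer enforce uniqueness by itself.
    """
    wide = [ch for ch in value if not fits_two_bytes(ch)]
    if not wide:
        return value
    if policy == "reject":
        raise ValueError(f"value contains {len(wide)} character(s) wider than 2 bytes")
    if policy == "strip":
        return "".join(ch for ch in value if fits_two_bytes(ch))
    raise ValueError(f"unknown policy: {policy!r}")
```

For a 255-character column, reserved_key_bytes(255, 3) gives 765 bytes per key versus 510 with 2 bytes per position. With the "strip" policy, "東京" and "大阪" both reduce to an empty key, which is exactly why unique indexes stop being useful.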

