I found some interesting explanations of why non-UTF-8 Unicode is more or less a historical accident:
http://www.utf8everywhere.org/
http://programmers.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
It seems the original assumption was that 65,536 code points (16 bits) would be "enough for anyone". Now that UTF-16 has accepted its fate as a variable-width encoding, it is basically UTF-8 with padding, no backward (ASCII) compatibility, and endianness issues.

The most prominent argument for UTF-16 seems to be that it's more "culturally inclusive", since it encodes Arabic and Asian scripts about as efficiently as Latin ones. I think speakers of Asian languages would care more about not having broken content because of UTF-16 implementations that still assume char == 16 bits.

Until 1.5, that included Java:
http://www.ibm.com/developerworks/java/library/j-unicode/
And in some places it still does:
http://docs.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html#reverse()

Thanks,
Caleb

On 12/07/2012 04:56 PM, Caleb James DeLisle wrote:
>
>
> On 12/07/2012 04:26 PM, Vincent Massol wrote:
>> Hi,
>>
>> On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <[email protected]> wrote:
>>
>>> Hi devs,
>>>
>>> We've moved more and more toward an UTF-8-only application, and XWiki
>>> has only been tested with this configuration for several years.
>>>
>>> I propose that we require UTF-8 for a valid, supported installation.
>>> This means:
>>> - JVM encoding (-Dfile.encoding=UTF8)
>>> - Container default URL encoding (Tomcat has ISO-8859-1 by default)
>>> - Database encoding (MySql is still configured with latin1 on some distros)
>>>
>>> There's one big site to update on our side: xwiki.org.
>>>
>>> Here's my +1. This is a move toward a future web, since more and more
>>> standards require (or at least assume as a default) UTF-8.
>>>
>>>
>>>
>>> After thinking a bit more, it would make sense to require a valid
>>> Unicode encoding, including UTF-16, which is preferable in countries
>>> that don't use a latin alphabet. However, XWiki doesn't currently work
>>> under 16-bit encodings at all.
>>
>> For XWiki 4.x I'm -1 since it's a big change and we don't want to break our
>> users that currently use 4.x with ISO8859-1 for example
>>
>> For XWiki 5.x I'm not sure.
>>
>> To be able to answer I need to understand more. For example what currently
>> doesn't work with any encoding the user wants to use? Shouldn't we just be
>> transparent and use whatever encoding is specified and not hardcode anything?
>
> +1 for UTF-8 only.
>
> If we want to support an encoding we need to run our test suite with it so
> each encoding we support multiplies the test run time and it's not going to
> bring features to the user's hands.
>
> +1 for waiting until 5.x at least before making it mandatory because we will
> have to require MySQL >= 5.5.3 and set the encoding to utf8mb4 in order to
> avoid errors when saving pages with 4 byte codepoints.
> http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> I understand that some users currently set the encoding to latin1 so MySQL
> will just treat the data as opaque blobs.
>
> Thanks,
> Caleb
>
>>
>> Thanks
>> -Vincent
>>
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
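
P.S. To make the "char == 16 bits" problem and the 4-byte code point case concrete, here is a minimal Java sketch. It is only an illustration, not XWiki code: the class name SurrogateDemo and its helper methods are made up for this example, and StandardCharsets assumes Java 7 or later. It takes one character outside the Basic Multilingual Plane (U+1F600) and shows how the UTF-16 unit count, the code point count, and the UTF-8 byte count differ, and what a reversal that assumes one char per character does to it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class SurrogateDemo {
    // "a", then U+1F600 (outside the BMP, so a surrogate pair in UTF-16), then "b".
    static final String S = "a" + new String(Character.toChars(0x1F600)) + "b";

    // Number of 16-bit UTF-16 code units (what String.length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Naive reversal that assumes one char == one character.
    // It swaps the two halves of a surrogate pair, leaving them unpaired.
    static String naiveReverse(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(S));               // 4
        System.out.println(codePoints(S));               // 3
        System.out.println(utf8Bytes(S));                // 6 (1 + 4 + 1)
        System.out.println(codePoints(naiveReverse(S))); // 4: the pair was split
    }
}
```

The 4-byte UTF-8 sequence in the middle is the kind of code point that the utf8mb4 setting mentioned in the quoted mail is meant to accommodate.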

