I found some interesting explanations of why non-UTF-8 Unicode
is more or less a historical accident:
http://www.utf8everywhere.org/
http://programmers.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful

It seems it was originally thought that 65536 characters would be
"enough for anyone", but now that UTF-16 has accepted its fate as
a variable-width encoding, it's basically UTF-8 with padding,
no backward (ASCII) compatibility, and endianness problems.
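
To make the "variable width" point concrete, here is a minimal Java sketch
(the class and method names are mine, purely for illustration) that counts
UTF-16 code units, code points, and UTF-8 bytes for three sample characters.
It assumes Java 7+ for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

final class EncodingWidths {
    // Number of 16-bit code units Java uses to hold the string (UTF-16).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "A";                                    // U+0041
        String cjk = "\u4E2D";                                 // U+4E2D, inside the BMP
        String clef = new String(Character.toChars(0x1D11E));  // U+1D11E, outside the BMP

        for (String s : new String[] {ascii, cjk, clef}) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": utf16Units=" + utf16Units(s)
                    + " codePoints=" + codePoints(s)
                    + " utf8Bytes=" + utf8Bytes(s));
        }
    }
}
```

The last sample is the interesting one: a single code point needs two 16-bit
units in UTF-16 and four bytes in UTF-8, so neither encoding is fixed width.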

The most prominent argument for UTF-16 seems to be that it's more
"culturally inclusive", since Arabic and Asian scripts encode about
as compactly as Latin text. I think speakers of Asian languages
would care more about not having content broken by UTF-16
implementations which still assume char == 16 bits.

Until 1.5, that included Java:
http://www.ibm.com/developerworks/java/library/j-unicode/

And in some places it still does:
http://docs.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html#reverse()
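
As an illustration of what goes wrong when code assumes char == 16 bits,
here is a rough sketch. The class and its methods are hypothetical, not taken
from XWiki or the JDK: a hand-rolled reversal that swaps individual chars is
compared with one that walks the string one code point at a time.

```java
final class ReverseDemo {
    // Naive reversal: treats every 16-bit char as a complete character.
    static String naiveReverse(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[s.length() - 1 - i] = s.charAt(i);
        }
        return new String(out);
    }

    // Code-point-aware reversal: reads backwards one code point at a time,
    // so a surrogate pair is moved as a unit.
    static String codePointReverse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            sb.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    // True if the string contains a surrogate that is not part of a valid pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        // "a", U+1D11E (stored as a surrogate pair), "b"
        String s = "a" + new String(Character.toChars(0x1D11E)) + "b";
        System.out.println("naive reversal broken: "
                + hasUnpairedSurrogate(naiveReverse(s)));
        System.out.println("code point reversal broken: "
                + hasUnpairedSurrogate(codePointReverse(s)));
    }
}
```

The naive version leaves the low surrogate in front of the high one, which is
exactly the kind of corrupted content I mean.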


Thanks,
Caleb



On 12/07/2012 04:56 PM, Caleb James DeLisle wrote:
> 
> 
> On 12/07/2012 04:26 PM, Vincent Massol wrote:
>> Hi,
>>
>> On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <[email protected]> wrote:
>>
>>> Hi devs,
>>>
>>> We've moved more and more toward a UTF-8-only application, and XWiki
>>> has only been tested with this configuration for several years.
>>>
>>> I propose that we require UTF-8 for a valid, supported installation.
>>> This means:
>>> - JVM encoding (-Dfile.encoding=UTF8)
>>> - Container default URL encoding (Tomcat has ISO-8859-1 by default)
>>> - Database encoding (MySQL is still configured with latin1 on some distros)
>>>
>>> There's one big site to update on our side: xwiki.org.
>>>
>>> Here's my +1. This is a move toward a future web, since more and more
>>> standards require (or at least assume as a default) UTF-8.
>>>
>>>
>>>
>>> After thinking a bit more, it would make sense to require a valid
>>> Unicode encoding, including UTF-16, which is preferable in countries
>>> that don't use a Latin alphabet. However, XWiki doesn't currently work
>>> under 16-bit encodings at all.
>>
>> For XWiki 4.x I'm -1 since it's a big change and we don't want to break
>> users who currently run 4.x with ISO-8859-1, for example.
>>
>> For XWiki 5.x I'm not sure.
>>
>> To be able to answer I need to understand more. For example what currently 
>> doesn't work with any encoding the user wants to use? Shouldn't we just be 
>> transparent and use whatever encoding is specified and not hardcode anything?
> 
> +1 for UTF-8 only.
> 
> If we want to support an encoding, we need to run our test suite with it, so
> each encoding we support multiplies the test run time without bringing any
> new features to users.
> 
> +1 for waiting until 5.x at least before making it mandatory, because we will
> have to require MySQL >= 5.5.3 and set the encoding to utf8mb4 in order to
> avoid errors when saving pages with 4-byte code points.
> http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> I understand that some users currently set the encoding to latin1 so MySQL
> will just treat the data as opaque blobs.
> 
> Thanks,
> Caleb
> 
>>
>> Thanks
>> -Vincent
>>
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
> 
> 


