On Dec 11, 2012, at 12:43 AM, Sergiu Dumitriu <[email protected]> wrote:

> On 12/07/2012 04:56 PM, Caleb James DeLisle wrote:
>> 
>> 
>> On 12/07/2012 04:26 PM, Vincent Massol wrote:
>>> Hi,
>>> 
>>> On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <[email protected]> wrote:
>>> 
>>>> Hi devs,
>>>> 
>>>> We've moved more and more toward a UTF-8-only application, and XWiki
>>>> has only been tested with this configuration for several years.
>>>> 
>>>> I propose that we require UTF-8 for a valid, supported installation.
>>>> This means:
>>>> - JVM encoding (-Dfile.encoding=UTF8)
>>>> - Container default URL encoding (Tomcat has ISO-8859-1 by default)
>>>> - Database encoding (MySQL is still configured with latin1 on some distros)
>>>> 
>>>> There's one big site to update on our side: xwiki.org.
>>>> 
>>>> Here's my +1. This is a move toward a future web, since more and more
>>>> standards require (or at least assume as a default) UTF-8.
>>>> 
>>>> 
>>>> 
>>>> After thinking a bit more, it would make sense to require a valid
>>>> Unicode encoding, including UTF-16, which is preferable in countries
>>>> that don't use a Latin alphabet. However, XWiki doesn't currently work
>>>> under 16-bit encodings at all.
>>> 
>>> For XWiki 4.x I'm -1, since it's a big change and we don't want to break
>>> users who currently run 4.x with ISO-8859-1, for example.
>>> 
>>> For XWiki 5.x I'm not sure.
>>> 
>>> To be able to answer, I need to understand more. For example, what
>>> currently doesn't work when the user picks an encoding other than UTF-8?
>>> Shouldn't we just be transparent, use whatever encoding is specified, and
>>> not hardcode anything?
>> 
>> +1 for UTF-8 only.
>> 
>> If we want to support an encoding, we need to run our test suite with it,
>> so each encoding we support multiplies the test run time without putting
>> any new features in users' hands.
>> 
>> +1 for waiting until 5.x at least before making it mandatory, because we
>> will have to require MySQL >= 5.5.3 and set the encoding to utf8mb4 in order
>> to avoid errors when saving pages with 4-byte code points.
>> http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> 
> I'm afraid we'll get errors if we do that, since indexes are still
> limited to a total of 1024 bytes, and we're already maxing out with
> 255-varchar columns + other fields. In short, MySQL sucks for serious
> projects, but we can't really tell our users "use Postgres, it's
> better". So I'd rather keep it to the current utf-8, and hope that
> nobody will need the extended unicode planes, until we find a better
> solution.
> 
> To be more specific: we can't switch to 4-byte utf8 until we stop using
> names as primary key elements.
> 
> Just tried it, and indeed trying to save characters outside the BMP will
> fail. Thanks for pointing this out.
> 
>> I understand that some users currently set the encoding to latin1 so MySQL
>> will just treat the data as opaque blobs.
> 
> Except that it doesn't work like that. If you use latin1, you'll get
> errors with the default XE xar about invalid values in the RCS table.
> The connector doesn't send bytes, it sends characters, and the database
> will try to store them, which it can't. Every piece of MySQL has an
> encoding, which isn't opaque. Pushing characters outside the table's
> charset will trigger an exception.
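
To make the two failure cases quoted above concrete (characters outside the table's charset, and characters outside the BMP), here is a minimal Java sketch. The class and method names are hypothetical and nothing here is XWiki code. It uses Java's ISO-8859-1 as a stand-in for latin1; MySQL's latin1 is actually closer to windows-1252, so results can differ for a few characters such as the euro sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CharsetFit {

    /** True if every character of s can be encoded in the given charset. */
    static boolean fits(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    /** True if s contains a code point outside the BMP (needs 4 bytes in UTF-8). */
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /** Largest number of UTF-8 bytes used by any single code point in s. */
    static int maxUtf8BytesPerCodePoint(String s) {
        return s.codePoints()
                .map(cp -> new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length)
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        // Plain ASCII, a Latin-1 letter, the euro sign, and an emoji outside the BMP.
        String[] samples = { "cafe", "caf\u00e9", "\u20ac", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println(s
                    + " fitsIso88591=" + fits(s, StandardCharsets.ISO_8859_1)
                    + " outsideBmp=" + hasSupplementary(s)
                    + " maxUtf8Bytes=" + maxUtf8BytesPerCodePoint(s));
        }
    }
}
```

A string for which hasSupplementary returns true is the kind of content that needs utf8mb4 rather than MySQL's 3-byte utf8.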

Reviving this thread now that 5.0 dev is going to start.

xwiki.org is still running latin1 AFAIK and it's working well, including for
page history, so I'm not sure what the problem is.

That said, I'm fine with requiring UTF-8. It would be nice to check the
environment at startup, and I hope we can do so. This means checking that the
DB and the container are set up correctly. This is also important for existing
users who are on latin1: they need to know they have something to do, and
xwiki.org is a good example. We should also document how users can migrate
their DBs to UTF-8 in our admin guide on xwiki.org.
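
As a rough idea of what such a startup check could look like, here is a minimal Java sketch. It is only an illustration under assumptions: the class and method names are hypothetical, and the database charset is passed in by the caller (for MySQL it could be read from the character_set_database variable). The container's URL encoding is not covered, since it generally can't be read from the JVM in a portable way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical startup check, not existing XWiki code.
final class EncodingStartupCheck {

    /**
     * Returns a list of human-readable problems. An empty list means both
     * checked settings look like UTF-8.
     *
     * @param jvmDefault the JVM default charset, e.g. Charset.defaultCharset()
     * @param dbCharset  charset name reported by the database (assumed to be
     *                   supplied by the caller), e.g. "latin1" or "utf8"
     */
    static List<String> check(Charset jvmDefault, String dbCharset) {
        List<String> problems = new ArrayList<>();

        if (!StandardCharsets.UTF_8.equals(jvmDefault)) {
            problems.add("JVM default charset is " + jvmDefault.name()
                    + ", expected UTF-8 (see -Dfile.encoding)");
        }

        // Accept the common spellings: utf8, utf8mb3, utf8mb4, UTF-8.
        String normalized = dbCharset == null
                ? ""
                : dbCharset.toLowerCase(Locale.ROOT).replace("-", "");
        if (!normalized.startsWith("utf8")) {
            problems.add("Database charset is " + dbCharset + ", expected UTF-8");
        }
        return problems;
    }

    public static void main(String[] args) {
        // "latin1" stands in for a value read from the database.
        for (String problem : check(Charset.defaultCharset(), "latin1")) {
            System.err.println("WARNING: " + problem);
        }
    }
}
```

Returning the problems as a list instead of throwing leaves it to the caller to decide whether a non-UTF-8 setup is only a logged warning (for existing latin1 installs) or a hard failure.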

Does this mean we'll remove (or deprecate, to start with) the xwiki.encoding
config parameter?

Thanks
-Vincent
