Re: [PHP-DEV] Turkish/Azeri locale support
+1 for option #2.

Joël.

On Sun, Apr 18, 2010 at 11:58 PM, Adam Harvey ahar...@php.net wrote:

> As at least some of you would already be aware, there's a long-standing issue with using PHP in a Turkish or Azeri locale, namely that case-insensitive lookups within the Zend engine (method names, for example) fail on lookups involving upper-case I characters, since lower-case I in those languages is ı instead of i (note the lack of a dot).
>
> The long term plan for this, per bug #35050 and any number of duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to happen in its original form, I think we're going to need to revisit how we want to deal with this.
>
> There's a patch linked in the bug from Tomas Kuliavas and Marcus that fixes the problem by simply redefining zend_tolower() to a simple locale-insensitive ASCII tolower() function, which does fix the Turkish and Azeri locales.
>
> The potential breakage from this is that single-byte locales will no longer get case-insensitive lookups of non-ASCII characters: for example, somebody using fr_FR.ISO-8859-1 as a locale could no longer call a method É() as é().
>
> Since it doesn't break anything when using multi-byte locales (which have never had case-insensitive lookups anyway since the Zend Engine uses the single-byte tolower() internally), my inclination would be to apply the patch on trunk and document it as a BC issue.
>
> I've uploaded an updated version of Tomas's patch that applies cleanly to trunk to http://www.adamharvey.name/patches/35050/zend_operators.c.diff and a phpt file to test the fix to http://www.adamharvey.name/patches/35050/bug35050.phpt. It's likely that the test would require massaging before being committed to work on Windows, but since I don't have a Windows development box readily available and don't know a thing about how Windows implements locale support, this would require help from someone familiar with the platform.
>
> So: thoughts; concerns; alternate approaches? It would be nice to have this sorted for PHP.next.
>
> Thanks,
> Adam

--
I do know everything, just not all at once. It's a virtual memory problem.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Turkish/Azeri locale support
Hi,

A definite -1 for #2: it's a _massive_ BC break with no justification so far, IMHO. The optimization point is quite moot: tolower could be restricted to compilation + dynamic accesses, which would remove most of the calls already.

OTOH option #1 seems like the most sensible approach, breaking only in very limited cases, so +1 from me.

Best,

On Tue, May 4, 2010 at 5:00 PM, Joel Perras joel.per...@gmail.com wrote:

> +1 for option #2.
>
> Joël.

--
Etienne Kneuss
http://www.colder.ch
Re: [PHP-DEV] Turkish/Azeri locale support
2010.04.19 07:59 Stan Vassilev wrote:

> As you illustrated in your post, fixing it for locales becomes... complicated.
>
> If you ask me, there's only one way to fix this, which is how most other languages fixed it: make the next major version of PHP case-sensitive for all identifiers. For less bugs, less locale problems and more performance. It was somewhat-the-plan, even before the Turkish locale issue was brought up.

Fixing the issue is not complicated; I could do it without any C coding background. Your (@php.net) developers only have to learn that they should not use locale-sensitive functions and then assume that English case rules apply. This is the main lesson Turkey presents to any coder. If you continue to ignore it, you will continue to trigger PHP bugs in Turkey.

For n years PHP has used only locale-sensitive case-insensitive functions, and you never bothered to fix it. Deferring it to some distant PHP 6 feature does not help Turks. The patch for #35050 is not something that should break things. You reviewed the patch and commented on it, I updated it based on your style comments, and you continued to ignore the problem.

The excuse that the patch breaks something is funny, because Win32 builds are set to use internal (not for public use) Microsoft C library calls that are locale-insensitive. If some PHP code breaks when string functions are locale-insensitive, please show that code. I would like to see what i18n-unportable PHP 5 programming looks like.

If users want to use a Turkish locale here and now, they must set LC_CTYPE to C. This workaround disables all locale-specific quirks; after that, only gettext has to be set to use the correct charset for all translations. The other fix is more complex: PHP scripts must replace all locale-sensitive native functions with their own locale-insensitive replacements and pray that supported PHP versions don't trigger bugs when LC_CTYPE is not C.

--
Tomas
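The LC_CTYPE workaround mentioned above can be observed at the C library level. The following is only a sketch: the helper name `lower_i_in_locale` is hypothetical, and whether a Turkish locale is installed, and what `tolower('I')` returns under it, depends on the platform, so no result is claimed for that case.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: returns tolower('I') as seen under the named
 * LC_CTYPE locale, or -1 if setlocale() rejects the name.
 * LC_CTYPE is reset to "C" before returning. */
int lower_i_in_locale(const char *locale_name)
{
    int result = -1;

    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        /* tolower() consults the current LC_CTYPE category, which is
         * why the engine's lookups change with the locale. */
        result = tolower('I');
    }

    /* Equivalent of the workaround: force LC_CTYPE back to "C". */
    setlocale(LC_CTYPE, "C");
    return result;
}
```

Calling `lower_i_in_locale("C")` yields `'i'`. Passing the name of an installed Turkish locale is how one would check the behaviour reported in the bug on a given system.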
[PHP-DEV] Turkish/Azeri locale support
As at least some of you would already be aware, there's a long-standing issue with using PHP in a Turkish or Azeri locale, namely that case-insensitive lookups within the Zend engine (method names, for example) fail on lookups involving upper-case I characters, since lower-case I in those languages is ı instead of i (note the lack of a dot).

The long term plan for this, per bug #35050 and any number of duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to happen in its original form, I think we're going to need to revisit how we want to deal with this.

There's a patch linked in the bug from Tomas Kuliavas and Marcus that fixes the problem by simply redefining zend_tolower() to a simple locale-insensitive ASCII tolower() function, which does fix the Turkish and Azeri locales.

The potential breakage from this is that single-byte locales will no longer get case-insensitive lookups of non-ASCII characters: for example, somebody using fr_FR.ISO-8859-1 as a locale could no longer call a method É() as é().

Since it doesn't break anything when using multi-byte locales (which have never had case-insensitive lookups anyway since the Zend Engine uses the single-byte tolower() internally), my inclination would be to apply the patch on trunk and document it as a BC issue.

I've uploaded an updated version of Tomas's patch that applies cleanly to trunk to http://www.adamharvey.name/patches/35050/zend_operators.c.diff and a phpt file to test the fix to http://www.adamharvey.name/patches/35050/bug35050.phpt. It's likely that the test would require massaging before being committed to work on Windows, but since I don't have a Windows development box readily available and don't know a thing about how Windows implements locale support, this would require help from someone familiar with the platform.

So: thoughts; concerns; alternate approaches? It would be nice to have this sorted for PHP.next.

Thanks,
Adam
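A minimal sketch of the locale-insensitive lowering described in the message above, to make the trade-off concrete. The names `ascii_tolower`, `ascii_str_tolower_copy` and `check_examples` are hypothetical and are not taken from the patch, which redefines zend_tolower(); the actual diff may differ in detail. The checks assume the ISO-8859-1 byte values for É (0xC9) and é (0xE9).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ASCII-only lowering: maps 'A'..'Z' to 'a'..'z' and
 * leaves every other byte untouched, regardless of LC_CTYPE. */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: lowercases len bytes of src into dst.
 * dst must have room for len + 1 bytes (a terminating NUL is added). */
void ascii_str_tolower_copy(char *dst, const char *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        dst[i] = (char)ascii_tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
}

/* A few checks illustrating both the fix and the BC break. */
void check_examples(void)
{
    char buf[6];

    /* 'I' always lowers to 'i', so lookups no longer depend on a
     * Turkish or Azeri locale being active. */
    assert(ascii_tolower('I') == 'i');

    /* The BC break: a high-bit byte such as 0xC9 (É in ISO-8859-1)
     * is no longer folded to 0xE9 (é). */
    assert(ascii_tolower(0xC9) == 0xC9);

    ascii_str_tolower_copy(buf, "GetID", 5);
    assert(memcmp(buf, "getid", 6) == 0);
}
```

Because nothing here reads the current locale, the result is the same under any LC_CTYPE setting, which is the property the patch relies on.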
Re: [PHP-DEV] Turkish/Azeri locale support
> As at least some of you would already be aware, there's a long-standing issue with using PHP in a Turkish or Azeri locale, namely that case-insensitive lookups within the Zend engine (method names, for example) fail on lookups involving upper-case I characters, since lower-case I in those languages is ı instead of i (note the lack of a dot).
>
> The long term plan for this, per bug #35050 and any number of duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to happen in its original form, I think we're going to need to revisit how we want to deal with this.
>
> There's a patch linked in the bug from Tomas Kuliavas and Marcus that fixes the problem by simply redefining zend_tolower() to a simple locale-insensitive ASCII tolower() function, which does fix the Turkish and Azeri locales.

As you illustrated in your post, fixing it for locales becomes... complicated.

If you ask me, there's only one way to fix this, which is how most other languages fixed it: make the next major version of PHP case-sensitive for all identifiers. For less bugs, less locale problems and more performance. It was somewhat-the-plan, even before the Turkish locale issue was brought up.

Regards,
Stan Vassilev
Re: [PHP-DEV] Turkish/Azeri locale support
On 19 April 2010 12:59, Stan Vassilev sv_for...@fmethod.com wrote:

> If you ask me, there's only one way to fix this, which is how most other languages fixed it: make the next major version of PHP case-sensitive for all identifiers. For less bugs, less locale problems and more performance.

Definitely another option, and one I personally like, although I suspect the BC implications of that are considerably greater than breaking high-bit characters in single-byte locales.

Adam