Re: [PHP-DEV] Turkish/Azeri locale support

2010-05-04 Thread Joel Perras
+1 for option #2.

Joël.


-- 
I do know everything, just not all at once. It's a virtual memory problem.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Turkish/Azeri locale support

2010-05-04 Thread Etienne Kneuss
Hi,

A definite -1 for #2: it's a _massive_ BC break with no justification
so far, IMHO.

The optimization point is quite moot: tolower could be restricted to
compilation + dynamic accesses, which would already remove most of the
calls.

OTOH, option #1 seems like the most sensible approach, breaking only in
very limited cases, so +1 from me.

Best,

On Tue, May 4, 2010 at 5:00 PM, Joel Perras joel.per...@gmail.com wrote:
> +1 for option #2.


-- 
Etienne Kneuss
http://www.colder.ch




Re: [PHP-DEV] Turkish/Azeri locale support

2010-04-19 Thread Tomas Kuliavas
2010.04.19 07:59 Stan Vassilev wrote:
> As you illustrated in your post, fixing it for locales becomes...
> complicated.
>
> If you ask me, there's only one way to fix this, which is how most other
> languages fixed it: make the next major version of PHP case-sensitive for
> all identifiers. That would mean fewer bugs, fewer locale problems and
> better performance.
>
> It was somewhat-the-plan, even before the Turkish locale issue was brought
> up.

Fixing the issue is not complicated. I could do that without any C coding
background. Your (@php.net) developers only have to learn that they should
not use locale-sensitive functions while assuming that English case rules
apply. This is the main lesson Turkey teaches any coder. If you continue to
ignore it, you will continue to trigger PHP bugs in Turkey. For n years PHP
has used only locale-sensitive case-insensitive functions. You never
bothered to fix it. Deferring the fix to some distant PHP 6 feature does
not help Turks.

The patch for 35050 is not something that should break things. You reviewed
the patch and commented on it, I updated it based on your style comments,
and you continued to ignore the problem. The excuse that the patch breaks
something is funny, because Win32 builds are set to use internal (not for
public use) Microsoft C library calls that are locale-insensitive. If some
PHP code breaks when string functions are locale-insensitive, please show
that code. I would like to see what i18n-unportable PHP 5 programming looks
like.

If users want to use a Turkish locale here and now, they must set LC_CTYPE
to C. This workaround disables all locale-specific quirks; only gettext
then has to be set to use the correct charset for all translations. The
other fix is more complex: PHP scripts must replace all locale-sensitive
native functions with their own locale-insensitive replacements and pray
that supported PHP versions don't trigger bugs when LC_CTYPE is not C.
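
For illustration only, here is a rough C-level sketch of the LC_CTYPE
workaround described above. The helper name use_env_locale_with_c_ctype is
made up for this example; a PHP script would presumably do the equivalent
through PHP's own setlocale() function rather than through C.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: take every locale category from the environment,
 * then force LC_CTYPE back to the "C" locale so that character
 * classification and case mapping follow plain ASCII rules.
 * Returns 1 if LC_CTYPE was set to "C", 0 otherwise.
 */
static int use_env_locale_with_c_ctype(void)
{
    /* May return NULL if the environment's locale is not installed;
     * in that case the process simply stays in its current locale. */
    setlocale(LC_ALL, "");

    /* The "C" locale is always available. */
    return setlocale(LC_CTYPE, "C") != NULL;
}
```

With LC_CTYPE set to C, tolower('I') yields 'i' regardless of the language
settings in the environment, which is why the workaround sidesteps the
dotless-i problem.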

-- 
Tomas






[PHP-DEV] Turkish/Azeri locale support

2010-04-18 Thread Adam Harvey
As at least some of you would already be aware, there's a
long-standing issue with using PHP in a Turkish or Azeri locale,
namely that case-insensitive lookups within the Zend engine (method
names, for example) fail on lookups involving upper-case I characters,
since lower-case I in those languages is ı instead of i (note the lack
of a dot).

The long term plan for this, per bug #35050 and any number of
duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to
happen in its original form, I think we're going to need to revisit
how we want to deal with this. There's a patch linked in the bug from
Tomas Kuliavas and Marcus that fixes the problem by simply redefining
zend_tolower() to a simple locale-insensitive ASCII tolower()
function, which does fix the Turkish and Azeri locales.
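
A minimal sketch of what a locale-insensitive ASCII lower-casing routine
could look like in C. The names ascii_tolower_byte and ascii_tolower_buf
are hypothetical and are not taken from the patch; this only illustrates
the idea, not the actual change to zend_tolower().

```c
#include <stddef.h>

/*
 * Hypothetical helper: map 'A'..'Z' to 'a'..'z' and leave every other
 * byte untouched. No locale is consulted, so 'I' always becomes 'i'.
 */
static unsigned char ascii_tolower_byte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: lower-case a buffer of len bytes in place. */
static void ascii_tolower_buf(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)ascii_tolower_byte((unsigned char)s[i]);
    }
}
```

Because the mapping is a fixed range check, the result is the same under
tr_TR, az_AZ or any other locale.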

The potential breakage from this is that single-byte locales will no
longer get case-insensitive lookups of non-ASCII characters: for
example, somebody using fr_FR.ISO-8859-1 as a locale could no longer
call a method É() as é(). Since it doesn't break anything when using
multi-byte locales (which have never had case-insensitive lookups
anyway since the Zend Engine uses the single-byte tolower()
internally), my inclination would be to apply the patch on trunk and
document it as a BC issue.
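
To make the breakage concrete, here is a small sketch, assuming identifier
lookups boil down to a byte-wise comparison after ASCII-only lower-casing.
The function names are hypothetical. In ISO-8859-1, É is the byte 0xC9 and
é is the byte 0xE9, so the two no longer compare equal.

```c
#include <stdbool.h>

/* ASCII-only lower-casing: bytes outside 'A'..'Z' are returned unchanged. */
static unsigned char lower_ascii_only(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical stand-in for a case-insensitive identifier comparison:
 * compares two NUL-terminated names byte by byte after ASCII lower-casing.
 */
static bool ident_equal_ascii_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (lower_ascii_only((unsigned char)*a) !=
            lower_ascii_only((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both names ended together */
}
```

Under this scheme "GETITEM" still matches "getitem", while "\xC9" (É) and
"\xE9" (é) are treated as different names.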

I've uploaded an updated version of Tomas's patch that applies cleanly
to trunk to http://www.adamharvey.name/patches/35050/zend_operators.c.diff
and a phpt file to test the fix to
http://www.adamharvey.name/patches/35050/bug35050.phpt. It's likely
that the test would require massaging before being committed to work
on Windows, but since I don't have a Windows development box readily
available and don't know a thing about how Windows implements locale
support, this would require help from someone familiar with the
platform.

So: thoughts; concerns; alternate approaches? It would be nice to have
this sorted for PHP.next.

Thanks,

Adam




Re: [PHP-DEV] Turkish/Azeri locale support

2010-04-18 Thread Stan Vassilev

> As at least some of you would already be aware, there's a
> long-standing issue with using PHP in a Turkish or Azeri locale,
> namely that case-insensitive lookups within the Zend engine (method
> names, for example) fail on lookups involving upper-case I characters,
> since lower-case I in those languages is ı instead of i (note the lack
> of a dot).
>
> The long term plan for this, per bug #35050 and any number of
> duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to
> happen in its original form, I think we're going to need to revisit
> how we want to deal with this. There's a patch linked in the bug from
> Tomas Kuliavas and Marcus that fixes the problem by simply redefining
> zend_tolower() to a simple locale-insensitive ASCII tolower()
> function, which does fix the Turkish and Azeri locales.


As you illustrated in your post, fixing it for locales becomes...
complicated.

If you ask me, there's only one way to fix this, which is how most other
languages fixed it: make the next major version of PHP case-sensitive for
all identifiers. That would mean fewer bugs, fewer locale problems and
better performance.

It was somewhat-the-plan, even before the Turkish locale issue was brought
up.


Regards,
Stan Vassilev 






Re: [PHP-DEV] Turkish/Azeri locale support

2010-04-18 Thread Adam Harvey
On 19 April 2010 12:59, Stan Vassilev sv_for...@fmethod.com wrote:
> If you ask me, there's only one way to fix this, which is how most other
> languages fixed it: make the next major version of PHP case-sensitive for
> all identifiers. That would mean fewer bugs, fewer locale problems and
> better performance.

Definitely another option — and one I personally like — although I
suspect the BC implications of that are considerably greater than
breaking high-bit characters in single-byte locales.

Adam
