Re: [PHP-DEV] Removal of unicode_semantics

Tomas Kuliavas Mon, 05 May 2008 01:32:32 -0700

> Lester Caine wrote:
>> That sounds like just the sort of edge case that Derick is suggesting
>> needs logging for fixing up. unicode_semantics=on is just another bodge
>> to make it happen rather than a solution. I think I understand your
>> description, and to my eyes it looks like a unicode bug that needs
>> addressing?
>
> No, it's a misunderstanding of how things work that has been explained
> to Tomas countless times. A unicode string consists of codepoints, not
> of bytes. Having \xXX and \XXX insert bytes instead of codepoints does
> not make sense, because a) that would require a defined unicode
> encoding to be used, and even if that is the case b) it would allow you to
> insert broken data into the unicode string, so it's not a unicode string
> anymore, which is a no-no. If you want to do that sort of fiddling with
> binary details, use binary strings, not unicode strings.
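
[Illustration of the quoted point, not part of the original exchange. This
is a minimal sketch in Python rather than PHP, on the assumption that
Python's str/bytes split is a fair stand-in for the unicode string / binary
string distinction described above. The helper name is_valid_utf8 is
hypothetical.]

```python
# Illustrative sketch only: Python's str/bytes split stands in for the
# unicode-string / binary-string distinction discussed in the thread.

text = "\xe9"   # in a str literal, \xXX names the code point U+00E9
raw = b"\xe9"   # in a bytes literal, \xXX is the single byte 0xE9

# The same code point has a different byte form in each encoding, so
# "insert byte 0xE9" is meaningless until an encoding has been fixed.
utf8 = text.encode("utf-8")        # two bytes: 0xC3 0xA9
utf16 = text.encode("utf-16-le")   # two bytes: 0xE9 0x00
latin1 = text.encode("latin-1")    # one byte:  0xE9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xE9 byte is a truncated multi-byte sequence in UTF-8, which is
# the kind of broken data point b) above is about.
print(len(text), len(utf8), is_valid_utf8(raw), is_valid_utf8(utf8))
```

[The raw byte matches the Latin-1 form of the code point but is not valid
UTF-8 on its own, which is why byte-level work belongs in a binary string.]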


I agree that it is not a bug, because I declare an invalid encoding in
scripts in order to make sure that binary and unicode bytes are equal.

You haven't explained to me how things work. All your explanations ask me
to use code compatible only with PHP 5.2.1+, drop code that worked fine in
older PHP versions, and give up control of charset conversions. I want
backwards compatibility with PHP 5.2.0 and PHP4. I want to be able to
control charset conversions. Where are the guarantees that charset
conversions will work better in PHP6? In current setups it is safer to do
charset conversions internally instead of relying on PHP to do them. And I
can't drop that code entirely, because the Unicode implementation in PHP
5.2.1 is a dummy: it is there only to avoid E_PARSE errors in PHP6
compatible code.

-- 
Tomas



-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
