Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Tomas Kuliavas Fri, 29 Jun 2007 02:04:10 -0700

>> Unicode code points can be defined with \u, but PHP6 breaks existing
>> octal and hex escape sequences.
>
> What do you mean? Doesn't \x20 create U0020 character? Or you mean you'd
> expect it to create just one-byte 0x20? Doesn't binary string do that?


Try values higher than 0x7F.

If I write "\xA0", I expect one byte with the hex value A0, not the two
bytes 0xC2 0xA0 (the UTF-8 encoding of \u00A0). If I use the \x80-\xFF
range, I expect functions to match bytes, not only the code points
\u0080-\u00FF.
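
A minimal sketch of the byte-level behaviour I mean, assuming the
byte-string semantics of PHP 4/5 (the variable names are only for
illustration):

```php
<?php
// Assumes byte-string semantics, as in PHP 4/5.
$byte = "\xA0";          // hex escape: exactly one byte
$utf8 = "\xC2\xA0";      // UTF-8 encoding of U+00A0: two bytes

echo strlen($byte), "\n";      // 1
echo bin2hex($byte), "\n";     // a0
echo strlen($utf8), "\n";      // 2
echo ($byte === $utf8) ? "same" : "different", "\n";  // different
```

With Unicode semantics switched on, the first literal would instead be
treated as the code point U+00A0, which is what breaks byte matching.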

Binary strings can do that, but they are not backwards compatible. To do
the same thing in PHP4/5 and PHP6, I'll have to move the code into
separate libraries.

>> PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
>> downcoded for binary stream runtime_encoding", "Warning: base64_encode()
>> expects parameter 1 to be strictly a binary string, Unicode string
>> given")
>
> Well, exporting and importing to and from non-unicode contexts are
> tricky, and fwrite and base64_encode do exactly that. Maybe some
> functions need to be less noisy, I don't know - but when people work
> with unicode they must be aware that interoperating with non-unicode
> contexts brings some complexity, I don't see how that can be avoided.

For me it means that I have to maintain wrappers for fwrite,
base64_encode, ord, crc32 and all other Unicode-aware functions. Any
direct call to a PHP string or stream function can cause compatibility
issues or notices. Any function working with binary data will need a
separate version for PHP6. Instead of having Unicode switches in the
interpreter itself, I'll have to implement them in scripts. Talk about
performance issues after that.
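
A rough sketch of what one such wrapper might look like. The helper
names to_bytes() and compat_base64_encode() are hypothetical, and the
sketch assumes that is_unicode() and unicode_encode() are available in
the PHP6 snapshots; both calls are guarded so the same file still runs
on PHP 4/5, where strings are already byte strings:

```php
<?php
// Hypothetical helper: return a byte string on any PHP version.
function to_bytes($value, $encoding = 'UTF-8')
{
    // Assumption: is_unicode() and unicode_encode() exist only under
    // PHP6 Unicode semantics. Elsewhere the guard fails and we fall through.
    if (function_exists('is_unicode')
        && function_exists('unicode_encode')
        && is_unicode($value)) {
        return unicode_encode($value, $encoding);
    }
    // PHP 4/5: the string is already a sequence of bytes.
    return (string) $value;
}

// Hypothetical wrapper: always hands base64_encode() a byte string.
function compat_base64_encode($value, $encoding = 'UTF-8')
{
    return base64_encode(to_bytes($value, $encoding));
}

echo compat_base64_encode("\xA0"), "\n";
```

Every binary-safe call (fwrite, ord, crc32 and so on) would need the
same treatment, and each one adds a function call and a type check.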


-- 
Tomas

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
