On Tue, 2007-08-21 at 08:31 +0100, Lester Caine wrote:
> How much work do people think *IS* involved in porting a large application
> over to PHP6? Reading between the lines it looks like we are talking file
> conversion to UTF-16 + what? What is currently a show stopper to simply
> running a PHP5 application?

I'm bored of the unicode.semantics discussion, but a few words on this:

No. UTF-16 is the internal encoding of (textual) strings in PHP 6 (with
u.s.=On); as a user you should never ever see any text in that encoding.
Your scripts use the encoding specified as "unicode.script_encoding",
which defaults to UTF-8, or the one specified in a declare() statement.
Internally they are then converted to UTF-16. When printed to the output
stream they are converted from UTF-16 to the encoding specified by
unicode.output_encoding (default again UTF-8).

When porting a PHP 5 application there aren't that many problems, in my
experience with quite small applications (though I mainly did simple tests
of simple stuff in PHP...):

- You might have some files with encodings different from the configured
  one. For example, some applications I'm involved with have my last name
  with an ISO-8859-1 encoded umlaut in some DocComment or string. Either
  the files have to be converted (using recode/iconv should do the trick)
  or you need a declare statement. (These declare statements, btw., are
  compatible [as in being ignored] with PHP 4 and PHP 5.)

- You might get a few warnings on stream operations if you're not giving a
  specific encoding. The problem there is that some stream operations
  expect different numbers of parameters, so running the same thing with
  PHP 5 and 6 might give a few warnings, but most of them can be ignored.

- Some functions want only binary strings and won't convert unicode
  strings themselves (which would be done using unicode.runtime_encoding),
  or the other way round. Most of these places will be fixed; some will
  need a specific cast by the user. An example is rawurlencode(), which for
  good reasons expects a binary string. In such a case a (binary) cast,
  which also exists as a no-op in PHP 5.2, might be enough. Sometimes you
  might need a unicode_[en|de]code() call. This might need some work.

- A bit more work might be involved when you expect to work on bytes when
  doing string operations. If your application only uses English text in
  ASCII characters that's no issue at all; if not, the results of
  operations like
      strlen("äöü");
  or
      $a = "äöü"; echo $a{2};
  might differ depending on the version. But as said, in ASCII text it's
  no issue, since a single character takes a single byte.

A really, really bad workaround for most issues related to this is using
an encoding like ISO-8859-1 for all unicode.*_encoding settings. Then most
byte sequences can be converted to UTF-16, a single byte is a UTF-16
character, and everything "seems" to work, but that's bad and shouldn't be
advertised.

Well, these are most of the things I saw when porting simple applications
from PHP 5 to PHP 6 half a year ago (so maybe I forgot something important
I did...). Some of them are even still compatible with all PHP versions
from 4 to 6 (with u.s), while not really making use of the benefits of the
Unicode support.

So for porting: a good first step is simply installing PHP 6, making sure
u.s is On, and then fixing the errors that appear :-)

And as a final statement, from my experience with rather small apps: it's
possible to make applications run with PHP 5 and PHP 6 with u.s On...
(while "run" there means "it works but won't benefit from the unicode
stuff").

johannes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
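
As a rough illustration of the byte-versus-character point and the ISO-8859-1 workaround mentioned in the message, here is a minimal sketch in Python. It is not PHP 6 API. Python's bytes/str split only stands in for binary versus unicode strings, and all variable names are made up for the example. It assumes the script file is stored as UTF-8.

```python
# Illustrative stand-in only: Python bytes ~ "binary string",
# Python str ~ "unicode string". Nothing here is PHP 6 API.

# "äöü", written with escapes so the example does not depend on
# how this file itself is encoded.
text = "\u00e4\u00f6\u00fc"

# The script as stored on disk, assuming a UTF-8 script encoding.
source_bytes = text.encode("utf-8")

# Byte-oriented view: length and offsets count bytes.
byte_length = len(source_bytes)      # each umlaut takes 2 bytes in UTF-8
byte_at_2 = source_bytes[2:3]        # first byte of the second character

# Character-oriented view: length and offsets count characters.
decoded = source_bytes.decode("utf-8")
char_length = len(decoded)
char_at_2 = decoded[2]               # the third character

# Internal UTF-16 form, then conversion to the output encoding (UTF-8 here).
internal = decoded.encode("utf-16-le")
output_bytes = internal.decode("utf-16-le").encode("utf-8")

# The "bad workaround": pretend the UTF-8 bytes are ISO-8859-1.
# Every byte becomes one character, so byte-based code "seems" to work
# and the bytes round-trip, but the characters are not the real text.
latin1_view = source_bytes.decode("iso-8859-1")
round_trip = latin1_view.encode("iso-8859-1")

print(byte_length, char_length)
print(byte_at_2, char_at_2)
print(len(latin1_view), round_trip == source_bytes, latin1_view == text)
```

The last line shows why the workaround is tempting and why it is wrong. Lengths match the byte counts and nothing is lost on output, yet the decoded string is not the text the author typed.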