On Tue, 2007-08-21 at 08:31 +0100, Lester Caine wrote:
> How much work do people think *IS* involved in porting a large application 
> over to PHP6? Reading between the lines it looks like we are talking file 
> conversion to UTF-16 + what? What is currently a show stopper to simply 
> running a PHP5 application? I 

I'm tired of the unicode.semantics discussion, but a few words on this:
No. UTF-16 is the internal encoding of (textual) strings in PHP 6 (with
u.s.=On); as a user you should never see any text in that encoding.

Your scripts use the encoding specified as "unicode.script_encoding",
which defaults to UTF-8, or the one specified in a declare() statement.
Internally they are then converted to UTF-16. When printed to the
output stream, they are converted from UTF-16 to the encoding
specified by unicode.output_encoding (default again UTF-8).
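
To make that conversion chain a bit more concrete, here is a rough sketch
that replays it by hand with iconv() on a current PHP. It is only an
analogy: PHP 6 would do both steps implicitly, and the variable names and
the choice of UTF-16BE (picked here to avoid a byte order mark) are my
assumptions, not something the engine exposes.

```php
<?php
// Analogy only: PHP 6 would perform both conversions implicitly.
$scriptEncoding = 'ISO-8859-1'; // stand-in for the script encoding setting or declare()
$outputEncoding = 'UTF-8';      // stand-in for the output encoding setting

// "äöü" as the raw bytes found in an ISO-8859-1 encoded script file.
$sourceBytes = "\xE4\xF6\xFC";

// Step 1: script encoding -> internal UTF-16 representation.
$internal = iconv($scriptEncoding, 'UTF-16BE', $sourceBytes);

// Step 2: internal UTF-16 -> output encoding, as done when printing.
$output = iconv('UTF-16BE', $outputEncoding, $internal);

echo strlen($sourceBytes), " bytes in the script file\n"; // 3
echo strlen($internal), " bytes internally\n";            // 6
echo strlen($output), " bytes on output\n";               // 6
```

The text never changes, only its byte representation does, which is why a
script author should not have to care about the UTF-16 step in the middle.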

When porting a PHP 5 application there aren't that many problems, in my
experience with quite small applications (though I only ran simple tests,
mainly to check simple things in PHP...):

- You might have some files with different encodings than the configured
  one. For example, some applications I'm involved with have my last name
  with an ISO-8859-1 encoded umlaut in some DocComment or string. Either
  the files have to be converted (using recode/iconv should do the
  trick) or you need a declare statement. (These declare statements,
  btw, are compatible [as in being ignored] with PHP 4 and PHP 5.)

- You might get a few warnings on stream operations if you don't give
  a specific encoding. The problem there is that some stream operations
  expect different numbers of parameters, so running the same code with
  PHP 5 and 6 might give a few warnings, but most of them can be
  ignored.

- Some functions want only binary strings and won't convert Unicode
  strings themselves (which would be done using
  unicode.runtime_encoding), or the other way round. Most of these places
  will be fixed; some of them will need a specific cast by the user.

  An example is rawurlencode(), which for good reasons expects a binary
  string. In such a case a (binary) cast, which also exists as a no-op in
  PHP 5.2, might be enough. Sometimes you might need a
  unicode_[en|de]code() call.

  This might need some work.

- A bit more work might be involved when you expect to work on bytes
  when doing string operations. If your applications only use English
  texts with ASCII characters that's no issue at all; if not, the
  results of operations like
      strlen("äöü");
  or
      $a = "äöü"; echo $a{2};
  might differ depending on the version. But as said, with ASCII text
  it's no issue since a single character takes a single byte.

  A really, really bad workaround for most issues related to this is
  using an encoding like ISO-8859-1 for all unicode.*_encoding settings.
  Then most byte sequences can be converted to UTF-16, a single byte
  becomes a single UTF-16 character, and everything "seems" to work. But
  that's bad and shouldn't be advertised.
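
The byte-versus-character difference from the last point can be reproduced
on a current PHP, roughly like this. It is a sketch under assumptions:
iconv_strlen() and iconv_substr() stand in for what strlen() and string
offsets would do with Unicode semantics on, and the string is written as
explicit UTF-8 bytes so the result doesn't depend on how this file is saved.

```php
<?php
// "äöü" encoded as UTF-8: 6 bytes, 3 characters.
$a = "\xC3\xA4\xC3\xB6\xC3\xBC";

// Byte semantics, as in PHP 5.
$byteLength = strlen($a); // 6
$thirdByte  = $a[2];      // "\xC3", the first half of "ö"

// Character semantics (stand-in for the Unicode-aware behaviour).
$charLength = iconv_strlen($a, 'UTF-8');       // 3
$thirdChar  = iconv_substr($a, 2, 1, 'UTF-8'); // "ü"

// The ISO-8859-1 "workaround": every byte counts as one character,
// so the numbers match PHP 5 again, but the text is misinterpreted.
$latin1Length = iconv_strlen($a, 'ISO-8859-1'); // 6

echo "bytes: $byteLength, characters: $charLength, as Latin-1: $latin1Length\n";
```

The Latin-1 count agreeing with strlen() is exactly why that workaround
"seems" to work: the six bytes are treated as six unrelated characters
instead of three umlauts.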

Well, these are most of the things I saw when porting simple
applications from PHP 5 to PHP 6 half a year ago (so maybe I forgot
something important I did...). Some of them are even still compatible
with all PHP versions from 4 to 6 (with u.s. On), while not really making
use of the benefits of the Unicode support.

So for porting: a good first step is simply installing PHP 6, making
sure u.s. is On, and then fixing the errors that appear :-)

And as a final statement, from my experience with rather small apps:
it's possible to make applications run with both PHP 5 and PHP 6 with
u.s. On (where "run" means "it works but won't benefit from the
Unicode stuff").
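
If code that runs on both versions needs to know which mode it is in, a
check along these lines might do. It is a minimal sketch:
unicode_semantics_enabled() is a hypothetical helper of mine, not a PHP
function, and it assumes the setting is visible through ini_get() under
the name unicode.semantics.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the
// unicode.semantics setting is known to this PHP and switched on.
function unicode_semantics_enabled()
{
    // ini_get() returns false for settings the running PHP doesn't know,
    // which is the case on PHP 5.
    $value = ini_get('unicode.semantics');
    if ($value === false) {
        return false;
    }
    return in_array(strtolower((string) $value), array('1', 'on', 'true', 'yes'), true);
}

// "äöü" as UTF-8 bytes, so the file's own encoding doesn't matter here.
$text = "\xC3\xA4\xC3\xB6\xC3\xBC";

if (unicode_semantics_enabled()) {
    echo "character semantics: strlen = ", strlen($text), "\n";
} else {
    echo "byte semantics: strlen = ", strlen($text), "\n";
}
```

On a PHP without the setting this takes the second branch and reports
a length of 6.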

johannes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php