Removing multibyte encoding support from PHP 5.3 would cause
a severe incompatibility with older PHP 5.x releases.
As Stefan noted, the Shift_JIS character encoding, which is widely used in
Japan, is not a flex-safe encoding, because 0x5c (backslash) can appear as
the second byte of a multibyte character. The BIG5 encoding used for
Chinese is not flex-safe either.
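To make the 0x5c problem concrete, here is a small Python sketch. It is not part of the patch, and the helper names (chars_ending_in_backslash, naive_string_closes) are made up for illustration. It shows that some Shift_JIS and BIG5 characters end in the byte 0x5c, and that a byte-at-a-time scanner, like an encoding-unaware lexer, reads that byte as an escape and never finds the closing quote of the string.

```python
BACKSLASH = 0x5C
QUOTE = ord('"')


def chars_ending_in_backslash(text, encoding):
    """Return the characters of `text` whose multibyte encoding ends in 0x5C."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and encoded[-1] == BACKSLASH:
            hits.append(ch)
    return hits


def naive_string_closes(literal):
    """Scan a double-quoted literal one byte at a time, the way an
    encoding-unaware lexer would: a backslash escapes the next byte.
    Return True if the closing quote is found."""
    if literal[:1] != b'"':
        raise ValueError("literal must start with a double quote")
    i = 1
    while i < len(literal):
        if literal[i] == BACKSLASH:
            i += 2  # treat the following byte as escaped
        elif literal[i] == QUOTE:
            return True
        else:
            i += 1
    return False


if __name__ == "__main__":
    print(chars_ending_in_backslash("ソ表", "shift_jis"))
    print(chars_ending_in_backslash("許", "big5"))
    print(naive_string_closes('"ソ"'.encode("shift_jis")))  # False
    print(naive_string_closes('"ソ"'.encode("utf-8")))      # True
```

The same literal is harmless in UTF-8, because UTF-8 never uses bytes below 0x80 inside a multibyte sequence.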
Today I committed a patch for Zend multibyte support to PHP_5_3.
It is still experimental, because I am not an expert in re2c/flex.
A couple of test scripts already exist in
Zend/tests/multibyte/*.phpt, but of course we need more test scripts
for zend multibyte.
(We need to hold a TestFest in Japan :) )
The script encoding can be specified in several ways:
(1) mbstring.script_encoding in php.ini
(2) declare(encoding="Shift_JIS") in each PHP script
-> multibyte_encoding_001.phpt
(3) a BOM in a Unicode script
-> multibyte_encoding_00[23].phpt
(4) auto-detection based on mbstring.language and mbstring.detect_order
Test scripts already exist for (2) and (3), but there are none yet for
(1) and (4).
I have already confirmed that my patch for PHP 5.3 works with (1) and (2)
for the Shift_JIS encoding, but I have not yet verified it with a Unicode
BOM or with other encodings.
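Method (3) depends on recognizing a Unicode byte order mark at the very start of the script. As a rough illustration of what a BOM test might need to cover, here is a Python sketch. It is not the code in the patch, and BOMS and encoding_from_bom are hypothetical names.

```python
import codecs

# Longer BOMs must be tested first: the UTF-32LE BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def encoding_from_bom(script):
    """Return (encoding_name, bom_length) if `script` (bytes) starts with a
    Unicode byte order mark, otherwise None."""
    for bom, name in BOMS:
        if script.startswith(bom):
            return name, len(bom)
    return None


if __name__ == "__main__":
    # "utf-8-sig" writes a UTF-8 BOM in front of the encoded text.
    source = "<?php echo 'ソ';".encode("utf-8-sig")
    name, skip = encoding_from_bom(source)
    print(name, source[skip:].decode(name))
```

Each BOM variant (and a script with no BOM at all) is a separate case that a .phpt test would need to exercise.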
We need more test scripts to keep this reliable and
to minimize security risks.
Rui
On Tue, 24 Jun 2008 16:21:33 +0200
Stefan Esser <[EMAIL PROTECTED]> wrote:
>
> >> This is used when reading scripts that are in encodings like Shift-JIS,
> >> which is very common in Japan. In any case, I have tried to get
> >> involvement from some people I know over there without much success.
> >
> > I've asked around a bit as well with our customers/partners, and all
> > they seem to answer is "we simply use UTF-8".
>
> It is very unlikely that anyone on internals uses Shift-JIS (or EUC-xx),
> mainly because (nearly) no one here is Japanese (or Korean, or Chinese).
>
> However, google for phpinfo() output and you will see that zend_multibyte
> is compiled into several PHP servers. You can also google for Shift-JIS
> and co.
>
> The problem here is that newer Asian systems use UTF-8 (except in
> those nations using characters not possible in UTF-8), and therefore the
> customers of the PHP developers on this list do not need that
> support. However, there are many legacy systems out there that depend on
> this feature. Their operators most probably don't know about this
> discussion, or about internals at all, so they cannot speak up.
>
> If PHP 5.3 drops this feature, it might close some multibyte security
> problems. However, this also means that all those
> Japanese/Chinese/Korean/Taiwanese/... multibyte scripts will not run
> anymore. This forces systems to stay on PHP 5.2, which will most probably
> not get security updates once PHP 5.3 is out the door.
>
> Stefan Esser
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
--
Rui Hirokawa <[EMAIL PROTECTED]>