>> It comes down to predicting the future. Whichever way we go, the
>> decision is going to be second-guessed. If we have critical mass for a
>> clean BC break, then I am ok with it. For me personally it would make
>> things a bit easier, but I think it would be a long long time before we
>> saw any large hosts out there switch to a PHP 6 that can't run common
>> PHP 5 apps.
>
> If they switch to 6 with unicode off, and never ever get around to
> turning unicode on, will it really be any better?
>
> They'll just be running some weird-o setup that causes all kinds of
> bugs and issues and you'll have users with php 6 apps that won't work
> in php 6 and who submit bogus bug reports about it, because of the
> setting.
>
> A clean break is probably better, especially if it makes php 6 much
> more maintainable.
>
> Large-scale hosts won't switch to 6 any faster than they switched to
> 5, unless there are ZERO BC breaks.
>
> And nobody can guarantee zero breaks, because there are always buglets.
buglet = a small break, not something that requires a massive code
rewrite. Rewritten code is no longer backwards compatible, so developers
have to maintain two code branches or two different sets of libraries.
If the code is maintained in one branch, scripts will need wrapper
functions for most PHP string and stream function calls. Instead of
taking a performance loss in the interpreter, you force a performance
loss onto portable scripts.

> The effort to have unicode off in 6 is probably larger than the effort
> to document what needs to be done to a PHP 5 app to make it be
> 6-friendly, or even write tools to auto-convert the bulk of a script.
>
> If unicode semantics are "on" what exactly is borked in PHP 5?

In Unicode mode, \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode
code points and not to octal or hexadecimal byte values. The fix is not
backwards compatible. Scripts can't match bytes. How are they supposed
to check whether a string is plain ASCII or 8-bit? Convert it to ASCII
and check for errors instead of looking for 8-bit byte values? How can
scripts replace 8-bit bytes with some other strings?

The ISO-8859-2 decoding table contains 95 entries that are written and
evaluated as binary strings. The same applies to the other iso-8859 and
windows-125x character sets. iso-8859-1 and utf-8 decoding does not use
mapping tables; it performs complex calculations on byte values.
Multibyte character set decoding might actually benefit from
unicode_encode(), if Table 325 (http://www.php.net/unicode) provided
more information about U_INVALID_SUBSTITUTE and the other Unicode
settings.

PHP6 does not provide backwards compatible functions for working with
bytes. The constructs it does provide are not backwards compatible.

If scripts want to do MIME Q encoding, they must work with bytes. Doing
Q encoding with the provided PHP extensions adds extra dependencies.

ICU does not support an HTML target. Text conversion to iso-8859-x or
windows-125x targets will be lossy.
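To make the byte-level cases above concrete, here is a minimal sketch of the kind of PHP 4/5 code in question. All three function names are hypothetical, the mapping table shows only three ISO-8859-2 entries for illustration, and the Q encoder ignores the line-length limit on encoded words. Every function depends on \x escapes and string offsets meaning single bytes, as they do in PHP 4/5.

```php
<?php
// Hypothetical helper: true when $s contains no byte above 0x7F.
// Assumes \x80-\xff in the pattern means byte values (PHP 4/5 semantics).
function is_plain_ascii($s)
{
    return preg_match('/[\x80-\xff]/', $s) === 0;
}

// Hypothetical helper: decode ISO-8859-2 to UTF-8 with a byte mapping table.
// Only three entries are shown here; a real table needs the full set.
function iso_8859_2_to_utf8_partial($s)
{
    $map = array(
        "\xA1" => "\xC4\x84", // U+0104 LATIN CAPITAL LETTER A WITH OGONEK
        "\xA3" => "\xC5\x81", // U+0141 LATIN CAPITAL LETTER L WITH STROKE
        "\xB1" => "\xC4\x85", // U+0105 LATIN SMALL LETTER A WITH OGONEK
    );
    // strtr() with an array replaces each key byte with its UTF-8 sequence.
    return strtr($s, $map);
}

// Hypothetical helper: MIME "Q" encoding of a header word, byte by byte.
// Simplified: no folding of long encoded words.
function q_encode_bytes($s, $charset)
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte === 0x20) {
            // A space is written as an underscore in Q encoding.
            $out .= '_';
        } elseif (($byte >= 0x30 && $byte <= 0x39)
            || ($byte >= 0x41 && $byte <= 0x5A)
            || ($byte >= 0x61 && $byte <= 0x7A)) {
            // ASCII digits and letters pass through unchanged.
            $out .= $s[$i];
        } else {
            // Everything else becomes =XX with the byte value in hex.
            $out .= sprintf('=%02X', $byte);
        }
    }
    return '=?' . $charset . '?Q?' . $out . '?=';
}
```

If "\xE9" or $s[$i] stops meaning one byte, each of these either has to be rewritten or wrapped, which is the portability cost described above.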
> Can that be fixed to be BC without resorting to this toggle?

Unicode and binary typecasting causes an E_PARSE error in PHP 5.2.0 and
older.

PHP6 could have introduced new Unicode-aware functions, but the Unicode
implementation chose to modify the existing ones. All low-level string
operations ($string[1]) are Unicode aware by default, not only when the
script actually asks for it. Such an implementation is designed for
developers who don't care about Unicode support and want it out of the
box without any changes to their Unicode-unaware scripts. It is not
designed for developers who actually need it and want their code to work
in PHP6 and in PHP4/5.

Unicode code points can be defined with \u, but PHP6 breaks the existing
octal and hex escape sequences.

PHP6 is very noisy about data stream and string operations ("Notice:
fwrite(): 13 character unicode buffer downcoded for binary stream
runtime_encoding", "Warning: base64_encode() expects parameter 1 to be
strictly a binary string, Unicode string given"), even when fwrite() or
base64_encode() works only with plain ASCII data. PHP script developers
are not used to strict variable type checks in string functions.

Which functions are modified to require binary typecasting? Do I have to
make a list myself every time some function freaks out?

-- 
Tomas

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php