Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Steph Fox wrote: OK, so there are some things that should maybe 'just happen' here, and also when 'S' is used: PHP_NAMED_FUNCTION(php_if_crc32) { char *p; int len, nr; php_uint32 crcinit = 0; register php_uint32 crc; if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, S, p, nr) == FAILURE) { return; } ... Warning: crc32() expects parameter 1 to be strictly a binary string, Unicode string given in ... Surely if a function's *expecting* a binary string it should do a silent conversion, and only throw a warning if the conversion fails? I don't see why the onus should always be on the user to adapt to this. No, it shouldn't. Because crc32() and other HMACs depend on the actual bytes, there shouldn't be any automatic conversion. The user needs to pass in the binary string (bytes). -Andrei -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hi Andrei, I see where you're coming from over the (binary)|(string) issue, just trying to find some way that would preserve a little more back compatibility. UG(ascii_conv) is for the format string, not the arguments. The warning is thrown by this code: convert_to_string_ex(argv[currentarg]); It should check the arg type and return with a warning if it's Unicode. OK, so there are some things that should maybe 'just happen' here, and also when 'S' is used: PHP_NAMED_FUNCTION(php_if_crc32) { char *p; int len, nr; php_uint32 crcinit = 0; register php_uint32 crc; if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, S, p, nr) == FAILURE) { return; } ... Warning: crc32() expects parameter 1 to be strictly a binary string, Unicode string given in ... Surely if a function's *expecting* a binary string it should do a silent conversion, and only throw a warning if the conversion fails? I don't see why the onus should always be on the user to adapt to this. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hi, On Sat, 2008-05-24 at 13:55 +0100, Steph Fox wrote: Warning: crc32() expects parameter 1 to be strictly a binary string, Unicode string given in ... Surely if a function's *expecting* a binary string it should do a silent conversion, and only throw a warning if the conversion fails? I don't see why the onus should always be on the user to adapt to this. For some functions taking binary strings is critical for working nicely with an automatic conversion in this case crc32(uäöü) and crc32(bäöü) would give completely different results depending on the runtime encoding, relying on a implicit conversion there is most likely a bug (at least for apps written with PHP 6 in mind). Oh and I might probably also argue that crc32(uäöü) should give the crc32 of the internal representation (utf-16...) of the string, which is a total wtf for the user then. The correct solution is to make safe use of the S modifier and not using it too much. As binary casts are allowed in modern PHP versions I don't see this as an issue, while such a cast isn't absolutely the best thing to do: I'd go with unicode_encode() to be sure about the encoding being used, everything else is prone to fail some time (some code changing unicode.runtime_encoding for some random reason...) johannes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 17:51, Antony Dovgal wrote: On 21.05.2008 17:50, Steph Fox wrote: Not fix the streams issue? Felipe has taken care of that issue, so now we have only one uber-major issue left. Your patch? Or the changed error message? Yes, I mean that problem with storing string/Unicode function names. That patch seems to be no good, run-tests.php doesn't work after I apply it (some functions are not found). This patch fixes it: http://dev.daylessday.org/diff/shell_exec_unicode.diff Still, I see no point to fix something until we remove the switch. -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
This patch fixes it: http://dev.daylessday.org/diff/shell_exec_unicode.diff Coolth :) Still, I see no point to fix something until we remove the switch. Let's give Scott a chance to get home from php|tek and present his patch to do just that. - Steph (loin-girder extraordinaire) -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
UG(ascii_conv) is for the format string, not the arguments. The warning is thrown by this code: convert_to_string_ex(argv[currentarg]); It should check the arg type and return with a warning if it's Unicode. -Andrei Steph Fox wrote: It does - 'UG(ascii_conv)' - but sometimes there's a unicode-to-binary conversion warning thrown by that, and then you have to cast the variable to a binary empty string to suppress the warning before you assign the pack() result to it. (See ext/phar/tests/phar_test.inc.) There are similar issues with unpack(), and with fwrite()/fread() and friends when dealing with a purely binary string. What concerns me at this point is that we could and should be telling PHP users how to future-proof their code during the move from PHP 4 to PHP 5. If we can get a fix on that now and get word out, we'll have made their future migration path *much* smoother. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hey Andrei, UG(unicode) checks are still secondary I think - they don't prevent us from doing tests and moving forward, although cleaning them up would be nice. Cleaning them up would make it possible to find and fix the bugs we already know are there ;) There don't seem to be too many, but when it comes to tracking down conversions... bleh... pack() should take binary strings only, methinks. It does - 'UG(ascii_conv)' - but sometimes there's a unicode-to-binary conversion warning thrown by that, and then you have to cast the variable to a binary empty string to suppress the warning before you assign the pack() result to it. (See ext/phar/tests/phar_test.inc.) There are similar issues with unpack(), and with fwrite()/fread() and friends when dealing with a purely binary string. What concerns me at this point is that we could and should be telling PHP users how to future-proof their code during the move from PHP 4 to PHP 5. If we can get a fix on that now and get word out, we'll have made their future migration path *much* smoother. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Steph Fox wrote: What concerns me at this point is that we could and should be telling PHP users how to future-proof their code during the move from PHP 4 to PHP 5. If we can get a fix on that now and get word out, we'll have made their future migration path *much* smoother. As far as I can tell, it's effectively impossible for anyone who wants to support low PHP 5 versions. That's because the b prefix wasn't introduced until PHP 5.2.1, and that prefix is *crucial* to future-proofing any code that deals with binary data. -- Edward Z. YangGnuPG: 0x869C48DA HTML Purifier http://htmlpurifier.org Anti-XSS Filter [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 00:21, Steph Fox wrote: I think the test script and ZE problems should be resolved first... And to do that we need to drop the Unicode Switch =) plus I just discovered that my php.ini isn't being found when I run commandline tests (e.g. php -r var_dump(strcasecmp()); doesn't throw an error unless I explicitly set error_reporting from the commandline, despite having E_ALL|E_STRICT in the INI). These are basic things, nobody can really go much further until they're fixed. Your php.ini is being found read allright, the problem is that the INI scanner looks for _string_ constants, while all constants are registered as _unicode_ constants in Unicode mode. So it fails to find E_ALL constant and ends up with atoi(E_ALL), which is 0. I committed a fix for this one a minute ago. The broken function names in error messages are easy to fix too, as soon as we drop the Switch - functions in Unicode mode are registered using their Unicode names, but then a weird thing happens: zend_register_standard_ini_entries() makes Unicode copies of function/class/constant tables (ignoring the fact that they are already Unicode ones), this is why you see only the first character in function names. The patch is as easy as this: http://dev.daylessday.org/diff/fix_unicode_function_names.diff I didn't commit this one yet as I didn't investigate how it would affect non-Unicode mode (and whether I should worry about non-Unicode mode at all). -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hey Tony, First off, thanks for caring :) And to do that we need to drop the Unicode Switch =) Not fix the streams issue? I committed a fix for this one a minute ago. Thanks! The patch is as easy as this: http://dev.daylessday.org/diff/fix_unicode_function_names.diff This is the biggest problem, isn't it - that things like this obviously hadn't been tested until now. I fixed the same issue in Phar last week - it probably will need un-fixing there once your patch goes in :\ I didn't commit this one yet as I didn't investigate how it would affect non-Unicode mode (and whether I should worry about non-Unicode mode at all). I would've hoped that PHP 6 would be Unicode throughout. Maybe jumping around when Andrei removed the physical switch was premature... obviously the UG(unicode) checks are still in place, whereas their binary alternatives shouldn't still exist. Then again some things are unclear. What should happen with pack(), for example? - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 15:12, Steph Fox wrote: Hey Tony, First off, thanks for caring :) And to do that we need to drop the Unicode Switch =) Not fix the streams issue? That, too =) -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 15:12, Steph Fox wrote: Hey Tony, First off, thanks for caring :) And to do that we need to drop the Unicode Switch =) Not fix the streams issue? Felipe has taken care of that issue, so now we have only one uber-major issue left. -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Not fix the streams issue? Felipe has taken care of that issue, so now we have only one uber-major issue left. Your patch? Or the changed error message? FWIW the --EXPECT-- sections on those tests are different in 5_3 and would pass, so maybe they should just match those? - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 17:50, Steph Fox wrote: Not fix the streams issue? Felipe has taken care of that issue, so now we have only one uber-major issue left. Your patch? Or the changed error message? Yes, I mean that problem with storing string/Unicode function names. That patch seems to be no good, run-tests.php doesn't work after I apply it (some functions are not found). -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Yes, I mean that problem with storing string/Unicode function names. That patch seems to be no good, run-tests.php doesn't work after I apply it (some functions are not found). shrug / it'll be a lot easier to see what's going on if/when the UG(unicode) if'n'butting goes. There's obviously a conversion too many somewhere down the line @ present. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 21.05.2008 17:58, Steph Fox wrote: Yes, I mean that problem with storing string/Unicode function names. That patch seems to be no good, run-tests.php doesn't work after I apply it (some functions are not found). shrug / it'll be a lot easier to see what's going on if/when the UG(unicode) if'n'butting goes. There's obviously a conversion too many somewhere down the line @ present. Right. That's exactly what I meant. -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Nothing premature about it. It was time. UG(unicode) checks are still secondary I think - they don't prevent us from doing tests and moving forward, although cleaning them up would be nice. pack() should take binary strings only, methinks. -Andrei Steph Fox wrote: I would've hoped that PHP 6 would be Unicode throughout. Maybe jumping around when Andrei removed the physical switch was premature... obviously the UG(unicode) checks are still in place, whereas their binary alternatives shouldn't still exist. Then again some things are unclear. What should happen with pack(), for example? - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
The idea here was to find volunteers who'd try and run their apps and some standard apps on this so we can figure out what the migration path looks like. That'd also give us the necessary data point to make a decision on default for strings. I think none of us right now truly knows what to expect. Andi -Original Message- From: Andrei Zmievski [mailto:[EMAIL PROTECTED] Sent: Monday, May 19, 2008 2:54 PM To: Lukas Kahwe Smith Cc: Cristian Rodríguez; Steph Fox; internals Subject: Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum] Yes, but we should run some sort of test suite and see what exactly breaks with unicode strings as the default ones. On Mon, May 19, 2008 at 2:42 PM, Lukas Kahwe Smith [EMAIL PROTECTED] wrote: On 19.05.2008, at 23:34, Cristian Rodríguez wrote: Andrei Zmievski escribió: Done. See latest commit: switch is removed and mode is defaulted to On. Thank you , VERY MUCH Andrei, now we can **really** move forward.. well the next question is binary or unicode as the default for strings .. regards, Lukas
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hi Andi, et. al, Am Montag, den 19.05.2008, 23:03 -0700 schrieb Andi Gutmans: The idea here was to find volunteers who'd try and run their apps and some standard apps on this so we can figure out what the migration path looks like. That'd also give us the necessary data point to make a decision on default for strings. I think none of us right now truly knows what to expect. This might be the time, where we need to think about are more substantial solution for this problem. What I have in mind is to provide a system, with the current development versions of PHP (today is would be 5_2, 5_3 and HEAD) installed. We would liberally grant Shell-Accounts and DBs to those who are willing to test their web applications there. Say the S9Y-developers get an account to test their blog system, they get three Vhosts: s9y.php6.func.php.net, s9y.php53.func.php.net, s9y.php52.func.php.net. We would continously build CVS snapshots on this systems to make sure they test against the latest version. Projects with more resources could run a selenium grid against the server for their functional tests, the rest would simply test the applications by hand. We could even install a cruisecontrol (resp. phpUnderControl) instance to run projects unit tests if the have some. cu, Lars signature.asc Description: Dies ist ein digital signierter Nachrichtenteil
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 20.05.2008 01:54, Andrei Zmievski wrote: Yes, but we should run some sort of test suite and see what exactly breaks with unicode strings as the default ones. Right, but in order to do this we need to add filtered streams support to stream_select(). At the moment all tests in HEAD fail because of this: Warning: stream_select(): cannot cast a filtered stream on this system in /local/qa/head/run-tests.php on line 938 Warning: stream_select(): No stream arrays were passed in /local/qa/head/run-tests.php on line 938 -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
At the moment all tests in HEAD fail because of this: Warning: stream_select(): cannot cast a filtered stream on this system in /local/qa/head/run-tests.php on line 938 Warning: stream_select(): No stream arrays were passed in /local/qa/head/run-tests.php on line 938 Why not just use the 5_3 run-tests.php, which works? - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 20.05.2008 17:49, Steph Fox wrote: At the moment all tests in HEAD fail because of this: Warning: stream_select(): cannot cast a filtered stream on this system in /local/qa/head/run-tests.php on line 938 Warning: stream_select(): No stream arrays were passed in /local/qa/head/run-tests.php on line 938 Why not just use the 5_3 run-tests.php, which works? run-tests.php in 5_3 uses stream_select(), too. And stream_select() in HEAD doesn't work because Unicode is now enabled by default. -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
run-tests.php in 5_3 uses stream_select(), too. And stream_select() in HEAD doesn't work because Unicode is now enabled by default. Bizarre. I ran 400-odd tests with run-tests from 5_3 and never saw that error... - Steph -- Wbr, Antony Dovgal -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
run-tests.php in 5_3 uses stream_select(), too. And stream_select() in HEAD doesn't work because Unicode is now enabled by default. Bizarre. I ran 400-odd tests with run-tests from 5_3 and never saw that error... Ah, I have it. My TEST_PHP_EXECUTABLE isn't the same as the one running the script - it's 5.3.0-dev. So make test will fail, but my manual tests don't. I'm seeing 85% pass rate in Zend/tests, investigating. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Andi Gutmans wrote: The idea here was to find volunteers who'd try and run their apps and some standard apps on this so we can figure out what the migration path looks like. That'd also give us the necessary data point to make a decision on default for strings. I think none of us right now truly knows what to expect. As I a PHP developer on Windows, I pay very close attention to development work on PHP and regularly test my code against PHP 5.3, hand-built from the CVS. Nearly all of the PECL extensions are broken on Windows for PHP 6. The http extension has a very nasty problem: its fatal error appears to loop nearly ad infinitum. This makes compiling snaps locally very difficult--I've given up. Why is the log file 4.4 MB long? Also, as stands it appears most non-trivial applications are fairly broken. Those issues need to be resolved first, before we toss Unicode onto the pile. -- Edward Z. YangGnuPG: 0x869C48DA HTML Purifier http://htmlpurifier.org Anti-XSS Filter [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Edward Z. Yang wrote: Also, as stands it appears most non-trivial applications are fairly broken. Those issues need to be resolved first, before we toss Unicode onto the pile. I retract this statement; I didn't realize we had already turned on Unicode semantics. -- Edward Z. YangGnuPG: 0x869C48DA HTML Purifier http://htmlpurifier.org Anti-XSS Filter [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]] -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Hi Edward, As I a PHP developer on Windows, I pay very close attention to development work on PHP and regularly test my code against PHP 5.3, hand-built from the CVS. Nearly all of the PECL extensions are broken on Windows for PHP 6. The http extension has a very nasty problem: its fatal error appears to loop nearly ad infinitum. This makes compiling snaps locally very difficult--I've given up. Why is the log file 4.4 MB long? Because nobody's been trying to compile snaps against PHP 6? Basically most extension developers decided to stay with 5_3 for now and haven't migrated their code to work with Unicode on. I would hope, though, that if PHP 6 is Unicode-only there might be more enthusiasm for adapting code to work with it. Also, as stands it appears most non-trivial applications are fairly broken. Those issues need to be resolved first, before we toss Unicode onto the pile. I think the test script and ZE problems should be resolved first... plus I just discovered that my php.ini isn't being found when I run commandline tests (e.g. php -r var_dump(strcasecmp()); doesn't throw an error unless I explicitly set error_reporting from the commandline, despite having E_ALL|E_STRICT in the INI). These are basic things, nobody can really go much further until they're fixed. So I agree with you - it's far too soon for apps testing at this stage. - Steph -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Andrei Zmievski escribió: Done. See latest commit: switch is removed and mode is defaulted to On. Thank you , VERY MUCH Andrei, now we can **really** move forward.. -- Progress is possible only if we train ourselves to think about programs without thinking of them as pieces of executable code.” - Edsger W. Dijkstra Cristian Rodríguez R. Platform/OpenSUSE - Core Services SUSE LINUX Products GmbH Research Development http://www.opensuse.org/ signature.asc Description: OpenPGP digital signature
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
On 19.05.2008, at 23:34, Cristian Rodríguez wrote: Andrei Zmievski escribió: Done. See latest commit: switch is removed and mode is defaulted to On. Thank you , VERY MUCH Andrei, now we can **really** move forward.. well the next question is binary or unicode as the default for strings .. regards, Lukas -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] Unicode progress [Was: unicode.semantics ad infinitum]
Yes, but we should run some sort of test suite and see what exactly breaks with unicode strings as the default ones. On Mon, May 19, 2008 at 2:42 PM, Lukas Kahwe Smith [EMAIL PROTECTED] wrote: On 19.05.2008, at 23:34, Cristian Rodríguez wrote: Andrei Zmievski escribió: Done. See latest commit: switch is removed and mode is defaulted to On. Thank you , VERY MUCH Andrei, now we can **really** move forward.. well the next question is binary or unicode as the default for strings .. regards, Lukas