Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 6:11 PM Sara Golemon wrote: > On Thu, May 27, 2021 at 9:07 AM Nikita Popov wrote: > >> Ah, I think I explained the original issue badly: The test runner output >> isn't really a problem, or at least I never perceived it to be. >> >> The problem is that if you change a test that contains a null byte >> anywhere (read: in var_dump output), then github will not show a diff for >> that file. You can see this nicely in the proposed PR itself: >> https://github.com/php/php-src/pull/7059/files Most of the updated test >> files show up as "Binary file not shown." As you can imagine, this makes >> review hard. Using the .patch file doesn't help either, because it only >> contains entries like: >> >> > AH! Yeah, I completely misunderstood the problem space. I didn't imagine > for a second we had non-ASCII in any test because that's bad and it should > feel bad. > > Okay, then I still disagree with changing var_dump()'s output, but for > different reasons. > > So take Zend/tests/bug60569.phpt for example, which has this EXPECT > section (where ^@ is actually the null byte): > > --EXPECT-- > string(20) "Some error ^@ message" > > Instead of changing how var_dump() works in order to produce different > output, I'd change the EXPECT to and EXPECTF and add a new %0 pattern > > --EXPECTF-- > string(20) "Some error %0 message" > > This creates zero BC breaks, allows extensions to continue working just > fine with raw nulls or update to use %0 if they'd like, and creates no > ambiguities. > I like that approach. Implemented in https://github.com/php/php-src/pull/7069. Regards, Nikita
Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 9:07 AM Nikita Popov wrote: > Ah, I think I explained the original issue badly: The test runner output > isn't really a problem, or at least I never perceived it to be. > > The problem is that if you change a test that contains a null byte > anywhere (read: in var_dump output), then github will not show a diff for > that file. You can see this nicely in the proposed PR itself: > https://github.com/php/php-src/pull/7059/files Most of the updated test > files show up as "Binary file not shown." As you can imagine, this makes > review hard. Using the .patch file doesn't help either, because it only > contains entries like: > > AH! Yeah, I completely misunderstood the problem space. I didn't imagine for a second we had non-ASCII in any test because that's bad and it should feel bad. Okay, then I still disagree with changing var_dump()'s output, but for different reasons. So take Zend/tests/bug60569.phpt for example, which has this EXPECT section (where ^@ is actually the null byte): --EXPECT-- string(20) "Some error ^@ message" Instead of changing how var_dump() works in order to produce different output, I'd change the EXPECT to and EXPECTF and add a new %0 pattern --EXPECTF-- string(20) "Some error %0 message" This creates zero BC breaks, allows extensions to continue working just fine with raw nulls or update to use %0 if they'd like, and creates no ambiguities. -Sara
Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 3:54 PM Sara Golemon wrote: > On Thu, May 27, 2021 at 8:42 AM Nikita Popov wrote: > >> The most prudent and BC-safe thing would be to add another function >>> `var_dump_escaped()` or even to prefer/advise json_encode() when content >>> safety is relevant, but additional type information is not (most of the >>> uses of var_dump we have). Unfortunately, this doesn't actually fix the >>> initial problem you stated, which is just getting useful data out of CI >>> failures. >>> >> >> Well, we had https://wiki.php.net/rfc/readable_var_representation a few >> weeks ago, which is basically a better var_dump() with escaping and all -- >> but you can tell how it went ;) >> >> > > I don't specifically remember that thread, but suddenly I feel like I can > guess everything I need to know about the outcome. :( > > > Rather than change the output at all, why not just have a post-process >>> step in our test runner that transforms the output in the test report? >>> Then we could be as aggressive as we want, going as far as escaping all >>> non-printables plus backslash since at that point we're in it for the human >>> readability (and knowing specific byte sequences is essential) and we break >>> zero BC for anyone else? >>> >> >> Transforming the test output (such that the transformed output is checked >> in EXPECT) would indeed be a way to address the original motivation. >> There's two caveats though: >> 1. It does not address the issue for other uses of var_dump(). People >> being confused by var_dump((array) $object) output because of the null >> bytes in the property names is practically a PHP classic. Of course, fixing >> this would take away from the authentic PHP experience :) >> > > True, but neither (IMO) would just escaping null byte, which again I > regard as a half-measure. It has all the negative impact of breaking BC > (your point about extension phpts is a good one), without going far enough > to address the human-readability problem for many common cases (CR/NL chief > among them). > > Insert Karate Kid "squash like a bug" reference here. Escape it all, or > escape none of it. > > >> 2. Changes to run-tests.php behavior also affect extensions, and I think >> this kind of change would be even harder to address for PECL extensions >> than a simple var_dump() behavior change (where you can at least easily >> fork two versions of the file for the tests that need it). >> >> > Would it? The report at the end of the run is for human eyes, generally > speaking. A human can interpret escaping (or not) as needed. > Also, we could make the post-processing optional. > > run-test.php --escape-test-output > > Then we use that mode in CI where we need git-friendly diffs, and it works > "normally" elsewhere. > Ah, I think I explained the original issue badly: The test runner output isn't really a problem, or at least I never perceived it to be. The problem is that if you change a test that contains a null byte anywhere (read: in var_dump output), then github will not show a diff for that file. You can see this nicely in the proposed PR itself: https://github.com/php/php-src/pull/7059/files Most of the updated test files show up as "Binary file not shown." As you can imagine, this makes review hard. Using the .patch file doesn't help either, because it only contains entries like: diff --git a/Zend/tests/bug60569.phpt b/Zend/tests/bug60569.phpt index 480c9c8f8a1285956616a119825520471bbe88b5..c658fa28a00017412a3a840498605efa7f8fe10b 100644 GIT binary patch delta 25 gcmZ3i_@% delta 23 ecmZ3@w32DUA4Z0W{}j0y6mnCGixbmRmAC+9#0Sy< To avoid this issue, we need the transformation to apply the actual EXPECTed output, otherwise our test files will still be considered as binary files. Regards, Nikita
Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 8:42 AM Nikita Popov wrote: > The most prudent and BC-safe thing would be to add another function >> `var_dump_escaped()` or even to prefer/advise json_encode() when content >> safety is relevant, but additional type information is not (most of the >> uses of var_dump we have). Unfortunately, this doesn't actually fix the >> initial problem you stated, which is just getting useful data out of CI >> failures. >> > > Well, we had https://wiki.php.net/rfc/readable_var_representation a few > weeks ago, which is basically a better var_dump() with escaping and all -- > but you can tell how it went ;) > > I don't specifically remember that thread, but suddenly I feel like I can guess everything I need to know about the outcome. :( Rather than change the output at all, why not just have a post-process step >> in our test runner that transforms the output in the test report? Then we >> could be as aggressive as we want, going as far as escaping all >> non-printables plus backslash since at that point we're in it for the human >> readability (and knowing specific byte sequences is essential) and we break >> zero BC for anyone else? >> > > Transforming the test output (such that the transformed output is checked > in EXPECT) would indeed be a way to address the original motivation. > There's two caveats though: > 1. It does not address the issue for other uses of var_dump(). People > being confused by var_dump((array) $object) output because of the null > bytes in the property names is practically a PHP classic. Of course, fixing > this would take away from the authentic PHP experience :) > True, but neither (IMO) would just escaping null byte, which again I regard as a half-measure. It has all the negative impact of breaking BC (your point about extension phpts is a good one), without going far enough to address the human-readability problem for many common cases (CR/NL chief among them). Insert Karate Kid "squash like a bug" reference here. Escape it all, or escape none of it. > 2. Changes to run-tests.php behavior also affect extensions, and I think > this kind of change would be even harder to address for PECL extensions > than a simple var_dump() behavior change (where you can at least easily > fork two versions of the file for the tests that need it). > > Would it? The report at the end of the run is for human eyes, generally speaking. A human can interpret escaping (or not) as needed. Also, we could make the post-processing optional. run-test.php --escape-test-output Then we use that mode in CI where we need git-friendly diffs, and it works "normally" elsewhere. -Sara
Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 3:22 PM Sara Golemon wrote: > On Thu, May 27, 2021 at 5:44 AM Nikita Popov wrote: > >> https://github.com/php/php-src/pull/7059 does a surgical change to >> replace >> null bytes with \0. >> >> > Before delving into any other observations, I first want to state > emphatically that we should not ONLY escape chr(0). We either apply full > on addcslashes() (or similar) or do nothing at all. Half-escaped is worse > than not escaped. Sadly, this presents more BC issues than just chr(0) or > '\\0' do alone since there are many more potential places where output will > change for anyone using it programmatically. > > So should we? > > On the one hand: The intent of the var_dump() function is for human > readable debugging. Anyone using this in a programmatic way is Doing It > Wrong™, so I'm not too fussed about changing the output of this function. > On the other hand: WE use this function programmatically in our tests > across many thousands of occurrences. Others probably rightly do as well. > Yes, the dissonance is real. > FWIW, updating our own tests is easy, because the bulk of the change is automated, just need to fix up individual tests afterwards. It may be an inconvenience for extension tests though, which require compatibility with multiple PHP versions. > The most prudent and BC-safe thing would be to add another function > `var_dump_escaped()` or even to prefer/advise json_encode() when content > safety is relevant, but additional type information is not (most of the > uses of var_dump we have). Unfortunately, this doesn't actually fix the > initial problem you stated, which is just getting useful data out of CI > failures. > Well, we had https://wiki.php.net/rfc/readable_var_representation a few weeks ago, which is basically a better var_dump() with escaping and all -- but you can tell how it went ;) > The PR you offered is the lightest touch and fixes the issue you cited > without causing any likely damage to current users, but I don't think we > should ignore the ambiguity of the output. Additionally, I think I could > make an easy argument in favor of escaping CR and NL at the very least. At > this point the urge to escape backslash is extra-real as with windows paths > an innocent \n is far more likely. > > -- > > Actually. Maybe we're thinking of this wrong. > > Rather than change the output at all, why not just have a post-process > step in our test runner that transforms the output in the test report? > Then we could be as aggressive as we want, going as far as escaping all > non-printables plus backslash since at that point we're in it for the human > readability (and knowing specific byte sequences is essential) and we break > zero BC for anyone else? > Transforming the test output (such that the transformed output is checked in EXPECT) would indeed be a way to address the original motivation. There's two caveats though: 1. It does not address the issue for other uses of var_dump(). People being confused by var_dump((array) $object) output because of the null bytes in the property names is practically a PHP classic. Of course, fixing this would take away from the authentic PHP experience :) 2. Changes to run-tests.php behavior also affect extensions, and I think this kind of change would be even harder to address for PECL extensions than a simple var_dump() behavior change (where you can at least easily fork two versions of the file for the tests that need it). Regards, Nikita
Re: [PHP-DEV] Escape \0 in var_dump() output
On Thu, May 27, 2021 at 5:44 AM Nikita Popov wrote: > https://github.com/php/php-src/pull/7059 does a surgical change to replace > null bytes with \0. > > Before delving into any other observations, I first want to state emphatically that we should not ONLY escape chr(0). We either apply full on addcslashes() (or similar) or do nothing at all. Half-escaped is worse than not escaped. Sadly, this presents more BC issues than just chr(0) or '\\0' do alone since there are many more potential places where output will change for anyone using it programmatically. So should we? On the one hand: The intent of the var_dump() function is for human readable debugging. Anyone using this in a programmatic way is Doing It Wrong™, so I'm not too fussed about changing the output of this function. On the other hand: WE use this function programmatically in our tests across many thousands of occurrences. Others probably rightly do as well. Yes, the dissonance is real. The most prudent and BC-safe thing would be to add another function `var_dump_escaped()` or even to prefer/advise json_encode() when content safety is relevant, but additional type information is not (most of the uses of var_dump we have). Unfortunately, this doesn't actually fix the initial problem you stated, which is just getting useful data out of CI failures. The PR you offered is the lightest touch and fixes the issue you cited without causing any likely damage to current users, but I don't think we should ignore the ambiguity of the output. Additionally, I think I could make an easy argument in favor of escaping CR and NL at the very least. At this point the urge to escape backslash is extra-real as with windows paths an innocent \n is far more likely. -- Actually. Maybe we're thinking of this wrong. Rather than change the output at all, why not just have a post-process step in our test runner that transforms the output in the test report? Then we could be as aggressive as we want, going as far as escaping all non-printables plus backslash since at that point we're in it for the human readability (and knowing specific byte sequences is essential) and we break zero BC for anyone else? -Sara
Re: [PHP-DEV] Escape \0 in var_dump() output
Hey, > whether var_dump() should be performing any escaping at all, and if yes Well, since var_dump is essentially for debugging purposes, it makes sense to escape certain characters. Like you mentioned, some IDEs do not print/support null bytes. Hence, I am all for it, as it simplifies debugging in that scenario (which is the main objective). > how much we should be escaping. Just \0, both \0 and \ to avoid ambiguity, or a larger set of control characters? Probably a larger set of control characters as well. As mentioned before, var_dump is for *debugging*, if you can not get full debugging information, it will only lead to headaches and frustrations. Regardless, I do believe we should provide some kind of 'setting' for this in php.ini (like precision for example) just so the user can decide what they want to do.
[PHP-DEV] Escape \0 in var_dump() output
Hi internals, A regular annoyance with our test suite is that var_dump() prints null bytes literally, those null bytes get into our test expectations, the test is interpreted as a binary file by git, and diffs will not be shown on GitHub. Of course, the null bytes are also confusing to the reader, as many editors will not display them, and the only indication that they are there is the string length. https://github.com/php/php-src/pull/7059 does a surgical change to replace null bytes with \0. A potential problem is that this no longer allows an easy distinction between "\0" and "\\0", as both will be printed as "\0". We could resolve this by additionally escaping \ (and presumably "). See https://gist.github.com/nikic/d3d74d88178ea6946305e0b2d38a84f9 for an rough idea on the impact on Zend and ext/standard tests. So, I'd be interested in knowing a) whether var_dump() should be performing any escaping at all, and if yes b) how much we should be escaping. Just \0, both \0 and \ to avoid ambiguity, or a larger set of control characters? Regards, Nikita