On Thu, May 27, 2021 at 3:22 PM Sara Golemon <poll...@php.net> wrote:

> On Thu, May 27, 2021 at 5:44 AM Nikita Popov <nikita....@gmail.com> wrote:
>
>> https://github.com/php/php-src/pull/7059 does a surgical change to
>> replace
>> null bytes with \0.
>>
>>
> Before delving into any other observations, I first want to state
> emphatically that we should not ONLY escape chr(0).  We either apply full
> on addcslashes() (or similar) or do nothing at all.  Half-escaped is worse
> than not escaped.  Sadly, this presents more BC issues than just chr(0) or
> '\\0' do alone since there are many more potential places where output will
> change for anyone using it programmatically.
>
> So should we?
>
> On the one hand: The intent of the var_dump() function is for human
> readable debugging.  Anyone using this in a programmatic way is Doing It
> Wrong™, so I'm not too fussed about changing the output of this function.
> On the other hand: WE use this function programmatically in our tests
> across many thousands of occurrences.  Others probably rightly do as well.
> Yes, the dissonance is real.
>

FWIW, updating our own tests is easy, because the bulk of the change is
automated, just need to fix up individual tests afterwards. It may be an
inconvenience for extension tests though, which require compatibility with
multiple PHP versions.


> The most prudent and BC-safe thing would be to add another function
> `var_dump_escaped()` or even to prefer/advise json_encode() when content
> safety is relevant, but additional type information is not (most of the
> uses of var_dump we have).  Unfortunately, this doesn't actually fix the
> initial problem you stated, which is just getting useful data out of CI
> failures.
>

Well, we had https://wiki.php.net/rfc/readable_var_representation a few
weeks ago, which is basically a better var_dump() with escaping and all --
but you can tell how it went ;)


> The PR you offered is the lightest touch and fixes the issue you cited
> without causing any likely damage to current users, but I don't think we
> should ignore the ambiguity of the output.  Additionally, I think I could
> make an easy argument in favor of escaping CR and NL at the very least.  At
> this point the urge to escape backslash is extra-real as with windows paths
> an innocent \n is far more likely.
>
> ------
>
> Actually. Maybe we're thinking of this wrong.
>
> Rather than change the output at all, why not just have a post-process
> step in our test runner that transforms the output in the test report?
> Then we could be as aggressive as we want, going as far as escaping all
> non-printables plus backslash since at that point we're in it for the human
> readability (and knowing specific byte sequences is essential) and we break
> zero BC for anyone else?
>

Transforming the test output (such that the transformed output is checked
in EXPECT) would indeed be a way to address the original motivation.
There's two caveats though:
1. It does not address the issue for other uses of var_dump(). People being
confused by var_dump((array) $object) output because of the null bytes in
the property names is practically a PHP classic. Of course, fixing this
would take away from the authentic PHP experience :)
2. Changes to run-tests.php behavior also affect extensions, and I think
this kind of change would be even harder to address for PECL extensions
than a simple var_dump() behavior change (where you can at least easily
fork two versions of the file for the tests that need it).

Regards,
Nikita

Reply via email to