Hi Craig,
> On Jun 12, 2021, at 1:00 PM, Craig Francis <[email protected]> wrote:
>
> Hi Internals,
>
> I'd like to start the discussion on the is_literal() RFC:
>
> https://wiki.php.net/rfc/is_literal
Nice! There is an awful lot to like here.
And few bits of concern.
What's to like?
-------------------
1. Adding a proactive method for guarding against injection vulnerabilities
would be a huge step forward for PHP. This should not be underemphasized.
2. That you added the section "Previous Work" and covered what other languages
are doing in this regard.
3. The "Thanks" section and all the external references and prior comments that
provides insight into how this RFC was developed.
4. And I love that you publicly stated these points in an RFC:
a. Why not use static analysis? It will never be used by most
developers.
b. Why not educate everyone? You can't - developer training simply does
not scale, and mistakes still happen.
What's of concern?
-------------------------
(Note: the last point could address concerns #2 and #4.)
1. Bike-shedding a bit, but I find a bit of cognitive dissonance over the
semantic incorrectness of is_literal(), similar to Jakob Givoni's concern.
I also fear it will be confusing for developers who know what a literal is, and
thus I agree with Jakob.
Minimally I'd ask for a separate vote on is_literal() vs. is_from_literal().
2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if I
understand correctly, your vision is to ultimately require all SQL and other
vulnerable literals to be written as literals in the source code for them to be
used or otherwise disallowed by future library code and/or PHP internal
functionality?
If I did not misunderstand I believe you would disallow an important class of
functionality from being implemented in PHP, including several existing and
widely used applications such as but not limited to PhpMyAdmin and Adminer that
open and interact with arbitrary SQL databases. It would also disallow using
any data returned by APIs where the PHP application cannot know in advance what
data will be acquired, even if that data has been properly escaped and/or
sanitized.
3. I notice your RFC does not grant sprintf() the ability to return a string
with the "literal" flag even though sprintf() can frequently return strings
composed from known literals and safe numeric value. And I have frequently see
it used for this purpose, e.g.:
$fields = ['id','name'];
$table = 'my_table';
$sql = sprintf( 'SELECT %s FROM %s WHERE ID=%d',
implode( ',', $fields ),
$my_table,
intval($_GET['id']));
Of course your section "WHERE IN" "addresses" this, but please see #4 below.
I argue sprintf() should be able to pass thru literal flags and validate
integer values to determine that the result is literal. Am I missing something
here?
4. Regarding the section "WHERE IN" you argue that we should push everyone to
use parameters properly — which is a great idea in theory — but that would have
a significant breakage in existing sites online. While many here in Internals
seem to look down on WordPress if does power over 40%[1] of the web at this
point and this would break most if not all those sites if the site owners or
their hosts were to upgrade to a version of PHP that was militant about
requiring is_literal() for mysqli_query()[2].
And yes, WordPress could upgrade their core code[3] to address this, but that
would not also fix the 60k plugins and 10k themes on WordPress.org
<http://wordpress.org/>, or likely the million+ combined custom plugins and
themes in use on the web.
Is it really reasonable to change PHP to just break of all those sites so that
we can force all the site owners to beg and plead with their plugin and theme
developers to better protect them from injection errors? Seems that medicine
might be worse than the cure, at least for all those site owners.
5. Given #2 and #4, I think your section "Support Functions" is missing what
would allow WordPress, PhpMyAdmin, Adminer and all other PHP applications to
address the concerns in #2 and #4 and it is neither literal_concat() nor
literal_implode().
The function we would need is the one Lauri Kenttä wrote (in jest?), but we'd
need one that is performant, not a userland version. Name it as_literal() or
make_literal() or even as_unsafe_literal() — whichever chosen is unimportant
IMO — but applications like the ones I mentioned need to be able to say "I
absolutely know what I am explicitly doing so do not throw a warning or an
error telling me my value is `not a literal`, dammit!"
Certainly most people would not use such a method by accident, especially if
called something like `as_unsafe_literal()`. There might even be ways to
validate that this was not used by a copy-and-paste artist, but I will leave
that validation to other people who are smarter than me.
6. Regarding the section "Reflection API," shouldn't their be an
is_literal()/is_from_literal() method added to the ReflectionParameter() and
ReflectionProperty() classes?
So assuming you address at least #5 — and thus #2 and #4 indirectly — I would
really like to see this RFC pass.
-Mike
[1] https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress
<https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress>
[2]
https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters
<https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters>
[3]
https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056
<https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056>