On Mon, 5 Jul 2021 at 20:15, Craig Francis <[email protected]> wrote:
> Hi Internals, > > I have opened voting on https://wiki.php.net/rfc/is_literal for the > is-literal function. > > The vote closes 2021-07-19 > > The proposal is to add the function is_literal(), a simple way to identify > if a string was written by a developer, removing the risk of a variable > containing an Injection Vulnerability. > > This implementation is for literal strings ONLY (after discussion over > allowing integers) and, thanks to the amazing work of Joe Watkins, now > works fully with compiler optimisations, interned strings etc. > > Craig > Hi Craig, Although I think the idea of the feature is useful, I'm not so sure about the implementation. I watched the talk you referenced a couple of times at different points in time (the first being a couple of years back), and I fail to see how this RFC is a similar implementation to it. As how they do it at Google is to have it part of the type systems (arguably in a weird way but nonaless), and due to the language being compiled the compiler will just flat out refuse to produce an executable if the types mismatch. >From my understanding, the RFC's implementation is similar to what Google does, which is to "annotate" the string, but without having the guarantees of a compiler to back it up. This approach is totally reasonable for static analysis, as running it is akin to the compilation step in checking the validity. However, having this approach built into the language itself seems rather problematic to me. Ideally we would want to assign a variable to be of 'literal' type to ensure none of the actions applied to it demote it from being a literal, and when such a demotion would occur, for it to TypeError. Due to PHP's nature we cannot do this (yet?), therefore overloading the concatenation operation seems rather unwise. The case where concatenation between a literal and a non-literal happens, without error, is very similar to passing around a nullable type until one function/method/property doesn't accept null where it blows up into your face, and you need to track down where on earth did the null value came from, which might be multiple calls prior. And there has been a hell of a lot of talks/articles/etc. about *not* using nullable types due to this issue. This is I think the main issue with the current shape of the proposal. This implementation will detect certain security issues, but finding the root cause for them is going to be rather complicated, as the concatenation operation is basically kicking the can down the road about the responsibility of checking whether or not the result is a literal. Whereas using a function like concat_literal() which checks that the inputs are indeed literals provides immediate feedback that the type constraint is not being violated. Due to this reason, I'm voting against this proposal. Best regards, George P. Banyard
