> On 20 Jul 2021, at 01:23, tyson andre <[email protected]> wrote: > >> As an aside, only 4 of 23 'no' voters provided any comment as to why they >> voted that way on the mailing list, which I feel undermines the point of >> the Request For Comment process, with an additional 5 responding personally >> off-list after prompting. This makes it harder (or impossible) for points >> to be discussed and addressed. > > 1. My earlier comments about static analysis, and on behavior depending on > whether opcache is enabled
Thanks Tyson, Just to check I've not missed anything, are you referring to your comments from March 2020? https://news-web.php.net/php.internals/109192 <https://news-web.php.net/php.internals/109192> I mentioned your email in the RFC, as I agree that developers should use Static Analysis, but I think it's highly unlikely we will be able to get most PHP developers to use it: https://wiki.php.net/rfc/is_literal#static_analysis <https://wiki.php.net/rfc/is_literal#static_analysis> As to the OPcache, Joe's implementation already works with its optimisations, so the results are consistent. https://wiki.php.net/rfc/is_literal#compiler_optimisations <https://wiki.php.net/rfc/is_literal#compiler_optimisations> > 2. This might prevent certain optimizations in the future. For example, > currently, 1-byte strings are all interned to save memory. > If is_literal was part of php prior to proposing that optimization, then > that optimization may be rejected. Like the OPcache optimisations, the 1-byte strings and other existing optimisations work with Joe's implementation. I cannot comment or predict any future optimisation work, but this argument could be made for any change to PHP. > 3. PHP's `string` type is used both for (hopefully) valid unicode strings and > for low level operations on literal byte arrays (e.g. cryptogrophy). > It seems really, really strange for a type system to track trustedness for > a low level primitive to track byte arrays. (the php 6 unicode string > refactoring failed) > > Further, if this were to be extended in the future beyond the original > proposal (e.g. literal strings that are entirely digits are automatically > interned or marked as trusted), > this might open previously safe code acting on byte arrays to side channel > issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack) We're just adding a flag to say if the string was written by the developer, which is key for identifying Injection Vulnerabilities (it's similar to the Interned flag). With cryptography and timing attacks, the only part affected would be during string concatenation, which has already got several optimisations that introduce variability in timings (e.g. concatenating variables containing 'a' with 'b' is about twice as slow than 'a' with ''). We did have a discussion about integer support, but the way integers/floats/booleans are currently stored in memory would require a big change to note if those values were defined in the developers PHP source code. We also explored the idea of allowing all integers, and while they cannot lead to an Injection Vulnerability, the consensus was that a developer defined string is easier to understand: https://wiki.php.net/rfc/is_literal#integer_values <https://wiki.php.net/rfc/is_literal#integer_values> > 4. Internal functions and userland polyfills for those functions may > unintentionally differ significantly for the resulting taintedness, > e.g. base64_decode in userland being built up byte by byte would end up > being possibly untainted? All of these functions would return a non-literal string, as the values were not written by the developer (as in, directly defined in their source code). > 5. The fact that 1-byte strings are almost always interned seems like a > noticeable inconsistency (though library authors can deal with it once > they're aware of it), though for it to become an issue a library may need to > take multiple strings as input > (e.g. a contrived example`"echo -- " . $trustedPrefix . > shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and > $notTrusted of "; evaluate command" 1-byte strings have already been addressed in the implementation. Libraries need to take user values separately from developer defined strings, because developers frequently make mistakes: https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4 <https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4> Especially when it comes to escaping values: https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4 <https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4> In your command example, the library could copy how Parameterised Queries work in SQL. https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-basic.php <https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-basic.php> > 6. Including it in core would make it harder to remove later if it interfered > with opcache or jit work, or to migrate code to alternative interpreters for > php if those were ever implemented (if frameworks were to extensively depend > on is_literal) This is why Joe was very careful with the implementation. Overall the approach is simple, where it's just providing libraries with the key bit of information to identify their primary source of Injection Vulnerabilities (when a developer has incorrectly included user data). > 7. Tracking whether a string is untrusted is a definition only suitable for a > few (extremely common) formats for php. But for less common features, even > stringified integers may be a problem (e.g. binary file formats, etc) > > This is relatively minor given that php is typically used in a web > programming context with json or html or js/css output > > I'd think is_interned()/intern_string() is much closer to tracking > something that corresponds with php's internals (e.g. and may allow saving > memory in long-running processes which receive duplicate strings as input), > though the 10 people who wanted fully featured trustedness checking would > probably want is_literal instead The implementation is pretty close to matching the Interned flag, but isn't exactly the same, as the Interned flag can be unpredictable for normal PHP developers (e.g. the chr() function), and it could be changed in the future (e.g. Joe noted how there was a suggestion to intern keys while JSON decoding or unserializing). > 8. Serializing and unserializing data would lose information about > trustedness of inputs, unpredictably (e.g. unserialize() in php 8.0 interns > array keys). > > There's no (efficient) way to change trusted strings to untrusted or vice > versa, though there are inefficient workarounds (modifying a byte and > restoring it to stop trusting it, imploding single characters to create a > trusted string) > > This may done implicitly in frameworks using APCu/memcached/redis as a > cache > > (I definitely don't think the serialization data format should track > is_literal()) Agreed, serializing and unserializing (or any other modifications) does not set the literal flag. As to providing an easy way to mark a non-literal string to a literal - in short, we do not want to recreate the biggest flaw of Taint Checking, where naive developers would see this as a way to mark escaped strings as "safe", giving them a false sense of security that these strings can now be trusted. > 9. Future refactorings, optimizations or deoptimizations (or security fixes) > to unserialize(), etc. may unexpectedly break code using is_literal that > throw instead of warn (more bug reports, discourage users from upgrading, > etc.) The implementation already includes several tests, and output from functions like unserialize() do not set the literal flag. > 10. This RFC adds an unknown amount of future work for php-src and PECLs to > *intuitively* support mapping trusted inputs to trusted outputs or vice versa > - less commonly used or unmaintained functions may not behave as expected for > a while This shouldn't be an issue, as the flag is only being used to identify literal strings. If new strings are being created or modified, they are no longer considered literal strings (the flag is off by default). > 11. https://pecl.php.net/package/taint is available already for a use case > with some overlap for setups that need this Libraries need something that's available to everyone (we all make mistakes; and the developers most likely to introduce these vulnerabilities are unlikely to install an extension, or use Static Analysis). And with the Taint extension in particular, it has really helped me experiment with this concept, but that approach does give a false sense of security, which is why this RFC did not re-create it: https://wiki.php.net/rfc/is_literal#taint_checking <https://wiki.php.net/rfc/is_literal#taint_checking> > Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an > interned string) would make sense to expose in a PECL as a regular > `extension` (not a `zend_extension`) if is_interned also fails. > Unlike the zend_extension for > https://www.php.net/manual/en/function.is-tainted.php , > something simple may be possible without needing the performance hit and > future conflicts with XDebug that I assume > https://www.php.net/manual/en/function.is-tainted.php would be prone to. > (https://pecl.php.net/package/taint seems to use a separate bit to track > this. The latest release of the Taint pecl fixes XDebug compatibility) As noted above, we couldn't expose the Interned flag directly, as it's a little unpredictable. That's why we added a new flag - one that at first glance might look the same, but handles the edge cases. Thanks again, Craig
