Hi Craig Francis, > As an aside, only 4 of 23 'no' voters provided any comment as to why they > voted that way on the mailing list, which I feel undermines the point of > the Request For Comment process, with an additional 5 responding personally > off-list after prompting. This makes it harder (or impossible) for points > to be discussed and addressed.
1. My earlier comments about static analysis, and on behavior depending on whether opcache is enabled 2. This might prevent certain optimizations in the future. For example, currently, 1-byte strings are all interned to save memory. If is_literal was part of php prior to proposing that optimization, then that optimization may be rejected. 3. PHP's `string` type is used both for (hopefully) valid unicode strings and for low level operations on literal byte arrays (e.g. cryptogrophy). It seems really, really strange for a type system to track trustedness for a low level primitive to track byte arrays. (the php 6 unicode string refactoring failed) Further, if this were to be extended in the future beyond the original proposal (e.g. literal strings that are entirely digits are automatically interned or marked as trusted), this might open previously safe code acting on byte arrays to side channel issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack) 4. Internal functions and userland polyfills for those functions may unintentionally differ significantly for the resulting taintedness, e.g. base64_decode in userland being built up byte by byte would end up being possibly untainted? 5. The fact that 1-byte strings are almost always interned seems like a noticeable inconsistency (though library authors can deal with it once they're aware of it), though for it to become an issue a library may need to take multiple strings as input (e.g. a contrived example`"echo -- " . $trustedPrefix . shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted of "; evaluate command" 6. Including it in core would make it harder to remove later if it interfered with opcache or jit work, or to migrate code to alternative interpreters for php if those were ever implemented (if frameworks were to extensively depend on is_literal) 7. Tracking whether a string is untrusted is a definition only suitable for a few (extremely common) formats for php. But for less common features, even stringified integers may be a problem (e.g. binary file formats, etc) This is relatively minor given that php is typically used in a web programming context with json or html or js/css output I'd think is_interned()/intern_string() is much closer to tracking something that corresponds with php's internals (e.g. and may allow saving memory in long-running processes which receive duplicate strings as input), though the 10 people who wanted fully featured trustedness checking would probably want is_literal instead 8. Serializing and unserializing data would lose information about trustedness of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys). There's no (efficient) way to change trusted strings to untrusted or vice versa, though there are inefficient workarounds (modifying a byte and restoring it to stop trusting it, imploding single characters to create a trusted string) This may done implicitly in frameworks using APCu/memcached/redis as a cache (I definitely don't think the serialization data format should track is_literal()) 9. Future refactorings, optimizations or deoptimizations (or security fixes) to unserialize(), etc. may unexpectedly break code using is_literal that throw instead of warn (more bug reports, discourage users from upgrading, etc.) 10. This RFC adds an unknown amount of future work for php-src and PECLs to *intuitively* support mapping trusted inputs to trusted outputs or vice versa - less commonly used or unmaintained functions may not behave as expected for a while 11. https://pecl.php.net/package/taint is available already for a use case with some overlap for setups that need this Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an interned string) would make sense to expose in a PECL as a regular `extension` (not a `zend_extension`) if is_interned also fails. Unlike the zend_extension for https://www.php.net/manual/en/function.is-tainted.php , something simple may be possible without needing the performance hit and future conflicts with XDebug that I assume https://www.php.net/manual/en/function.is-tainted.php would be prone to. (https://pecl.php.net/package/taint seems to use a separate bit to track this. The latest release of the Taint pecl fixes XDebug compatibility) - Other languages, such as Java, have exposed this for memory management purposes (rather than security) though it's rarely used directly or in frameworks, e.g. https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern() (https://docs.python.org/3/library/sys.html#sys.intern) Thanks, Tyson -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php