Hi Craig Francis,
> As an aside, only 4 of 23 'no' voters provided any comment as to why they
> voted that way on the mailing list, which I feel undermines the point of
> the Request For Comment process, with an additional 5 responding personally
> off-list after prompting. This makes it harder (or impossible) for points
> to be discussed and addressed.
1. My earlier comments about static analysis, and on behavior depending on
whether opcache is enabled
2. This might prevent certain optimizations in the future. For example,
currently, 1-byte strings are all interned to save memory.
If is_literal was part of php prior to proposing that optimization, then
that optimization may be rejected.
3. PHP's `string` type is used both for (hopefully) valid unicode strings and
for low level operations on literal byte arrays (e.g. cryptogrophy).
It seems really, really strange for a type system to track trustedness for
a low level primitive to track byte arrays. (the php 6 unicode string
refactoring failed)
Further, if this were to be extended in the future beyond the original
proposal (e.g. literal strings that are entirely digits are automatically
interned or marked as trusted),
this might open previously safe code acting on byte arrays to side channel
issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
4. Internal functions and userland polyfills for those functions may
unintentionally differ significantly for the resulting taintedness,
e.g. base64_decode in userland being built up byte by byte would end up
being possibly untainted?
5. The fact that 1-byte strings are almost always interned seems like a
noticeable inconsistency (though library authors can deal with it once they're
aware of it), though for it to become an issue a library may need to take
multiple strings as input
(e.g. a contrived example`"echo -- " . $trustedPrefix .
shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted
of "; evaluate command"
6. Including it in core would make it harder to remove later if it interfered
with opcache or jit work, or to migrate code to alternative interpreters for
php if those were ever implemented (if frameworks were to extensively depend on
is_literal)
7. Tracking whether a string is untrusted is a definition only suitable for a
few (extremely common) formats for php. But for less common features, even
stringified integers may be a problem (e.g. binary file formats, etc)
This is relatively minor given that php is typically used in a web
programming context with json or html or js/css output
I'd think is_interned()/intern_string() is much closer to tracking
something that corresponds with php's internals (e.g. and may allow saving
memory in long-running processes which receive duplicate strings as input),
though the 10 people who wanted fully featured trustedness checking would
probably want is_literal instead
8. Serializing and unserializing data would lose information about trustedness
of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys).
There's no (efficient) way to change trusted strings to untrusted or vice
versa, though there are inefficient workarounds (modifying a byte and restoring
it to stop trusting it, imploding single characters to create a trusted string)
This may done implicitly in frameworks using APCu/memcached/redis as a cache
(I definitely don't think the serialization data format should track
is_literal())
9. Future refactorings, optimizations or deoptimizations (or security fixes) to
unserialize(), etc. may unexpectedly break code using is_literal that throw
instead of warn (more bug reports, discourage users from upgrading, etc.)
10. This RFC adds an unknown amount of future work for php-src and PECLs to
*intuitively* support mapping trusted inputs to trusted outputs or vice versa -
less commonly used or unmaintained functions may not behave as expected for a
while
11. https://pecl.php.net/package/taint is available already for a use case with
some overlap for setups that need this
Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an
interned string) would make sense to expose in a PECL as a regular `extension`
(not a `zend_extension`) if is_interned also fails.
Unlike the zend_extension for
https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume
https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this.
The latest release of the Taint pecl fixes XDebug compatibility)
- Other languages, such as Java, have exposed this for memory management
purposes (rather than security) though it's rarely used directly or in
frameworks, e.g.
https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern()
(https://docs.python.org/3/library/sys.html#sys.intern)
Thanks,
Tyson
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php