Hi Craig Francis,

> As an aside, only 4 of 23 'no' voters provided any comment as to why they
> voted that way on the mailing list, which I feel undermines the point of
> the Request For Comment process, with an additional 5 responding personally
> off-list after prompting. This makes it harder (or impossible) for points
> to be discussed and addressed.

1. My earlier comments about static analysis, and on behavior depending on 
whether opcache is enabled
2. This might prevent certain optimizations in the future. For example, 
currently, 1-byte strings are all interned to save memory.
    If is_literal was part of php prior to proposing that optimization, then 
that optimization may be rejected.
3. PHP's `string` type is used both for (hopefully) valid unicode strings and 
for low level operations on literal byte arrays (e.g. cryptogrophy).
    It seems really, really strange for a type system to track trustedness for 
a low level primitive to track byte arrays. (the php 6 unicode string 
refactoring failed)

    Further, if this were to be extended in the future beyond the original 
proposal (e.g. literal strings that are entirely digits are automatically 
interned or marked as trusted),
    this might open previously safe code acting on byte arrays to side channel 
issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
4. Internal functions and userland polyfills for those functions may 
unintentionally differ significantly for the resulting taintedness,
    e.g. base64_decode in userland being built up byte by byte would end up 
being possibly untainted?
5. The fact that 1-byte strings are almost always interned seems like a 
noticeable inconsistency (though library authors can deal with it once they're 
aware of it), though for it to become an issue a library may need to take 
multiple strings as input
    (e.g. a contrived example`"echo -- " . $trustedPrefix . 
shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted 
of "; evaluate command"
6. Including it in core would make it harder to remove later if it interfered 
with opcache or jit work, or to migrate code to alternative interpreters for 
php if those were ever implemented (if frameworks were to extensively depend on 
is_literal)
7. Tracking whether a string is untrusted is a definition only suitable for a 
few (extremely common) formats for php. But for less common features, even 
stringified integers may be a problem (e.g. binary file formats, etc)
    
    This is relatively minor given that php is typically used in a web 
programming context with json or html or js/css output

    I'd think is_interned()/intern_string() is much closer to tracking 
something that corresponds with php's internals (e.g. and may allow saving 
memory in long-running processes which receive duplicate strings as input), 
though the 10 people who wanted fully featured trustedness checking would 
probably want is_literal instead
8. Serializing and unserializing data would lose information about trustedness 
of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys).

    There's no (efficient) way to change trusted strings to untrusted or vice 
versa, though there are inefficient workarounds (modifying a byte and restoring 
it to stop trusting it, imploding single characters to create a trusted string)

    This may done implicitly in frameworks using APCu/memcached/redis as a cache

    (I definitely don't think the serialization data format should track 
is_literal())

9. Future refactorings, optimizations or deoptimizations (or security fixes) to 
unserialize(), etc. may unexpectedly break code using is_literal that throw 
instead of warn (more bug reports, discourage users from upgrading, etc.)
10. This RFC adds an unknown amount of future work for php-src and PECLs to 
*intuitively* support mapping trusted inputs to trusted outputs or vice versa - 
less commonly used or unmaintained functions may not behave as expected for a 
while
11. https://pecl.php.net/package/taint is available already for a use case with 
some overlap for setups that need this


Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an 
interned string) would make sense to expose in a PECL as a regular `extension` 
(not a `zend_extension`) if is_interned also fails.
Unlike the zend_extension for 
https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume 
https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. 
The latest release of the Taint pecl fixes XDebug compatibility)

- Other languages, such as Java, have exposed this for memory management 
purposes (rather than security) though it's rarely used directly or in 
frameworks, e.g. 
https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern()
  (https://docs.python.org/3/library/sys.html#sys.intern)

Thanks,
Tyson
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to