On Tue, Dec 5, 2023 at 9:43 AM Stefan Schiller via internals <internals@lists.php.net> wrote: > > On Mon, Dec 4, 2023 at 8:45 PM Alex <alexinbeij...@gmail.com> wrote: > > > > Stefan, > > > >> > >> My biggest concern is that this quirk can cause security issues in > >> user code. I came across this in the first place when discovering an > >> exploitable security vulnerability in an application. From my point of > >> view, this is not only about inconsistent behavior but also violates > >> the documentation for specific functions like mb_strstr. I agree that > >> a lot of mbstring operations should not be used on invalid strings, > >> and an exception seems to be an appropriate answer despite the huge BC > >> impact. > > > > > > 1) I'm sure the vulnerable application is proprietary, but can you at least > > tell us which mbstring functions were directly involved in this > > vulnerability, give an overview of the mechanism by which the exploit > > worked (scrubbed of any specific details about the application), and let us > > know the general nature of the resulting compromise? (i.e. > > denial-of-service, information disclosure, allowing users to impersonate > > other users, etc) > > > > Real-life examples of actual security impact provide far more compelling > > grounds for a BC break than hypothetical security impact. > > The case I am referring to involves mb_strpos, which is used to > determine the index of a specific character, and mb_substr, which is > used to extract the substring until the first occurrence of this > character. For this specific case, this results in a Cross-Site > Scripting (XSS) vulnerability, which would allow an attacker to > perform actions on behalf of an authenticated user. The actual code is > more complex, but the following lines should be a suitable > representation of the issue: > > $data = $_GET['data']; > $pos = mb_strpos($data, "<"); > $out = $pos ? mb_substr($data, 0, $pos) : $data; > echo $out; > > The developer's assumption here is that $out should never contain an > opening HTML tag, and it is thus safe to echo it back in the response. > Despite the fact that e.g. htmlspecialchars should be used to encode > the output, this assumption seems reasonable to me. However, due to > the inconsistent behavior, an attacker can break this assumption. If > $data, e.g., contains the sequence "\xf0AAA<b>", str_pos will consider > the index of "<" to be 4. However, mb_substr assumes the full sequence > to have a length of 4 (1 x 4-byte Unicode character + 3 x 1-byte > Unicode characters). Accordingly, the full sequence, including the > "<b>" tag, is reflected in the response.
This feels like an exact case of where mb_* functions shouldn't be used. HTML spec very clearly states that only "<" (\x3C LESS-THAN SIGN) is considered for html opening tags. I don't think there is a reason to consider multibyte strings here. -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php