Re: [Rd] Segfault when parsing UTF-8 text with srcrefs

Tomas Kalibera Fri, 31 May 2024 07:41:53 -0700


On 5/28/24 20:41, Tomas Kalibera wrote:

On 5/28/24 19:35, Hadley Wickham wrote:
Hi all,

When I run the following code, R segfaults:

text <- "×"
srcfile <- srcfilecopy("test.r", text)
parse(textConnection(text), srcfile = srcfile)

It doesn't segfault if text is ASCII, or it's not wrapped in
textConnection, or srcfile isn't set.
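For comparison, here is a sketch of the ASCII variant of the reproducer, which the report says does not crash. It also shows the parse data that are recorded alongside the srcrefs when a srcfile is supplied; they include a column of token names. Assumptions: `"x"` is an arbitrary ASCII stand-in for the original text, and the text connection is closed explicitly afterwards.

```r
# ASCII stand-in for the original non-ASCII text (assumption: any plain
# ASCII symbol behaves the same way).
text <- "x"
srcfile <- srcfilecopy("test.r", text)

con <- textConnection(text)
exprs <- parse(con, srcfile = srcfile)
close(con)

# Because a srcfile was supplied, the parser also recorded parse data:
# one row per token or expression, with the token names in `token`.
pd <- getParseData(exprs)
print(pd[, c("token", "text")])
```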
Thanks, this is because the R parser doesn't support non-ASCII UTF-8 characters outside string literals and comments, combined with a missing bounds check. The "correct" result should be an R error, which is what I get in a debug build.
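A minimal sketch of that distinction, assuming `keep.source = FALSE` is passed explicitly so that no srcfile or parse data are recorded (per the report, that path does not crash even on affected versions). `try_parse()` is a hypothetical helper that returns either the parsed expression or the error condition.

```r
# Hypothetical helper: parse `code` without source references (no srcfile,
# no parse data) and return either the expression or the error condition.
try_parse <- function(code) {
  tryCatch(parse(text = code, keep.source = FALSE),
           error = function(e) e)
}

in_string  <- try_parse("\"\u00d7\"")  # non-ASCII inside a string literal
in_comment <- try_parse("1 # \u00d7")   # non-ASCII inside a comment
as_symbol  <- try_parse("\u00d7")       # non-ASCII used as a bare symbol

inherits(in_string, "error")    # FALSE
inherits(in_comment, "error")   # FALSE
inherits(as_symbol, "error")    # TRUE, an ordinary R error
```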

To be more precise, the current implementation of the parser allows a bit more than that, but there are recommendations in WRE 1.1.5 "Package subdirectories" on (not) using non-ASCII characters in packages.
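One way to follow that advice is to check package sources for non-ASCII characters before they ever reach the parser. The sketch below uses a hypothetical helper, `find_non_ascii_lines()`, which flags every line containing a code point above 127 (or invalid UTF-8). It flags strings and comments too, so it is stricter than what the parser itself requires; base R's `tools::showNonASCII()` serves a similar purpose.

```r
# Hypothetical helper (not part of R): return the indices of lines that
# contain non-ASCII characters. Invalid UTF-8 (utf8ToInt() gives NA) is
# flagged as well.
find_non_ascii_lines <- function(lines) {
  has_non_ascii <- vapply(lines, function(l) {
    cp <- utf8ToInt(l)
    anyNA(cp) || any(cp > 127L)
  }, logical(1), USE.NAMES = FALSE)
  which(has_non_ascii)
}

src <- c("x <- 1", "y <- \"\u00d7\"", "# caf\u00e9", "z <- x + 1")
find_non_ascii_lines(src)   # 2 3
```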

"×" ("\ud7") is not an allowed symbol name, and the current implementation should throw an error.

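To make the notation concrete: in R string syntax, `\ud7` is a `\u` escape with the leading zeros omitted, so it denotes the same character as `\u00d7` (U+00D7, the multiplication sign). The sketch below checks that, and uses the common `make.names(x) == x` idiom (an approximation, not the parser's own check) to show that the character is not a syntactic name on its own.

```r
# "\ud7" is the \u escape with leading zeros omitted; it is the same
# character as "\u00d7" (U+00D7, MULTIPLICATION SIGN).
x1 <- "\ud7"
x2 <- "\u00d7"
identical(x1, x2)   # TRUE
utf8ToInt(x1)       # 215, i.e. 0xD7

# Common idiom (an approximation, not the parser's own rule): a name is
# syntactic if make.names() leaves it unchanged.
is_syntactic <- function(x) make.names(x) == x
is_syntactic("x")   # TRUE
is_syntactic(x1)    # FALSE: the multiplication sign is not a letter
```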
The tokenizer ends up with a negative token, and then, when the parse data are being finalized and a table of token names is created, there is an out-of-bounds access to the yytname array. The check should probably go right away into the tokenizer.
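The failure mode and the suggested fix can be illustrated with a toy model in R. This is not the actual parser, which is bison-generated C from `src/main/gram.y`; the token table, the token codes, and both functions below are hypothetical. The point is only that rejecting an unsupported character in the tokenizer, and bounds-checking any lookup by token code, keeps an invalid code from ever indexing the name table.

```r
# Toy model only: hypothetical token table and codes, not R's real grammar.
# Code k (0-based, as in C) is stored at R index k + 1.
toy_yytname <- c("END_OF_INPUT", "ERROR", "SYMBOL", "NUM_CONST")
TOK_SYMBOL <- 2L
TOK_NUM_CONST <- 3L

# Bounds-checked lookup of a token name by code. The unguarded C equivalent
# would read outside the array for a negative code.
token_name <- function(code, table = toy_yytname) {
  ok <- is.numeric(code) && length(code) == 1L && !is.na(code) &&
    code == trunc(code) && code >= 0 && code < length(table)
  if (!ok) stop("invalid token code: ", format(code), call. = FALSE)
  table[[code + 1L]]
}

# Hypothetical single-character tokenizer that validates its input right
# away, so an unsupported character becomes an ordinary error instead of
# an out-of-range token code.
toy_tokenize_char <- function(ch) {
  if (grepl("^[A-Za-z.]$", ch)) return(TOK_SYMBOL)
  if (grepl("^[0-9]$", ch)) return(TOK_NUM_CONST)
  stop(sprintf("unexpected input: U+%04X", utf8ToInt(ch)), call. = FALSE)
}

token_name(toy_tokenize_char("x"))   # "SYMBOL"
token_name(toy_tokenize_char("7"))   # "NUM_CONST"
```

In C, indexing yytname with a negative code is undefined behavior (here, a segfault); R's own subsetting raises an error rather than reading out of bounds, so the toy can only show the guarded form.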


Fixed in R-devel.

Tomas




Hadley


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
