Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote:

> Dear Maciej Nasinski,
>
> On Fri, 3 May 2024 11:37:57 +0200 Maciej Nasinski wrote:
>
> > I believe we must conduct a comprehensive review of all existing CRAN
> > packages.
>
> Why now? R packages are already code. You don't need poisoned RDS files
> to wreak havoc using an R package.
>
> On the other hand, R data files contain R objects, which contain code.
> You don't need exploits to smuggle code inside an R object.

I think the confusion arises because users expect "R data files" to contain only data, i.e. numbers, but they can contain any R object, including functions.

Personally, I never use them, out of concern that an accidentally saved function could override some functionality and be difficult to debug. And, of course, I never save R sessions.

If you need to pass data around, it is a good idea to use a common format such as tab-separated text files (or CSV files) with column names. One can also use MVL files (RMVL package).

best

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
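To make Vladimir's point concrete, here is a minimal sketch (not taken from the thread) showing that an RDS file round-trips a function alongside the numbers, whereas a tab-separated text file can only carry the table itself. The object `payload`, its element names and the temporary file paths are made up for illustration.

```r
# Hypothetical object: a small data frame bundled together with a function.
payload <- list(
  values = data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)),
  helper = function(...) "not data"
)

# Round trip through an RDS file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(payload, rds_path)
restored <- readRDS(rds_path)

# The function survives the round trip along with the numbers.
has_function <- any(vapply(restored, is.function, logical(1)))

# A tab-separated text file with column names carries only the table.
tsv_path <- tempfile(fileext = ".tsv")
write.table(restored$values, tsv_path, sep = "\t",
            row.names = FALSE, quote = FALSE)
from_tsv <- read.delim(tsv_path)
```

The text format loses R-specific structure (attributes, classes, nested lists), which is the price of knowing that what comes back is a plain table.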
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
Yes, this may have hit the news as a problem, but any code anywhere can be a security issue.

If you want to read lots of R code, and also the code for add-ins from libraries, compile everything from scratch with a trusted set of tools, refuse to open any of the files being discussed, and only use packages that are on your machine and already examined, sure. You can be a tad safer.

But as has been shown for years, it is quite possible to obfuscate code in many languages to the point where you may not easily figure out what it will do! And most people cannot and will not read source code, as at some point it is easier to do what they want another way.

What is sort of new here is a level of indirection that arises because of the way you can store things in a file and read them in so that they execute. But is it all that much more dangerous than regular R code that opens up some remote file or reads records from a database and then does an eval() on the random text?

Having said that, this is a bit like the virus detection industry. You may scan files in endless ways to recognize a KNOWN signature and then find lots of false positives too. Obviously places like CRAN might be able to do a scan on files in packages, or maybe you could open files with a wrapper that checks the innards for known dangers. But unless this becomes a widely used exploitation before it is fixed, ...
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
I agree with Ivan here. And more generally, R is a fully featured programming language. You don't need just this one "exploit" (though it really does feel like a feature to some degree, lol!) to be a bad guy with R.

You can link to a pre-compiled binary (like the one my team makes for an R package that contains proprietary code, https://github.com/R-ArcGIS/r-bridge/tree/master/libs/x64) and call fully compiled functions that have bad side effects. You can initialize a logger in `.onLoad()`, or have a function that quietly sends your data to someone using httr while doing something actually useful.

There are also fairly widely used R packages that exist on GitHub/GitLab or r-universe or elsewhere. You'd be taking on a Sisyphean task trying to root out all the evil code from the R world. There's also likely little to none of it (shout-out to the CRAN maintainers for being really good at what they do, even if it does grind my gears sometimes).
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
On Fri, 3 May 2024 18:17:52 +0200 Maciej Nasinski wrote:

> I found the https://github.com/hrbrmstr/rdaradar solution and ran it
> on the 100 most downloaded R packages.
> Happily, all data/inst rda files are safe/non-exposed to RDS exploit
> (using the linked solution).

This is a bit useful - knowing that there are no obvious exploits in the 100 most downloaded CRAN packages is better than not knowing that - but it is important to keep the big picture in mind. Bob himself said that the script is "super basic". Currently, it only checks whether an *.rda file, when loaded in the global environment, would shadow certain important functions. This is not an attack a package author would perform; this is something one would send directly to the victim.

In order to defeat an attacker, you must think like an attacker.

Here's someone jokingly describing how they would trojan the world's online shop checkout systems if they wanted to commit financial crimes:
https://archive.ph/FCdBu
(With kindness and pull requests.)

Here's someone spending two years to plant a fake maintainer with a backdoor in a key free software project:
https://lwn.net/Articles/967192/
(The backdoor was assembled from obfuscated "test files for the decompressor".)

Here's the 2015 Underhanded C Contest, where people competed in writing the most harmless-looking code that would instead do something nefarious:
http://www.underhanded-c.org/

On the one hand, hiding the bad functions in a data file (which is compressed and binary) instead of the R files (which are plain text and indexed everywhere) would be the obvious first step, so it may be useful to flag data files with functions in them for human review.

On the other hand, an evil package author has so many tools at their disposal that they may not need this one in particular. There are CRAN packages with tens of megabytes of compiled code inside. Sneaking a little extra something in a file starting with "// This is generated grammar parser. Do not edit!" followed by an impenetrable wall of C could be easier and stay undetected for longer. How many packages use Java? You don't even have to ship the Java source together with an R package, so one of your *.jars could have a poisoned dependency with nobody being the wiser.

Attackers are very cunning, and we don't even know what exactly we are looking for. We can automate some of it, but the kind of code review that will spot an evil function tucked 50 layers inside a giant auxiliary data object is a lot of effort, hours to days per package.

> It will be great to run it on all CRAN packages, but I imagine we
> should be sure that the check is decent enough to not overload the
> servers without a need.

This probably counts as creating an unofficial CRAN mirror:
https://cran.r-project.org/mirror-howto.html

(I remember someone sending too many requests to download packages one by one and losing access from a university address to CRAN as a result.)

You'll need 12.7 Gb for the current versions of the packages or >400 Gb for the whole archive.

--
Best regards,
Ivan
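The kind of check Ivan describes (does an *.rda file, once loaded, define functions that would shadow important ones?) can be sketched roughly as follows. This is not rdaradar's actual code; the file contents and the names `sandbox`, `function_names` and `shadowing` are hypothetical, and "important" is narrowed here to "exists in the base namespace". Note that load() still deserializes the file, so a sketch like this inspects a file after the fact and is not by itself a protection against a malicious one.

```r
# Build a hypothetical .rda file holding one data frame and one function
# whose name collides with base::print.
rda_path <- tempfile(fileext = ".rda")
local({
  measurements <- data.frame(id = 1:2, value = c(0.1, 0.2))
  print <- function(x, ...) invisible(x)
  save(measurements, print, file = rda_path)
})

# Load into a throwaway environment so nothing lands in the global one.
sandbox <- new.env(parent = emptyenv())
loaded_names <- load(rda_path, envir = sandbox)

# Which of the loaded objects are functions?
is_fun <- vapply(
  loaded_names,
  function(nm) is.function(get(nm, envir = sandbox, inherits = FALSE)),
  logical(1)
)
function_names <- loaded_names[is_fun]

# Which of those would shadow a base function if loaded globally?
shadowing <- function_names[
  vapply(function_names, exists, logical(1),
         envir = baseenv(), inherits = FALSE)
]
```

As Ivan points out, this only looks at top-level objects; a function tucked deep inside a nested data object would not be flagged by it.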
Re: [R-pkg-devel] Error handling in C code
Most functions in R have a prefix on their name, with aliases defined so you can use the function without the prefix. But you can turn off the aliasing, in which case you need the true name.

I think for all of the functions you list the prefix is "Rf_", so they are "Rf_error", etc. Perhaps you turned off the aliasing?

Duncan Murdoch
[R-pkg-devel] Error handling in C code
Hi,

I have an R package with C code in it. It has failed the CRAN checks for Debian. The problem is with the error function being undefined. Section 6.2 of Writing R Extensions (see below) suggests that errors can be signaled with error and that the appropriate header file is included by R.h, but this seems not to be the case?

Any help would be appreciated!

Thanks,

Jarrod

6.2 Error signaling

The basic error signaling routines are the equivalents of stop and warning in R code, and use the same interface.

    void error(const char * format, ...);
    void warning(const char * format, ...);
    void errorcall(SEXP call, const char * format, ...);
    void warningcall(SEXP call, const char * format, ...);
    void warningcall_immediate(SEXP call, const char * format, ...);

These have the same call sequences as calls to printf, but in the simplest case can be called with a single character string argument giving the error message. (Don't do this if the string contains '%' or might otherwise be interpreted as a format.)

These are defined in header R_ext/Error.h included by R.h.
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
Hey All,

Once more, Ivan, thank you for your great blog post.

I found the https://github.com/hrbrmstr/rdaradar solution and ran it on the 100 most downloaded R packages. Happily, all data/inst rda files are safe/non-exposed to the RDS exploit (using the linked solution).

Please see my fork for the results, https://github.com/Polkas/rdaradar/blob/main/cran_top_results.txt, and for the script used for the run, https://github.com/Polkas/rdaradar/blob/main/iter_all.R

It would be great to run it on all CRAN packages, but I imagine we should first be sure that the check is decent enough, so as not to overload the servers needlessly.

KR
Maciej Nasinski
University of Warsaw
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
Dear Ivan,

Your blog post is fantastic, and I have already started promoting it on LinkedIn with full credit to you.

KR
Maciej Nasinski
University of Warsaw
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
Dear Ivan,

Thank you for such a quick response.

“It may be worth teaching people that, in general, R data files should be as trusted as R code.” I totally agree, and that is why I wrote that any code can be dangerous if run without proper scrutiny.

A few LinkedIn posts, most probably generated by ChatGPT (they contain a lot of icons), have done a lot of harm lately. I will certainly try to write a post for my community, reminding people that any code can be dangerous.

BTW, we could limit the scan, using the crandb download stats, to only those packages that have more than x downloads a day :) I imagine it will be a demanding project.

KR
Maciej Nasinski
University of Warsaw
Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
Dear Maciej Nasinski,

On Fri, 3 May 2024 11:37:57 +0200 Maciej Nasinski wrote:

> I believe we must conduct a comprehensive review of all existing CRAN
> packages.

Why now? R packages are already code. You don't need poisoned RDS files to wreak havoc using an R package.

On the other hand, R data files contain R objects, which contain code. You don't need exploits to smuggle code inside an R object.

> Additionally, I will expect an introduction of an additional
> step in the R CMD check process.

What exactly would you like this step to be?

> It is stated that R Team is aware of
> that, and the exploit is fixed in R 4.4.0, but I can not find any
> clear bullet point in the NEWS file for 4.4.0
> (https://cran.r-project.org/doc/manuals/r-release/NEWS.html).

This has recently been discussed in the R-help thread:
https://stat.ethz.ch/pipermail/r-help/2024-May/479287.html

> I look forward to your thoughts and collaborating closely on this
> urgent review.

It may be worth teaching people that in general, R data files should be as trusted as R code.

It may also be worth setting aside a strict subset of the R data format to carry data only, without any executable code [*], but it may turn out to be much less useful than it sounds. For example, you won't be able to save many kinds of model objects using this plain data format, which makes it unrealistic to require plain data only inside data files in CRAN packages.

An independent review of the whole >2 packages on CRAN for malicious behaviour is a noble endeavour, but it will require people and funding. Perhaps you could try to apply for an R Consortium infrastructure grant to do that.

--
Best regards,
Ivan

[*] https://aitap.github.io/2024/05/02/unserialize.html#subset
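As a rough illustration of what a "data only" subset might look like when checked on an object that is already in memory, here is a minimal sketch. It is not the definition from Ivan's blog post; the function name `is_plain_data` and the list of allowed types are assumptions made for the example. It also shows why model objects fail such a check: a fitted model carries a call and a formula, which are language objects.

```r
# Hypothetical helper: TRUE only if x, its attributes and (for lists) all
# of its elements are atomic vectors, NULL or lists of the same.
is_plain_data <- function(x) {
  allowed <- c("NULL", "logical", "integer", "double",
               "complex", "character", "raw", "list")
  # Rejects closures, builtins, language objects, environments, promises,
  # external pointers, bytecode, expression vectors and S4 objects.
  if (!(typeof(x) %in% allowed) || isS4(x)) return(FALSE)

  # Attributes can hide arbitrary objects too, so check them recursively.
  attrs <- attributes(x)
  if (!is.null(attrs) &&
      !all(vapply(attrs, is_plain_data, logical(1)))) {
    return(FALSE)
  }

  # Descend into list elements, ignoring any class-specific methods.
  if (typeof(x) == "list") {
    return(all(vapply(unclass(x), is_plain_data, logical(1))))
  }
  TRUE
}

plain_df   <- is_plain_data(data.frame(a = 1:3, b = c("x", "y", "z")))
with_fun   <- is_plain_data(list(a = 1, nested = list(f = function() 1)))
with_model <- is_plain_data(lm(dist ~ speed, data = cars))
```

A check like this runs after the object has been deserialized, so it is only a way to flag objects for human review, not a safe reader for untrusted files.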
[R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit
I hope this message finds you well.

Following the recent announcement of a vulnerability related to the RDS exploit in R (https://hiddenlayer.com/research/r-bitrary-code-execution/), recent discussions on social media have raised concerns about the credibility of the R language. Any code, including pure R code, can potentially be malicious if it is executed without proper scrutiny. It is worth noting that a similar problem was reported for the Python pickle a few years ago: https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/#Exploiting-Serialization.

In my opinion, the central problem is not the exploit itself but whether it has been introduced into any CRAN package. I believe we must conduct a comprehensive review of all existing CRAN packages. Additionally, I would expect the introduction of an additional step in the R CMD check process.

It is stated that the R team is aware of this and that the exploit is fixed in R 4.4.0, but I cannot find any clear bullet point in the NEWS file for 4.4.0 (https://cran.r-project.org/doc/manuals/r-release/NEWS.html).

I look forward to your thoughts and collaborating closely on this urgent review.

KR
Maciej Nasinski
University of Warsaw