Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Vladimir Dergachev




On Fri, 3 May 2024, Ivan Krylov via R-package-devel wrote:

> Dear Maciej Nasinski,
>
> On Fri, 3 May 2024 11:37:57 +0200
> Maciej Nasinski wrote:
>
> > I believe we must conduct a comprehensive review of all existing CRAN
> > packages.
>
> Why now? R packages are already code. You don't need poisoned RDS files
> to wreak havoc using an R package.
>
> On the other hand, R data files contain R objects, which contain code.
> You don't need exploits to smuggle code inside an R object.


I think the confusion arises because users expect "R data files" to only 
contain data, i.e. numbers, but they can contain any R object, including 
functions.
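
To make this concrete, here is a minimal sketch (added for illustration, not part of the original message; the object and element names are made up) showing that an RDS file round-trips a function stored next to ordinary numbers:

```r
# A "data" object that also carries a function.
payload <- list(
  values = c(1.5, 2.5, 3.5),
  helper = function(x) x * 2   # executable code stored alongside the numbers
)

path <- tempfile(fileext = ".rds")
saveRDS(payload, path)

# Reading the file back restores the function together with the data.
restored <- readRDS(path)
is.function(restored$helper)   # TRUE
restored$helper(21)            # 42
```

In this simple case nothing runs until someone calls restored$helper(), but the file clearly holds more than numbers.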


I, personally, never use them out of concern that an accidentally saved 
function could override some functionality and be difficult to debug. And, 
of course, I never save R sessions.


If you need to pass data, it is a good idea to use a common format such as 
tab-separated CSV files with column names. One can also use MVL files 
(RMVL package).
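
A minimal sketch of the tab-separated route, using only base R (the data frame and its column names are invented for this illustration):

```r
# Illustrative data; the column names are invented for this sketch.
measurements <- data.frame(
  id    = 1:3,
  score = c(0.5, 1.25, 2),
  label = c("a", "b", "c")
)

path <- tempfile(fileext = ".tsv")

# Write a tab-separated text file with a header row and no row names.
write.table(measurements, path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() defaults to sep = "\t" and header = TRUE.
restored <- read.delim(path, stringsAsFactors = FALSE)
str(restored)
```

A plain-text table like this can only hold atomic values such as numbers and strings, so there is no place for a function to hide, at the cost of losing attributes and richer object types.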


best

Vladimir Dergachev

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread avi.e.gross
Yes, this may have hit the news as a problem but any code anywhere can be a 
security issue.

If you want to read lots of R code and also the code for add-ins from libraries 
and compile everything from scratch with a trusted set of tools, and refuse to 
open any of the files being discussed and so on, and only use packages that are 
already on your machine and already examined, sure, you can be a tad safer.

But as shown for years, it is quite possible to obfuscate the code in many 
languages to the point where you may not easily figure out what the code will 
do! And most people cannot and will not read source code as at some point it is 
easier to do what they want another way.

What is sort of new here is a level of indirection that happens because of the 
way you can store things in a file and read them in so they execute. But is it 
all that much more dangerous than regular R code that opens up some remote file 
or reads records from a database and then does an eval() on the random text?
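
For comparison, a small sketch of the eval() pattern mentioned above (added for illustration; the string stands in for text fetched from a remote file or a database field):

```r
# Text that might have come from a remote file or a database field.
untrusted_text <- "sum(1:10)"

# parse() turns the text into an expression; eval() then runs it.
result <- eval(parse(text = untrusted_text))
result  # 55
```

Whatever the text says is executed with the full privileges of the R session, which is the same level of trust that loading an arbitrary R object ends up requiring.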

Having said that, this is a bit like the Virus Detection industry. You may scan 
files in endless ways to recognize a KNOWN signature and then find lots of 
false positives too. Obviously places like CRAN might be able to do a scan on 
files in packages, or maybe you could open files with a wrapper that checks the 
innards for known dangers. But unless this becomes a widely used exploitation 
before it is fixed, ...



Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Josiah Parry
I agree with Ivan here. And more generally, R is a fully featured
programming language. You don't need just this one "exploit" (though, it
really does feel like a feature to some degree lol!) to be a bad guy with
R.

You can link to a pre-compiled binary (like my team makes for an R package
that contains proprietary code
https://github.com/R-ArcGIS/r-bridge/tree/master/libs/x64) and call
fully compiled functions that have bad side effects. You can initialize
a logger in `.onLoad()` or have a function that sends your data to someone
using httr quietly while doing something actually useful.

There are also fairly widely used R packages that exist on GitHub/Lab or
r-universe or elsewhere.

You'd be taking on a Sisyphean task trying to root out all the evil code
from the R world.
There's also likely little to none of it (shout-out to the CRAN maintainers
for being really good at what they do, even if it does grind my gears
sometimes).





Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Ivan Krylov via R-package-devel
On Fri, 3 May 2024 18:17:52 +0200
Maciej Nasinski  wrote:

> I found the https://github.com/hrbrmstr/rdaradar solution and ran it
> on the 100 most downloaded R packages.
> Happily, all data/inst rda files are safe/non-exposed to RDS exploit
> (using the linked solution).

This is a bit useful - knowing that there are no obvious exploits in
the 100 most downloaded CRAN packages is better than not knowing that -
but it is important to keep the big picture in mind. Bob himself said
that the script is "super basic". Currently, it only checks whether an
*.rda file, when loaded in the global environment, would shadow certain
important functions. This is not an attack a package author would
perform; this is something one would send directly to the victim.

In order to defeat an attacker, you must think like an attacker.

Here's someone jokingly describing how they would trojan the world's
online shop checkout systems if they wanted to commit financial crimes:
https://archive.ph/FCdBu
(With kindness and pull requests.)

Here's someone spending two years to plant a fake maintainer with a
backdoor in a key free software project:
https://lwn.net/Articles/967192/
(The backdoor was assembled from obfuscated "test files for the
decompressor".)

Here's the 2015 Underhanded C Contest, where people competed in writing
the most harmless-looking code that would instead do something
nefarious: http://www.underhanded-c.org/

On the one hand, hiding the bad functions in a data file (which is
compressed and binary) instead of the R files (which are plain text and
indexed everywhere) would be the obvious first step, so it may be
useful to flag data files with functions in them for human review.
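
As a rough illustration of what such flagging could look like, here is a sketch of a recursive check. The helper name contains_code() is hypothetical; this is not how rdaradar works, and it is deliberately incomplete (it treats environments as suspicious instead of walking them, and ignores things like external pointers):

```r
# Hypothetical helper: TRUE if `x`, or anything nested inside it,
# is a function, a language object (call, symbol, expression) or an environment.
contains_code <- function(x) {
  if (is.function(x) || is.language(x) || is.environment(x)) {
    return(TRUE)
  }
  # Attributes can carry arbitrary objects, so check each of them as well.
  for (a in attributes(x)) {
    if (contains_code(a)) return(TRUE)
  }
  # Walk the elements of lists (data frames included).
  if (is.list(x)) {
    for (el in unclass(x)) {
      if (contains_code(el)) return(TRUE)
    }
  }
  FALSE
}

plain  <- list(a = 1:3, b = data.frame(x = c(1, 2)))
sneaky <- list(a = 1:3, nested = list(deep = list(f = function() "hi")))

contains_code(plain)   # FALSE
contains_code(sneaky)  # TRUE
```

A check like this has to load the object before it can inspect it, so it is only a way to queue files for human review and offers no protection against a file that misbehaves while being read.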

On the other hand, an evil package author has so many tools at their
disposal that they may not need this one in particular. There are CRAN
packages with tens of megabytes of compiled code inside. Sneaking a
little extra something in a file starting with "// This is generated
grammar parser. Do not edit!" followed by an impenetrable wall of C
could be easier and stay undetected for longer. How many packages use
Java? You don't even have to ship the Java source together with an R
package, so one of your *.jars could have a poisoned dependency with
nobody being the wiser.

Attackers are very cunning, and we don't even know what exactly we are
looking for. We can automate some of it, but the kind of code review
that will spot an evil function tucked 50 layers inside a giant
auxiliary data object is a lot of effort, hours to days per package.

> It will be great to run it on all CRAN packages, but I imagine we
> should be sure that the check is decent enough to not overload the
> servers without a need.

This probably counts as creating an unofficial CRAN mirror:
https://cran.r-project.org/mirror-howto.html

(I remember someone sending too many requests to download packages one
by one and losing access from a university address to CRAN as a result.)

You'll need 12.7 GB for the current versions of the packages or >400 GB
for the whole archive.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Error handling in C code

2024-05-03 Thread Duncan Murdoch
Most functions in R's C API have a prefix on their name, with aliases defined so 
you can use the function without the prefix.  But you can turn off the 
aliasing, in which case you need the true name.  I think for all of the 
functions you list the prefix is "Rf_", so they are "Rf_error", etc.


Perhaps you turned off the aliasing?

Duncan Murdoch



[R-pkg-devel] Error handling in C code

2024-05-03 Thread Jarrod Hadfield
Hi,

I have an R package with C code in it. It has failed the CRAN checks for 
Debian.  The problem is that the error function is undefined. Section 6.2 of 
Writing R Extensions (see below) suggests that errors can be signalled with 
error and that the appropriate header file is included by R.h, but this seems 
not to be the case?

Any help would be appreciated!

Thanks,

Jarrod

6.2 Error signaling

The basic error signaling routines are the equivalents of stop and warning in R 
code, and use the same interface.

void error(const char * format, ...);
void warning(const char * format, ...);
void errorcall(SEXP call, const char * format, ...);
void warningcall(SEXP call, const char * format, ...);
void warningcall_immediate(SEXP call, const char * format, ...);

These have the same call sequences as calls to printf, but in the simplest case 
can be called with a single character string argument giving the error message. 
(Don't do this if the string contains '%' or might otherwise be interpreted as 
a format.)

These are defined in header R_ext/Error.h included by R.h.

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th' ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.



Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Maciej Nasinski
Hey All,

Once more, Ivan, thank you for your great blog post.
I found the https://github.com/hrbrmstr/rdaradar solution and ran it on the
100 most downloaded R packages.
Happily, all data/inst rda files are safe/non-exposed to RDS exploit (using
the linked solution).
Please see my fork for the results
https://github.com/Polkas/rdaradar/blob/main/cran_top_results.txt and the
script used for the run https://github.com/Polkas/rdaradar/blob/main/iter_all.R

It will be great to run it on all CRAN packages, but I imagine we should be
sure that the check is decent enough to not overload the servers without a
need.

KR
Maciej Nasinski
University of Warsaw



Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Maciej Nasinski
Dear Ivan,

Your blog post is fantastic and I have already started promoting it on LinkedIn 
with full credit to you.

KR
Maciej Nasinski
University of Warsaw



Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Maciej Nasinski
Dear Ivan,

Thank you for such a quick response.
“It may be worth teaching people that, in general, R data files should be
as trusted as R code.” I totally agree, and that is why I wrote that any code 
can be dangerous if run without proper scrutiny.
A few LinkedIn posts, most probably generated by ChatGPT (they have a lot of 
icons in them), have done a lot of harm lately. I will certainly try to write a 
post for my community to remind people that any code can be dangerous.

BTW, we could use crandb download stats to limit the scan to only those 
packages that have more than x downloads a day :) I imagine it will be a 
demanding project.

KR
Maciej Nasinski
University of Warsaw

> On 3 May 2024, at 11:52, Ivan Krylov  wrote:
> 
> It may be worth teaching people that in general, R data files should be
> as trusted as R code.



Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Ivan Krylov via R-package-devel
Dear Maciej Nasinski,

On Fri, 3 May 2024 11:37:57 +0200
Maciej Nasinski  wrote:

> I believe we must conduct a comprehensive review of all existing CRAN
> packages.

Why now? R packages are already code. You don't need poisoned RDS files
to wreak havoc using an R package.

On the other hand, R data files contain R objects, which contain code.
You don't need exploits to smuggle code inside an R object.

> Additionally, I will expect an introduction of an additional
> step in the R CMD check process.

What exactly would you like this step to be?

> It is stated that R Team is aware of
> that, and the exploit is fixed in R 4.4.0, but I can not find any
> clear bullet point in the NEWS file for 4.4.0
> (https://cran.r-project.org/doc/manuals/r-release/NEWS.html).

This has recently been discussed in the R-help thread:
https://stat.ethz.ch/pipermail/r-help/2024-May/479287.html

> I look forward to your thoughts and collaborating closely on this
> urgent review.

It may be worth teaching people that in general, R data files should be
as trusted as R code.

It may also be worth setting aside a strict subset of the R data format
to carry data only, without any executable code [*], but it may turn
out to be much less useful than it sounds. For example, you won't be
able to save many kinds of model objects using this plain data format,
which makes it unrealistic to require plain data only inside data files
in CRAN packages.
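
The difficulty with model objects can be seen in a small sketch (an illustration added here, using the built-in cars data set; it is not taken from the linked post). Even a simple linear model stores language objects and an environment:

```r
# Fit a simple linear model on a data set that ships with R.
fit <- lm(dist ~ speed, data = cars)

# The fitted object keeps the call that created it...
is.call(fit$call)                                 # TRUE

# ...and a terms object, which is a formula (a language object)...
is.language(fit$terms)                            # TRUE

# ...that in turn references the environment it was created in.
is.environment(attr(fit$terms, ".Environment"))   # TRUE
```

A data-only subset of the format would have to reject all three, so an lm fit could not be stored as it is.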

An independent review of the whole >20000 packages on CRAN for
malicious behaviour is a noble endeavour, but it will require people
and funding. Perhaps you could try to apply for an R Consortium
infrastructure grant to do that.

-- 
Best regards,
Ivan

[*] https://aitap.github.io/2024/05/02/unserialize.html#subset



[R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Maciej Nasinski
I hope this message finds you well.

I am writing following the recent announcement of a vulnerability related to
the RDS exploit in R
(https://hiddenlayer.com/research/r-bitrary-code-execution/).
Recent discussions on social media have raised concerns about the
credibility of the R language. Any code, including pure R code, can
potentially be malicious if it is executed without proper scrutiny.
It is worth noting that a similar problem was reported for the Python
pickle a few years ago:
https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/#Exploiting-Serialization.

In my opinion, the central problem is not the exploit itself, but whether it
has been introduced into any CRAN package.

I believe we must conduct a comprehensive review of all existing CRAN
packages. Additionally, I will expect an introduction of an additional
step in the R CMD check process. It is stated that R Team is aware of
that, and the exploit is fixed in R 4.4.0, but I can not find any
clear bullet point in the NEWS file for 4.4.0
(https://cran.r-project.org/doc/manuals/r-release/NEWS.html).

I look forward to your thoughts and collaborating closely on this urgent review.

KR
Maciej Nasinski
University of Warsaw
