First, let's take a step back, because I think there is way too much confusion 
here.

The original report was about the vignette from the poweRlaw package version 
0.70.6. That package contains a vignette file d_jss_paper.pdf with the SHA256 
hash 9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9 (md5 
e0439db551e1d34e9bf8713fca27887b). This is the same file that was available 
for download from the web view until the new version was published. I assume 
we are all talking about that file, since Iñaki's VirusTotal URL has exactly 
the same hash, i.e., the web view and the package copies are identical (I also 
checked the other hashes just to be really sure). That's why I think we're 
barking up the wrong tree here: this is not about cache poisoning, file swaps 
or anything like that. The file has never been modified; it is the same file 
that was submitted to CRAN in 2020.
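
For anyone who wants to repeat the comparison locally, here is a minimal 
sketch in Python. It only assumes you have a copy of the PDF on disk; the 
function names are made up for illustration, and the two expected values are 
simply the hashes quoted above.

```python
import hashlib

# Hashes quoted in this message for d_jss_paper.pdf (poweRlaw 0.70.6).
EXPECTED_SHA256 = "9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9"
EXPECTED_MD5 = "e0439db551e1d34e9bf8713fca27887b"


def file_digests(path, chunk_size=65536):
    """Return (sha256_hex, md5_hex) for the file at `path`."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return sha256.hexdigest(), md5.hexdigest()


def matches_reported(path):
    """True only if both digests equal the values quoted above."""
    sha256_hex, md5_hex = file_digests(path)
    return sha256_hex == EXPECTED_SHA256 and md5_hex == EXPECTED_MD5
```

Calling `matches_reported("d_jss_paper.pdf")` on a local copy (the path is 
whatever you saved it as) tells you whether you are looking at the same bytes.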

That's why I was saying that this most likely has nothing to do with CRAN at 
all. The question is rather whether that old file has contained malware for 
the last 4 years or whether the AV software is simply misclassifying it (a 
false positive). I'm not a security expert, but based on the little 
information available and inspection of the streams I came to the conclusion 
that it's likely a false positive. The main reason is that submitting the 
*identical* PDF payload with just a one-byte change to the /ID (which is 
functionally not used by Acrobat) results in the file NOT being flagged as 
malicious by any of the security vendors on VirusTotal. I may still be wrong 
or missing something, which is why I was asking for someone with more 
expertise to actually look at the file, as opposed to just trusting 
auto-generated reports that may be wrong. But that is beyond my power.
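
To make the /ID experiment concrete, here is a rough sketch of one way to 
produce such a variant. It is not necessarily how the test above was done: it 
assumes the /ID array is stored as plain hex strings (`/ID [<...> <...>]`), 
and `tweak_pdf_id` is a hypothetical helper name. It swaps a single hex digit 
in place, so the file length and all object offsets stay the same.

```python
import re

# Hex-string form of the file identifier array: /ID [<hex> <hex>]
_ID_PATTERN = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f]+)>")


def tweak_pdf_id(data: bytes) -> bytes:
    """Return a copy of `data` with one hex digit of the last /ID changed."""
    matches = list(_ID_PATTERN.finditer(data))
    if not matches:
        raise ValueError("no hex-encoded /ID array found")
    # Use the last occurrence, i.e. the most recent trailer in the file.
    pos = matches[-1].start(1)
    old = data[pos:pos + 1]
    new = b"1" if old == b"0" else b"0"
    # Same length in, same length out: exactly one byte differs.
    return data[:pos] + new + data[pos + 1:]
```

Everything outside the identifier (streams, fonts, images, any embedded 
scripts) is byte-for-byte unchanged in the result, so a detection that 
disappears after this change is hard to attribute to the actual content.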

(Also, if it turns out that the file did contain malware, it would be good to 
know what we can do about it. For example, nowadays we are re-compressing 
streams and/or filtering through GS, so one could imagine that this could also 
be effective at removing PDF malware, if it is real.)

More responses inline.


> On Jan 28, 2024, at 1:10 AM, Bob Rudis <b...@rud.is> wrote:
> 
> Simon: Is there a historical record of the hashes of just the PDFs
> that show up in the CRAN web view?
> 

Not the website, but hashes are recorded in the packages - so you can verify 
that the file has not changed for years (I can directly confirm it has not 
changed as far back as May 2021).
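
As an illustration of checking those recorded hashes, here is a small sketch. 
It assumes an unpacked source package with a top-level `MD5` file in the usual 
md5sum layout (`<hash> *<relative path>` per line); the function names are 
hypothetical.

```python
import hashlib
from pathlib import Path


def parse_md5_manifest(text):
    """Map relative path -> md5 hex digest from md5sum-style lines."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        name = rest.lstrip(" ")
        if name.startswith("*"):  # '*' marks binary mode in md5sum output
            name = name[1:]
        entries[name] = digest.lower()
    return entries


def verify_package(pkg_dir):
    """Return {relative path: True/False} for every entry in <pkg_dir>/MD5."""
    root = Path(pkg_dir)
    manifest = parse_md5_manifest((root / "MD5").read_text())
    results = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            results[rel_path] = False  # listed but missing counts as a mismatch
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        results[rel_path] = actual == expected
    return results
```

Running `verify_package()` on archived copies of the package from different 
dates is one way to confirm the vignette entry has stayed the same.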


> Ivan: do you know what mirror NOAA used at that time to get that version of
> the package? Or, did they pull it "directly" from cran.r-project.org
> (scare-quotes only b/c DNS spoofing is and has been a pretty solid attack
> vector)?
> 
> I've asked the infosec community if anyone has VT Enterprise to do a
> historical search on any PDFs that come directly from cran.r-project.org (I
> don't have VT Enterprise). It is possible there are other PDFs from that
> timeframe with similar issues (again, not saying CRAN had any issues; this
> could still be crawler cache poisoning).
> 
> I don't know if any university folks have grad student labor to harness,
> but having a few of them do some archive.org searches for other PDFs in
> that timeframe, and note the source of the archive (likely Common Crawl) if
> there are other real issues, that'd be a solid path forward for triage.
> 
> The fact that the current PDF on CRAN — which uses some of the same
> 7-year-old PDF & JPEG images from —
> https://github.com/csgillespie/poweRlaw/tree/main/vignettes — is not being
> flagged, means it's likely not an issue with Colin's sources.
> 
> Simon: it might be a good idea for all *.r-project.org sites to set up CAA
> records (
> https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization)
> since that could help prevent adjacent TLS spoofing.
> 
> Also having something running — https://github.com/SSLMate/certspotter —
> can let y'all know if certs are created for *.r-project.org domains. That
> won't help for well-resourced attacks, but it does add some layers that may
> give a heads-up for any mid-grade spoofing attacks.
> 


All well meant, but remember that CRAN is mirrored worldwide; we have control 
pretty much only over the WU master. That said, we can have a look, but DNS 
changes are not as easy as you would think.

Cheers,
Simon

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
