Instead of a separate file to store such a list, would it be an idea to add versions of the \href{}{} and \url{} markup commands that are skipped by the URL checks?
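For instance, check-exempt variants might look like this (hypothetical names, purely illustrative; no such markup commands exist today):

% hypothetical check-exempt counterparts of \url{} and \href{}{}
\urlnocheck{https://relational.fit.cvut.cz/}
\hrefnocheck{https://twitter.com/drob/status/1224851726068527106}{this tweet}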
Best,
Wolfgang

>-----Original Message-----
>From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer Graves
>Sent: Friday, 08 January, 2021 13:04
>To: r-devel@r-project.org
>Subject: Re: [Rd] URL checks
>
>I too would be pleased to be able to provide "a list of known
>false positives/exceptions" to the URL tests. I've been challenged
>multiple times about URLs that worked fine when I checked them. We
>should not be required to perform a partial lobotomy to pass R CMD check ;-)
>
>Spencer Graves
>
>On 2021-01-07 09:53, Hugo Gruson wrote:
>>
>> I encountered the same issue today with https://astrostatistics.psu.edu/.
>>
>> This is a trust-chain issue, as explained here:
>> https://whatsmychaincert.com/?astrostatistics.psu.edu.
>>
>> I've worked for a couple of years on a project to increase HTTPS
>> adoption on the web, and we noticed that this type of error is very
>> common and that website maintainers are often unresponsive to requests
>> to fix it.
>>
>> I therefore fully agree with Kirill that a list of known
>> false positives/exceptions would be a great addition, saving time for
>> both the CRAN team and package developers.
>>
>> Hugo
>>
>> On 07/01/2021 15:45, Kirill Müller via R-devel wrote:
>>> One other failure mode: SSL certificates that browsers trust but that
>>> are not installed on the check machine, e.g. the "GEANT Vereniging"
>>> certificate from https://relational.fit.cvut.cz/ .
>>>
>>> K
>>>
>>> On 07.01.21 12:14, Kirill Müller via R-devel wrote:
>>>> Hi
>>>>
>>>> The URL checks in R CMD check test all links in the README and
>>>> vignettes for broken or redirected links. In many cases this improves
>>>> documentation, but I see problems with this approach, which I detail
>>>> below.
>>>>
>>>> I'm writing to this mailing list because I think the change needs to
>>>> happen in R's check routines. I propose to introduce an "allow-list"
>>>> for URLs, to reduce the burden on both CRAN and package maintainers.
>>>>
>>>> Comments are greatly appreciated.
>>>>
>>>> Best regards
>>>>
>>>> Kirill
>>>>
>>>> # Problems with the detection of broken/redirected URLs
>>>>
>>>> ## 301 should often be 307; how to change?
>>>>
>>>> Many web sites use a 301 redirection code where a 307 would probably
>>>> be more appropriate. For example, https://www.oracle.com and
>>>> https://www.oracle.com/ both redirect to
>>>> https://www.oracle.com/index.html with a 301. I suspect the company
>>>> still wants oracle.com to be recognized as the primary entry point of
>>>> its web presence (reserving the right to point the redirect elsewhere
>>>> later), though I haven't checked with their PR department. If that's
>>>> true, the redirect should probably be a 307, a fix for their IT
>>>> department, which I haven't contacted yet either.
>>>>
>>>> $ curl -i https://www.oracle.com
>>>> HTTP/2 301
>>>> server: AkamaiGHost
>>>> content-length: 0
>>>> location: https://www.oracle.com/index.html
>>>> ...
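>>>> The same status can be inspected from R with curlGetHeaders(), which
>>>> (as far as I can tell) the URL checks build on. A minimal sketch in
>>>> base R, not the exact code R CMD check runs:
>>>>
>>>> status_of <- function(url) {
>>>>   ## Don't follow redirects, so we see the original response code
>>>>   ## (the 301 above) rather than the final 200.
>>>>   h <- curlGetHeaders(url, redirect = FALSE)
>>>>   attr(h, "status")
>>>> }
>>>> status_of("https://www.oracle.com")  # 301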
>>>> ## User agent detection
>>>>
>>>> twitter.com responds with a 400 error to requests whose user agent
>>>> string does not hint at an accepted browser.
>>>>
>>>> $ curl -i https://twitter.com/
>>>> HTTP/2 400
>>>> ...
>>>> <body>...<p>Please switch to a supported browser...</p>...</body>
>>>>
>>>> $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux
>>>> x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
>>>> HTTP/2 200
>>>>
>>>> # Impact
>>>>
>>>> While the latter problem *could* be fixed by supplying a browser-like
>>>> user agent string, the former is virtually unfixable: far too many
>>>> web sites use 301 where a 307 would be appropriate. The list above is
>>>> also incomplete; think of unreliable links, plain-HTTP links, and
>>>> other failure modes.
>>>>
>>>> This affects me as a package maintainer: I have the choice of either
>>>> changing the links to incorrect versions or removing them altogether.
>>>>
>>>> I could also explain each broken link to CRAN, but I think that
>>>> places undue burden on the team. Submitting a package with NOTEs also
>>>> delays the release, and I must release this particular package very
>>>> soon to avoid having it pulled from CRAN. I'd rather not risk that,
>>>> so I need to remove the links now and put them back later.
>>>>
>>>> I'm aware of https://github.com/r-lib/urlchecker; it alleviates the
>>>> problem but ultimately doesn't solve it.
>>>>
>>>> # Proposed solution
>>>>
>>>> ## Allow-list
>>>>
>>>> A file inst/URL that lists all URLs for which failures are allowed,
>>>> possibly together with the HTTP status codes accepted for each link.
>>>>
>>>> Example:
>>>>
>>>> https://oracle.com/ 301
>>>> https://twitter.com/drob/status/1224851726068527106 400
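>>>> A rough sketch of how a checker might consume this file (hypothetical
>>>> helper names; nothing like this exists in R CMD check today):
>>>>
>>>> read_url_allowlist <- function(path) {
>>>>   ## Each non-empty line holds a URL, optionally followed by the
>>>>   ## HTTP status codes accepted for it.
>>>>   lines <- trimws(readLines(path, warn = FALSE))
>>>>   parts <- strsplit(lines[nzchar(lines)], "[[:space:]]+")
>>>>   list(url   = vapply(parts, `[[`, character(1), 1L),
>>>>        codes = lapply(parts, function(p) as.integer(p[-1])))
>>>> }
>>>>
>>>> url_failure_allowed <- function(url, status, allow) {
>>>>   ## Waive a failure if the URL is listed and either no codes are
>>>>   ## given (any failure is accepted) or the observed status is among
>>>>   ## the accepted ones.
>>>>   i <- match(url, allow$url)
>>>>   !is.na(i) &&
>>>>     (length(allow$codes[[i]]) == 0L || status %in% allow$codes[[i]])
>>>> }
>>>>
>>>> With the example file above read via read_url_allowlist(), a 301 from
>>>> https://oracle.com/ would be waived, while a failure for any unlisted
>>>> URL would still be reported.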