On Mon, Sep 9, 2024 at 11:11 AM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>
> On 9/8/24 23:14, Jeroen Ooms wrote:
> > On Mon, Sep 2, 2024 at 10:05 AM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> >>
> >> On 4/25/24 17:01, Ivan Krylov via R-devel wrote:
> >>> On Thu, 25 Apr 2024 14:45:04 +0200
> >>> Jeroen Ooms <jeroeno...@gmail.com> wrote:
> >>>
> >>>> Thoughts?
> >>> How verboten would it be to create an empty external pointer object,
> >>> add it to the preserved list, and set an on-exit finalizer to clean up
> >>> the curl multi-handle? As far as I can tell, the internet module is not
> >>> supposed to be unloaded, so this would not introduce an opportunity to
> >>> jump to an unmapped address. This makes it possible to avoid adding a
> >>> CurlCleanup() function to the internet module:
> >> Cleaning up this way in principle would probably be fine, but R already
> >> has support for re-using connections. Even more, R can download files in
> >> parallel (in a single thread), which particularly helps with bigger
> >> latencies (e.g. typically users connecting from home, etc). See
> >> ?download.file(), look for "simultaneous".
> > Thank you for looking at this. A few ideas wrt parallel downloading:
> >
> > Additional improvement on Windows can be achieved by enabling the
> > nghttp2 driver in libcurl in Rtools, such that it takes advantage of
> > HTTP/2 multiplexing for parallel downloads
> > (https://bugs.r-project.org/show_bug.cgi?id=18664).
>
> Anyone who wants to cooperate and help is more than welcome to
> contribute patches to upstream MXE.
>
> In case of nghttp2, thanks to Andrew Johnson, who contributed nghttp2
> support to upstream MXE. It will be part of the next Rtools (probably
> Rtools45).
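As an aside, the "parallel in a single thread" point above can be illustrated with a toy sketch. This is not R's implementation (R's libcurl method uses the curl multi interface); it is just a hypothetical Python analogue showing why one event loop interleaving several transfers finishes in roughly the time of the slowest transfer, not the sum of all of them:

```python
# Toy illustration (NOT R's implementation) of single-threaded
# "simultaneous" transfers: one event loop interleaves several
# downloads, so total wall time tracks the slowest transfer rather
# than the sum. libcurl's multi interface applies the same idea by
# multiplexing sockets on one thread.
import asyncio
import time

async def fake_download(name: str, seconds: float) -> str:
    # Stand-in for awaiting socket readiness on one transfer;
    # the names and durations here are made up.
    await asyncio.sleep(seconds)
    return name

async def main() -> float:
    start = time.monotonic()
    # Three "transfers" proceed concurrently on a single thread.
    done = await asyncio.gather(
        fake_download("pkg_a", 0.3),
        fake_download("pkg_b", 0.2),
        fake_download("pkg_c", 0.1),
    )
    elapsed = time.monotonic() - start
    print(done)  # all three complete, in submission order
    return elapsed

if __name__ == "__main__":
    asyncio.run(main())
```

The wall time here is about 0.3 s (the slowest coroutine), not 0.6 s (the sum), which is the same win HTTP/2 multiplexing gives real downloads on high-latency links.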
> > Moreover, one concern is that install.packages() may fail more
> > frequently on low bandwidth connections due to reaching the "download
> > timeout" when downloading files in parallel:
> >
> > R has an unusual definition of the http timeout, which by default
> > aborts in-progress downloads after 60 seconds for no obvious reason.
> > (By contrast, browsers enforce a timeout on unresponsive/stalled
> > downloads only, which can be achieved in libcurl by setting
> > CURLOPT_CONNECTTIMEOUT or CURLOPT_LOW_SPEED_TIME.)
> >
> > The above is already a problem on slow networks, where large packages
> > can fail to install with a timeout error in the download stage. Users
> > may assume there must be a problem with the network, as it is not
> > obvious that machines on slower internet connections need to work
> > around R's defaults and modify options(timeout=) before calling
> > install.packages(). This problem could become more prevalent when
> > using parallel downloads while still enforcing the same total timeout.
> >
> > For example: the macOS binary for package "sf" is close to 90 MB, hence
> > currently, under the default setting of options(timeout=60),
> > install.packages() will error with a download timeout on clients with
> > less than 1.5 MB/s bandwidth. But with the parallel implementation,
> > install.packages() will share the bandwidth across 6 parallel downloads,
> > so if "sf" is downloaded with all its dependencies, we need at least
> > 9 MB/s (i.e. a 100 Mbit connection) for the default settings not to
> > cause a timeout.
> >
> > Hopefully this can be revised to enforce the timeout on stalled
> > downloads only, as is common practice.

Yes, this is work in progress, I am aware that the timeout could use
some thought re simultaneous downloads.
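To make the arithmetic in the quoted example concrete, and to sketch the "stalled downloads only" rule (roughly the semantics that libcurl's CURLOPT_LOW_SPEED_LIMIT / CURLOPT_LOW_SPEED_TIME pair expresses), here is a small Python sketch; the function names and thresholds are illustrative, not anything from R's sources:

```python
# Bandwidth needed to beat a TOTAL timeout, plus a sketch of a
# stall-based timeout: abort only when the transfer rate stays below
# a floor for a sustained period (roughly what libcurl's
# CURLOPT_LOW_SPEED_LIMIT / CURLOPT_LOW_SPEED_TIME do), instead of
# capping the total download time.

def min_bandwidth(size_mb: float, timeout_s: float, streams: int = 1) -> float:
    """MB/s needed to finish a `size_mb` download within a total
    timeout of `timeout_s` when bandwidth is shared across `streams`
    parallel downloads."""
    return streams * size_mb / timeout_s

def stalled(rates: list[float], low_speed_limit: float, low_speed_time: int) -> bool:
    """True if the last `low_speed_time` per-second rate samples
    (bytes/s) were all below `low_speed_limit`, i.e. the download
    has stalled rather than merely being slow."""
    if len(rates) < low_speed_time:
        return False
    return all(r < low_speed_limit for r in rates[-low_speed_time:])

# A ~90 MB file under options(timeout=60) needs 1.5 MB/s ...
print(min_bandwidth(90, 60))             # 1.5
# ... and ~9 MB/s if 6 parallel downloads share the same 60 s budget.
print(min_bandwidth(90, 60, streams=6))  # 9.0

# A slow-but-steady 0.5 MB/s transfer never trips the stall detector,
# even though a 90 MB file would blow a 60 s total timeout.
steady = [500_000.0] * 120
print(stalled(steady, low_speed_limit=100.0, low_speed_time=30))  # False
# A transfer whose rate drops to zero for 30 s does trip it.
dead = [500_000.0] * 10 + [0.0] * 30
print(stalled(dead, low_speed_limit=100.0, low_speed_time=30))    # True
```

The point of the sketch: a stall rule distinguishes "slow" from "dead", so slow clients finish large downloads while genuinely hung connections still get aborted.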
OK, that is good to hear.

> If anyone wants to help with testing the current implementation of
> simultaneous download and report any bugs found, that would be nice.

R-universe has run this a few thousand times to recheck packages on
r-devel on both Linux and Windows, and it works well. It shortens the
CI process by a few seconds, and there are fewer random connection
failures. If you want to inspect some recent logs for yourself, click
the rightmost column on https://r-universe.dev/builds and then, on the
GitHub Actions page, look under the "Build R-devel for Windows / Linux"
runs to see the log files.

I was also able to confirm an edge case: install.packages() does not
abort if any of the dependencies fails to download with an HTTP 404,
which I think is the desired behavior.

If there is anything else specifically that you would like to see
tested, I can look at that.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel