I'd like to raise this again now that 4.4 is out. Below is a more complete patch, which includes a function to properly clean up libcurl when R quits. Implementing this is a little tricky because libcurl is a separate "module" in R. Perhaps there is a better way, but this works:
view: https://github.com/r-devel/r-svn/pull/166/files
patch: https://github.com/r-devel/r-svn/pull/166.diff

The old patch is still there as well, which is meant as a minimal proof of concept to test the performance gains from reusing the connection:

view: https://github.com/r-devel/r-svn/pull/155/files
patch: https://github.com/r-devel/r-svn/pull/155.diff

Performance gains are greatest on high-bandwidth servers when downloading many files from the same server (e.g. packages from a CRAN mirror). In such cases, over 90% of the total time is currently spent on initiating and tearing down a separate SSL connection for each file download.

Thoughts?

On Sat, Mar 2, 2024 at 3:07 PM Jeroen Ooms <jeroeno...@gmail.com> wrote:
>
> Currently download.file() creates and terminates a new TLS connection
> for each download. This creates a lot of overhead which is expensive
> for both client and server (in particular the TLS handshake). Modern
> internet clients (including browsers) re-use connections for many http
> requests.
>
> We can do this in R by creating a persistent libcurl "multi-handle".
> The R libcurl implementation already uses a multi-handle, however it
> destroys it after each download, which defeats the purpose. The
> purpose of the multi-handle is to keep it alive and let libcurl
> maintain a persistent connection pool. This is particularly relevant
> for install.packages() which needs to download many files from one and
> the same server.
>
> Here is a bare minimal proof of concept patch that re-uses one and the
> same multi-handle for all requests in R:
> https://github.com/r-devel/r-svn/pull/155/files
>
> Some quick benchmarking shows that this can lead to big speedups for
> download.packages() on high-bandwidth servers (such as CI). This quick
> test to download 100 packages from CRAN showed more than 10x speedup
> for me: https://github.com/r-devel/r-svn/pull/155
>
> Moreover, I think this may make install.packages() more robust. In CI
> build logs that download many packages, I often see one or two
> downloads randomly failing with a TLS-connect error. I am hopeful this
> problem will disappear when using a single connection to the CRAN
> server to download all the packages.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
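
For readers who want to see the connection-reuse effect in isolation, here is a minimal sketch. It is not the patch and does not use libcurl or R: it assumes Python's standard library as a stand-in, talks plain HTTP to a throwaway local server (so no TLS handshake is involved), and all names in it (CountingServer, fetch_with_new_connections, fetch_with_one_connection, count_connections) are hypothetical. It only counts how many TCP connections the server has to accept when the client opens one per download versus keeping a single one alive, which is the behaviour a persistent multi-handle is meant to provide.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Local test server that counts accepted TCP connections."""

    daemon_threads = True

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.connections = 0

    def get_request(self):
        # Called once per accepted TCP connection, not once per HTTP request.
        request = super().get_request()
        self.connections += 1
        return request


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow keep-alive on the server side

    def do_GET(self):
        body = b"payload"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def fetch_with_new_connections(host, port, n):
    """Open and close a fresh connection for every download."""
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()


def fetch_with_one_connection(host, port, n):
    """Reuse a single persistent connection for all downloads."""
    conn = http.client.HTTPConnection(host, port)
    for _ in range(n):
        conn.request("GET", "/")
        conn.getresponse().read()  # drain the body before the next request
    conn.close()


def count_connections(fetch, n=5):
    """Run `fetch` against a fresh local server; return connections accepted."""
    server = CountingServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        fetch(host, port, n)
    finally:
        server.shutdown()
        server.server_close()
    return server.connections


if __name__ == "__main__":
    print("new connection per file:", count_connections(fetch_with_new_connections))
    print("one reused connection:  ", count_connections(fetch_with_one_connection))
```

With TLS, each of those extra connections would also pay for a full handshake, which is where the overhead described above comes from.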