hi Jeroen,

On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
>
> On 2019/01/02 17:08:58, Wes McKinney <w...@gmail.com> wrote:
> > hi folks,
> >
> > With 0.12 around the corner and significant progress on the R bindings
> > project (sufficient for Spark integration [1]), I am wondering how
> > everyday R users are going to be able to install the software
> > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > for this?
>
> The R packaging is a bit different than python. For Windows and macOS,
> we can statically link external libs into the R package, to ship a
> standalone binary R package without any runtime dependencies. On
> Linux, R requires the system package manager (apt/yum) to provide
> external libs. The R package manager doesn't work well with libs from
> Conda.

How do R libraries handle (or not handle) symbol conflicts if
everything is statically linked?

>
> The easiest way to build R packages on macOS is using binaries from
> Homebrew. Because arrow is already in homebrew this should be
> straightforward:
> https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
>
> For Windows, we need to build libarrow using the R mingw-w64
> toolchain (called Rtools). We are currently working to automate this
> process, but for now I manually build and maintain these binaries
> here: https://github.com/rwinlib/arrow

There might be some collaboration opportunity with Kouhei or others
who have been working on msys2 packaging, which AFAIK is going to be
nearly the same toolchain.

>
> For both Windows and MacOS holds that the process creating the
> statically linked R binary package gets more complex for every extra
> system dependency. Currently both the homebrew and rwinlib binaries
> only depend on Boost and disable all extra arrow features.

Keep in mind that the #1 use case for the Python package right now is
to read and write Parquet files, which requires compression libraries
and Thrift. In the short term, I would expect the same to be true of
the R package, so failing to package Parquet would cripple the
package.

>
> So we can make the R package to work on Windows and Mac, however, for
> the R package to be eligible for publication on CRAN, the required
> Linux libs (i.e. libarrow-dev, arrow-devel) need to be available from
> the official Debian and/or Fedora repository. The CRAN maintainers
> want to build and test the R package at least on Debian using only
> libs from official repositories, they won't build or install other
> software on the build servers. Hence a prerequisite for getting the R
> package on CRAN is getting libarrow-dev and arrow-devel into the
> official Debian/Fedora repo's.
>

How would you propose to make this happen on a practical timeline (3
months or less)?
This requirement (getting packages into an official Linux distro) is
significantly more onerous than any of the other platforms we are
packaging for.

> In the mean time we could setup a custom R repository providing the R
> package in source and binary form on e.g. Bintray. That way users can
> at least do e.g: install.packages("arrow", repos =
> "https://dl.bintray.com/apache/arrow/r") which is much easier than
> installing manually. However as long as the package is not on CRAN,
> other R packages cannot formally depend on it.
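
For reference, a minimal sketch of what the custom-repository install
could look like from the user's side. The Bintray URL is the one
proposed in the quoted text and is only an assumption here (nothing in
this thread says it has been set up). The CRAN mirror URL and the
variable names are illustrative. Listing CRAN alongside the custom
repository is a guess at what users would need so that any
dependencies not hosted in the custom repository can still be
resolved.

```r
# Hypothetical repository URL, copied from the proposal quoted above;
# it may not exist.
arrow_repo <- "https://dl.bintray.com/apache/arrow/r"

# install.packages() accepts a character vector of repositories.
# CRAN is included so that dependencies of the package that are not
# hosted in the custom repository can still be found.
repos <- c(arrow = arrow_repo, CRAN = "https://cloud.r-project.org")

install.packages("arrow", repos = repos)
```

This only covers end-user installation; it does not change the point
above that other CRAN packages cannot formally depend on a package
that is not itself on CRAN.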