hi Jeroen,

On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
>
> On 2019/01/02 17:08:58, Wes McKinney <w...@gmail.com> wrote:
> > hi folks,
> >
> > With 0.12 around the corner and significant progress on the R bindings
> > project (sufficient for Spark integration [1]), I am wondering how
> > everyday R users are going to be able to install the software
> > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > for this?
>
> R packaging is a bit different from Python packaging. For Windows and macOS,
> we can statically link external libs into the R package, to ship a
> standalone binary R package without any runtime dependencies. On
> Linux, R requires the system package manager (apt/yum) to provide
> external libs. The R package manager doesn't work well with libs from
> Conda.

How do R libraries handle (or not handle) symbol conflicts if
everything is statically linked?

>
> The easiest way to build R packages on macOS is using binaries from
> Homebrew. Because arrow is already in homebrew this should be
> straightforward:
> https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
>
> For Windows, we need to build libarrow using the R mingw-w64
> toolchain (called Rtools). We are currently working to automate this
> process, but for now I manually build and maintain these binaries
> here: https://github.com/rwinlib/arrow

There might be some collaboration opportunity with Kouhei or others
who have been working on MSYS2 packaging, which AFAIK is going to be
nearly the same toolchain.

>
> For both Windows and macOS, the process of creating the statically
> linked R binary package gets more complex with every extra system
> dependency. Currently both the Homebrew and rwinlib binaries only
> depend on Boost and disable all extra Arrow features.

Keep in mind that the #1 use case for the Python package right now is
to read and write Parquet files, which requires compression libraries
and Thrift. In the short term, I would expect the same to be true of
the R package, so failing to package Parquet would cripple the
package.

>
> So we can make the R package work on Windows and macOS. However, for
> the R package to be eligible for publication on CRAN, the required
> Linux libs (i.e. libarrow-dev, arrow-devel) need to be available from
> the official Debian and/or Fedora repository. The CRAN maintainers
> want to build and test the R package at least on Debian using only
> libs from official repositories; they won't build or install other
> software on the build servers. Hence a prerequisite for getting the R
> package on CRAN is getting libarrow-dev and arrow-devel into the
> official Debian/Fedora repos.
>

How would you propose to make this happen on a practical timeline (3
months or less)? This requirement (getting packages into an official
Linux distro) is significantly more onerous than any of the other
platforms we are packaging for.

> In the meantime we could set up a custom R repository providing the R
> package in source and binary form on e.g. Bintray. That way users can
> at least do e.g.: install.packages("arrow", repos =
> "https://dl.bintray.com/apache/arrow/r"), which is much easier than
> installing manually. However, as long as the package is not on CRAN,
> other R packages cannot formally depend on it.
