hi,

On Sun, Jan 6, 2019 at 11:13 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
>
> On Sun, Jan 6, 2019 at 5:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi Jeroen,
> >
> > On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <jeroeno...@gmail.com> wrote:
> > >
> > > On 2019/01/02 17:08:58, Wes McKinney <w...@gmail.com> wrote:
> > > > hi folks,
> > > >
> > > > With 0.12 around the corner and significant progress on the R bindings
> > > > project (sufficient for Spark integration [1]), I am wondering how
> > > > everyday R users are going to be able to install the software
> > > > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > > > for this?
> > >
> > > R packaging is a bit different from Python packaging. For Windows and macOS,
> > > we can statically link external libs into the R package, to ship a
> > > standalone binary R package without any runtime dependencies. On
> > > Linux, R requires the system package manager (apt/yum) to provide
> > > external libs. The R package manager doesn't work well with libs from
> > > Conda.
> >
> > How do R libraries handle (or not handle) symbol conflicts if
> > everything is statically linked?
>
> Not sure what you mean. R packages on Mac/Win statically link their system
> dependencies; there should be no interference with other packages. In
> the case of arrow, we build the R package using libarrow.a (which
> already contains the required boost libs), and then the resulting R
> binary package consists of a single dll/dylib containing both the R
> bindings + libarrow, without any external runtime dll dependencies.
>

To limit the scope of the question: to read Parquet files, libarrow.a
has a few transitive dependencies:

* zlib
* snappy
* Thrift

There are some other incoming dependencies that can't be avoided in
the future (that is, if R wants to be a first-class citizen of where
this project is headed):

* LLVM
* re2
* More compressors: bz2, zstd, lz4
* gRPC (which depends on Protobuf, OpenSSL)

Basically, the entire list in
https://github.com/apache/arrow/tree/master/cpp/thirdparty#arrow-c-thirdparty-dependencies.
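
To make the bundling cost concrete, here is a minimal Python sketch of
what "transitive dependencies" means for a statically linked binary. It
is an illustration only, not part of the Arrow build: the DEPS map and
the transitive_deps helper are hypothetical names, and the edges are
copied by hand from the two lists above rather than generated from the
build system.

```python
# Hypothetical dependency map, transcribed from the lists in this message.
# The keys and edges are illustrative assumptions, not Arrow build output.
DEPS = {
    "libarrow (Parquet)": ["zlib", "snappy", "Thrift"],
    "libarrow (future)": ["LLVM", "re2", "bz2", "zstd", "lz4", "gRPC"],
    "gRPC": ["Protobuf", "OpenSSL"],
}


def transitive_deps(root, deps=DEPS):
    """Return every library reachable from root, excluding root itself."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in deps.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)  # follow this child's own dependencies
    return sorted(seen)


if __name__ == "__main__":
    for root in ("libarrow (Parquet)", "libarrow (future)"):
        print(root, "->", ", ".join(transitive_deps(root)))
```

Everything this returns for a given root would have to be linked into
the single dll/dylib if the R binary package is to stay free of runtime
dependencies.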

>
> > There might be some collaboration opportunity with Kouhei or others
> > who have been working on msys2 packaging, which AFAIK is going to be
> > nearly the same toolchain
>
> Yes I based the build on Kouhei's build script (see the first line of
> the PKGBUILD file in the rwinlib repo), however I disabled some extra
> features which complicate the process, so that it looks more like the
> homebrew configuration.
>
> > Keep in mind that the #1 use case for the Python package right now is
> > to read and write Parquet files, which requires compression libraries
> > and Thrift. In the short term, I would expect the same to be true of
> > the R package, so failing to package Parquet would cripple the
> > package.
>
> Which compression libraries exactly do we need to build with parquet
> support? Can we build arrow using vendored thrift, or do we need to
> build thrift separately? If this is important, we should send a PR to
> homebrew to enable this feature in their builds.
>
> I am not familiar with arrow yet, how do I test if parquet works using
> the R package?

A first patch for this was merged here
https://github.com/apache/arrow/commit/5723adad7ad80c95ba8fcb55d40186d6a29edb74

>
> > How would you propose to make this happen on a practical timeline (3
> > months or less)? This requirement (getting packages into an official
> > Linux distro) is significantly more onerous than any of the other
> > platforms we are packaging for.
>
> You need to find a Debian maintainer that is willing to upload the
> package. I don't know the details of the process either. I think the
> .deb has to pass lintian and they require some degree of API
> stability. If you plan to make backward incompatible api changes in
> arrow 0.13, then publishing to Debian may be premature.

At a glance this seems problematic. I think we're going to have to
find a way around this to depend on .deb/.rpm packages from Bintray or
something similar. If it turns out that R packages can only depend on
API-stable binaries on Linux, that seems like an issue that should be
raised outside of this project.

- Wes
