hi,

On Sun, Jan 6, 2019 at 11:13 AM Jeroen Ooms <[email protected]> wrote:
>
> On Sun, Jan 6, 2019 at 5:39 PM Wes McKinney <[email protected]> wrote:
> >
> > hi Jeroen,
> >
> > On Sun, Jan 6, 2019 at 10:28 AM Jeroen Ooms <[email protected]> wrote:
> > >
> > > On 2019/01/02 17:08:58, Wes McKinney <[email protected]> wrote:
> > > > hi folks,
> > > >
> > > > With 0.12 around the corner and significant progress on the R bindings
> > > > project (sufficient for Spark integration [1]), I am wondering how
> > > > everyday R users are going to be able to install the software
> > > > respectively on Linux, macOS, and Windows. Thoughts about the strategy
> > > > for this?
> > >
> > > The R packaging is a bit different than python. For Windows and macOS,
> > > we can statically link external libs into the R package, to ship a
> > > standalone binary R package without any runtime dependencies. On
> > > Linux, R requires the system package manager (apt/yum) to provide
> > > external libs. The R package manager doesn't work well with libs from
> > > Conda.
> >
> > How do R libraries handle (or not handle) symbol conflicts if
> > everything is statically linked?
>
> Not sure what you mean. R packages on Mac/Win statically their system
> dependencies; there should be no interference with other packages. In
> the case of arrow, we build the R package using libarrow.a (which
> already contains the required boost libs), and then the resulting R
> binary package consists a single dll/dylib containing both the R
> bindings + libarrow, without any external runtime dll dependencies.
>

To limit the scope of the question: to read Parquet files, libarrow.a
has a few transitive dependencies:

* zlib
* snappy
* Thrift

There are some other incoming dependencies that can't be avoided in
the future (that is, if R wants to be a first-class citizen where
this project is headed):

* LLVM
* re2
* More compressors: bz2, zstd, lz4
* gRPC (which depends on Protobuf, OpenSSL)

Basically, the entire list in
https://github.com/apache/arrow/tree/master/cpp/thirdparty#arrow-c-thirdparty-dependencies.

>
> > There might be some collaboration opportunity with Kouhei or others
> > who have been working on msys2 packaging, which AFAIK is going to be
> > nearly the same toolchain
>
> Yes I based the build on Kouhei's build script (see the first line of
> the PKGBUILD file in the rwinlib repo), however I disabled some extra
> features which complicate the process, so that it looks more like the
> homebrew configuration.
>
> > Keep in mind that the #1 use case for the Python package right now is
> > to read and write Parquet files, which requires compression libraries
> > and Thrift. In the short term, I would expect the same to be true of
> > the R package, so failing to package Parquet will mean to cripple the
> > package.
>
> Which compression libraries exactly do we need to build with parquet
> support? Can we build arrow using vendored thrift, or do we need to
> build thrift separately? If this is important, we should send a PR to
> homebrew to enable this feature in their builds.
>
> I am not familiar with arrow yet, how do I test if parquet works using
> the R package?

A first patch for this was merged here:
https://github.com/apache/arrow/commit/5723adad7ad80c95ba8fcb55d40186d6a29edb74

>
> > How would you propose to make this happen on a practical timeline (3
> > months or less)? This requirement (getting packages into an official
> > Linux distro) is significantly more onerous than any of the other
> > platforms we are packaging for.
>
> You need to find a Debian maintainer that is willing to upload the
> package. I don't know the details of the process either. I think the
> .deb has to pass lintian and they require some degree of API
> stability. If you plan to make backward incompatible api changes in
> arrow 0.13, then publishing to Debian may be premature.

At a glance this seems problematic. I think we're going to have to
find a way around this, such as depending on .deb/.rpm packages from
Bintray or something similar. If it turns out that the R package can
only depend on API-stable binaries on Linux, that seems like an issue
that should be raised outside of this project.

- Wes
