thisisnic commented on a change in pull request #11521:
URL: https://github.com/apache/arrow/pull/11521#discussion_r764295000
########## File path: r/vignettes/install.Rmd ##########
@@ -7,49 +7,215 @@ vignette: >
 %\VignetteEncoding{UTF-8}
 ---
-On macOS and Windows, when you `install.packages("arrow")`,
-you get a binary package that contains Arrow’s C++ dependencies along with it.
-On Linux, `install.packages()` retrieves a source package that has to be compiled locally,
-and C++ dependencies need to be resolved as well.
+The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow). This means that when you install arrow, you need both the R and C++ versions. If you install arrow from CRAN on a machine running Windows or MacOS, when you call `install.packages("arrow")`, a precompiled binary containing both the R package and libarrow will be downloaded. However, CRAN does not host R package binaries for Linux, and so you must choose from one of the alternative approaches.
-On linux we recommend one of the following for the quickest and easiest
-installation:
+This vignette outlines the recommend approaches to installing arrow on Linux, starting from the simplest and least customisable to the most complex but with more flexbility to customise your installation.
-* Set the environment variable `NOT_CRAN=true` before installing, which will both
-  check for compatible Apache binaries and use those and if those aren't available
-  set a more fully-featured build than default.
-* Using [RStudio's public package manager](https://packagemanager.rstudio.com/client/#/)
-  which includes pre-built binaries
+The intended audience for this document is arrow R package _users_ on Linux, and not Arrow _developers_.
+If you're contributing to the Arrow project, see `vignette("developing", package = "arrow")` for
+resources to help you on set up your development environment. You can also find
+a more detailed discussion of the code run during the installation process in the
+[developers' installation docs](https://arrow.apache.org/docs/r/articles/developers/install_details.html)
-Our goal is to make `install.packages("arrow")` "just work" for as many Linux distributions,
-versions, and configurations as possible with the above options.
+> Having trouble installing arrow? See the "Troubleshooting" section below.
-This rest of this document describes how it works and the options for fine-tuning Linux installation.
-The intended audience for this document is `arrow` R package users on Linux, not Arrow developers.
-If you're contributing to the Arrow project, see `vignette("developing", package = "arrow") for guidance on setting up your development environment.
+# Basic installation
+
+## Method 1 - Installation with a precompiled libarrow binary
+
+As mentioned above, on macOS and Windows, when you run `install.packages("arrow")`, and install arrow from CRAN, you get an R binary package that contains a precompiled version of libarrow, though CRAN does not host binary packages for Linux. This means that the default behaviour when you run `install.packages()` on Linux is to retrieve the source version of the R package that has to be compiled locally, including building libarrow from source. See method 2 below for details of this.
+
+For a faster installation, we recommend that you instead use one of the methods below for installing arrow with a precompiled libarrow binary.
+
+### Method 1a - Binary R package containing libarrow binary via RSPM/conda
+
+```{r, echo=FALSE, out.width="30%"}
+knitr::include_graphics("./r_binary_libarrow_binary.png")
+```
+
+If you want a quicker installation process, and by default a more fully-featured build, you could install arrow from [RStudio's public package manager](https://packagemanager.rstudio.com/client/#/), which hosts binaries for both Windows and Linux.
+
+For example, if you are using Ubuntu 20.04 (Focal):
+
+```{r, eval = FALSE}
+install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
+```
+
+For other Linux distributions, to get the relevant URL, you can visit
+[the RSPM site](https://packagemanager.rstudio.com/client/#/repos/1/overview),
+click on 'binary', and select your preferred distribution.
+
+Similarly, if you use `conda` to manage your R environment, you can get the
+latest official release of the R package including libarrow via:
+
+```shell
+conda install -c conda-forge --strict-channel-priority r-arrow
+```
+
+### Method 1b - R source package with libarrow binary
+
+```{r, echo=FALSE, out.width="50%"}
+knitr::include_graphics("./r_source_libarrow_binary.png")
+```
+
+Another way of achieving faster installation with all key features enabled is to use our self-hosted libarrow binaries. You can do this by setting the `NOT_CRAN` environment variable before you call `install.packages()`:
+
+```{r, eval = FALSE}
+Sys.setenv("NOT_CRAN" = TRUE)
+install.packages("arrow")
+```
+
+This installs the source version of the R package, but during the installation process will check for compatible libarrow binaries that we host and use those if available. If no binary is available or can't be found, then this option falls back onto method 2 below, but results in a more fully-featured build than default.
+
+## Method 2 - Installing an R source package and building libarrow from source
+
+```{r, echo=FALSE, out.width="50%"}
+knitr::include_graphics("./r_source_libarrow_source.png")
+```
 Generally compiling and installing R packages with C++ dependencies, requires either installing system packages, which you may not have privileges to do, or building the C++ dependencies separately, which introduces all sorts of
-additional ways for things to go wrong.
+additional ways for things to go wrong, which is why we recommend method 1 above.
-Note also that if you use `conda` to manage your R environment, this document does not apply.
-You can `conda install -c conda-forge --strict-channel-priority r-arrow` and you'll get the latest official
-release of the R package along with any C++ dependencies.
+However, if you wish to fine-tune or customise your Linux installation, the
+instructions in this section explain how to do that.
-> Having trouble installing `arrow`? See the "Troubleshooting" section below.
+### Basic configuration for building from source with fully featured installation
-# Installation basics
+If you wish to install libarrow from source instead of looking for pre-compiled
+binaries, you can set the `LIBARROW_BINARY` variable.
-Install the latest release of `arrow` from CRAN with
+```{r, eval = FALSE}
+Sys.setenv("LIBARROW_BINARY" = FALSE)
+```
-```r
-Sys.setenv(NOT_CRAN = TRUE)
+By default, this is set to `TRUE`, and so libarrow will only be built from
+source if this environment variable is set to `FALSE` or no compatible binary
+for your OS can be found.
+
+When compiling libarrow from source, you have the power to really fine-tune
+which features to install. You can set the environment variable
+`LIBARROW_MINIMAL` to enable a more full-featured build including S3 support
+and alternative memory allocators.
+
+```{r, eval = FALSE}
+Sys.setenv("LIBARROW_MINIMAL" = FALSE)
+```
+
+By default this variable is unset; if set to `TRUE` a trimmed-down version of
+arrow is installed.
+
+Note that in this guide, you will have seen us mention the environment variable
+`NOT_CRAN` - this is a convenience variable, which when set to `TRUE`,
+automatically sets `LIBARROW_MINIMAL` to `FALSE` and `LIBARROW_BINARY` to `TRUE`.
+
+Building libarrow from source requires more time and resources than installing
+a binary. We recommend that you set the environment variable `ARROW_R_DEV` to
+`TRUE` for more verbose output during the installation process.
Review comment:
100% agree, updated