thisisnic commented on a change in pull request #11521:
URL: https://github.com/apache/arrow/pull/11521#discussion_r764295000



##########
File path: r/vignettes/install.Rmd
##########
@@ -7,49 +7,215 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
-On macOS and Windows, when you `install.packages("arrow")`,
-you get a binary package that contains Arrow’s C++ dependencies along with it.
-On Linux, `install.packages()` retrieves a source package that has to be compiled locally,
-and C++ dependencies need to be resolved as well. 
+The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow).  This means that when you install arrow, you need both the R and C++ versions.  If you install arrow from CRAN on a machine running Windows or macOS, calling `install.packages("arrow")` downloads a precompiled binary containing both the R package and libarrow.  However, CRAN does not host R package binaries for Linux, so on Linux you must choose one of the alternative approaches described below.  
 
-On linux we recommend one of the following for the quickest and easiest 
-installation: 
+This vignette outlines the recommended approaches to installing arrow on Linux, starting with the simplest and least customisable and ending with the most complex, which offers the most flexibility to customise your installation.
 
-* Set the environment variable `NOT_CRAN=true` before installing, which will both
-  check for compatible Apache binaries and use those and if those aren't available
-  set a more fully-featured build than default.
-* Using [RStudio's public package manager](https://packagemanager.rstudio.com/client/#/)
-  which includes pre-built binaries
+The intended audience for this document is arrow R package _users_ on Linux, and not Arrow _developers_.
+If you're contributing to the Arrow project, see `vignette("developing", package = "arrow")` for
+resources to help you set up your development environment.  You can also find
+a more detailed discussion of the code run during the installation process in the
+[developers' installation docs](https://arrow.apache.org/docs/r/articles/developers/install_details.html).
 
-Our goal is to make `install.packages("arrow")` "just work" for as many Linux distributions,
-versions, and configurations as possible with the above options.
+> Having trouble installing arrow? See the "Troubleshooting" section below.
 
-This rest of this document describes how it works and the options for fine-tuning Linux installation.
-The intended audience for this document is `arrow` R package users on Linux, not Arrow developers.
-If you're contributing to the Arrow project, see `vignette("developing", package = "arrow") for guidance on setting up your development environment.
+# Basic installation
+
+## Method 1 - Installation with a precompiled libarrow binary
+
+As mentioned above, when you install arrow from CRAN on macOS or Windows with `install.packages("arrow")`, you get an R binary package that contains a precompiled version of libarrow.  CRAN does not host binary packages for Linux, however, so the default behaviour when you run `install.packages()` on Linux is to retrieve the source version of the R package, which has to be compiled locally, including building libarrow from source.  See method 2 below for details.
+
+For a faster installation, we recommend that you instead use one of the methods below for installing arrow with a precompiled libarrow binary.
+
+### Method 1a - Binary R package containing libarrow binary via RSPM/conda
+
+```{r, echo=FALSE, out.width="30%"}
+knitr::include_graphics("./r_binary_libarrow_binary.png")
+```
+
+If you want a quicker installation process and, by default, a more fully-featured build, you can install arrow from [RStudio's public package manager](https://packagemanager.rstudio.com/client/#/), which hosts binaries for both Windows and Linux.
+
+For example, if you are using Ubuntu 20.04 (Focal):
+
+```{r, eval = FALSE}
+install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
+```
+
+To get the relevant URL for other Linux distributions, visit
+[the RSPM site](https://packagemanager.rstudio.com/client/#/repos/1/overview),
+click on 'binary', and select your preferred distribution.
+
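+The Ubuntu URL above suggests a pattern in which only the distribution codename changes.  As a minimal sketch assuming that pattern holds for your distribution, a hypothetical helper (`rspm_linux_url()`, which is not part of arrow) can build the URL for you; compare its result with the URL shown on the RSPM site before relying on it:
+
+```{r, eval = FALSE}
+# Hypothetical helper, not part of arrow. It assumes RSPM Linux binary URLs
+# follow the pattern .../all/__linux__/<codename>/latest seen in the example above
+rspm_linux_url <- function(codename) {
+  sprintf("https://packagemanager.rstudio.com/all/__linux__/%s/latest", codename)
+}
+```
+
+You could then pass the result as the `repos` argument, for example `install.packages("arrow", repos = rspm_linux_url("bionic"))` on Ubuntu 18.04 (Bionic).
+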
+Similarly, if you use `conda` to manage your R environment, you can get the 
+latest official release of the R package including libarrow via:
+
+```shell
+conda install -c conda-forge --strict-channel-priority r-arrow
+```
+
+### Method 1b - R source package with libarrow binary
+
+```{r, echo=FALSE, out.width="50%"}
+knitr::include_graphics("./r_source_libarrow_binary.png")
+```
+
+Another way of achieving a faster installation with all key features enabled is to use our self-hosted libarrow binaries.  You can do this by setting the `NOT_CRAN` environment variable before you call `install.packages()`:
+
+```{r, eval = FALSE}
+Sys.setenv("NOT_CRAN" = TRUE)
+install.packages("arrow")
+```
+
+This installs the source version of the R package, but during the installation process it checks for compatible libarrow binaries that we host and uses those if available.  If no binary is available or one can't be found, this option falls back to method 2 below, but results in a more fully-featured build than the default.
+
+## Method 2 - Installing an R source package and building libarrow from source
+
+```{r, echo=FALSE, out.width="50%"}
+knitr::include_graphics("./r_source_libarrow_source.png")
+```
 
 Generally compiling and installing R packages with C++ dependencies, requires 
 either installing system packages, which you may not have privileges to do, or 
 building the C++ dependencies separately, which introduces all sorts of 
-additional ways for things to go wrong. 
+additional ways for things to go wrong, which is why we recommend method 1 above.
 
-Note also that if you use `conda` to manage your R environment, this document does not apply.
-You can `conda install -c conda-forge --strict-channel-priority r-arrow` and you'll get the latest official
-release of the R package along with any C++ dependencies.
+However, if you wish to fine-tune or customise your Linux installation, the 
+instructions in this section explain how to do that.
 
-> Having trouble installing `arrow`? See the "Troubleshooting" section below.
+### Basic configuration for building from source with a fully-featured installation
 
-# Installation basics
+If you wish to install libarrow from source instead of looking for pre-compiled
+binaries, you can set the `LIBARROW_BINARY` variable to `FALSE`.
 
-Install the latest release of `arrow` from CRAN with
+```{r, eval = FALSE}
+Sys.setenv("LIBARROW_BINARY" = FALSE)
+```
 
-```r
-Sys.setenv(NOT_CRAN = TRUE)
+By default, this is set to `TRUE`, and so libarrow will only be built from 
+source if this environment variable is set to `FALSE` or no compatible binary 
+for your OS can be found.
+
+When compiling libarrow from source, you can fine-tune exactly
+which features to install.  Setting the environment variable
+`LIBARROW_MINIMAL` to `FALSE` enables a more fully-featured build, including S3 support
+and alternative memory allocators.
+
+```{r, eval = FALSE}
+Sys.setenv("LIBARROW_MINIMAL" = FALSE)
+```
+
+By default, this variable is unset; if it is set to `TRUE`, a trimmed-down version of
+arrow is installed.
+
+Note that earlier in this guide we mentioned the environment variable
+`NOT_CRAN`.  This is a convenience variable which, when set to `TRUE`,
+automatically sets `LIBARROW_MINIMAL` to `FALSE` and `LIBARROW_BINARY` to `TRUE`.
+
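+As a rough sketch based on that description, setting the two underlying variables yourself should have the same effect on these two settings as `Sys.setenv("NOT_CRAN" = TRUE)`.  This assumes you don't depend on any other behaviour `NOT_CRAN` may trigger:
+
+```{r, eval = FALSE}
+# The same values that NOT_CRAN = TRUE sets, per the description above:
+# use a libarrow binary if one is available, and enable the fuller feature set
+Sys.setenv("LIBARROW_BINARY" = TRUE, "LIBARROW_MINIMAL" = FALSE)
+```
+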
+Building libarrow from source requires more time and resources than installing 
+a binary.  We recommend that you set the environment variable `ARROW_R_DEV` to 
+`TRUE` for more verbose output during the installation process.
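+
+Putting the settings from this section together, a sketch of the configuration for building a fully-featured libarrow from source with verbose output looks like this; set these variables before calling `install.packages("arrow")`:
+
+```{r, eval = FALSE}
+# Build libarrow from source rather than downloading a binary,
+# enable the fuller feature set, and print verbose installation output
+Sys.setenv(
+  "LIBARROW_BINARY" = FALSE,
+  "LIBARROW_MINIMAL" = FALSE,
+  "ARROW_R_DEV" = TRUE
+)
+# then: install.packages("arrow")
+```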

Review comment:
       100% agree, updated



