jonkeane commented on a change in pull request #11521:
URL: https://github.com/apache/arrow/pull/11521#discussion_r760271296
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
Review comment:
Would it be helpful to mention the conditions under which it is
called (e.g. if libarrow is not already found on the system, if libarrow is
not being downloaded as a binary, etc.)?
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on linux) and checks for binaries or downloads
+libarrow from source depending on dependency availability and build
configuration.
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow is
+being built. It builds libarrow for a bundled, static build, and
Review comment:
```suggestion
* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow
needs to be built. It builds libarrow for a bundled, static build, and
```
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on linux) and checks for binaries or downloads
+libarrow from source depending on dependency availability and build
configuration.
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow is
+being built. It builds libarrow for a bundled, static build, and
+mirrors the steps described in the ["Arrow R Developer Guide"
vignette](./setup.html)
+This build script is also what is used to generate the prebuilt binaries.
+
+The actions taken by these scripts to resolve dependencies and install the
+correct components are described below.
+
+## How dependencies are resolved
+
+There are a number of different ways you may have installed the Arrow C++
library:
+
+* a system package
+* a library you've built yourself outside of the context of installing the R
package
+* if you don't already have it, the R package will attempt to resolve it
automatically when it installs.
+
+If you are authorized to install system packages and you're installing a CRAN release,
+you may want to use the official Apache Arrow release packages corresponding to
+the R package version (though there are some drawbacks: see the
+["Troubleshooting" section in the main installation docs](../install.html)).
+See the [Arrow project installation page](https://arrow.apache.org/install/)
+to find pre-compiled binary packages for some common Linux distributions,
+including Debian, Ubuntu, and CentOS.
+
+### libarrow dependencies
+
+You'll need to install `libparquet-dev` on Debian and Ubuntu, or
`parquet-devel` on CentOS.
+This will also automatically install libarrow as a dependency.
+
+### How the R package finds libarrow
+
+The diagram below shows how the R package finds a libarrow installation.
+
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("../img/install_diagram.png")
+```
+
+#### Using pkg-config
+
+When you install the arrow R package on Linux, if no environment variables
+relating to the location of an existing libarrow installation have already been
+set, the installation code will attempt to find the Arrow C++ libraries on
+your system using the `pkg-config` command.
Review comment:
These might be explained later, so this might be moot, but when I got to
this point while reading I wondered if we should say what those env vars are
(or link to a place where they are described).
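
For context on the `pkg-config` step itself, a manual check along these lines might be worth showing. This is only a sketch: it assumes libarrow registers itself with `pkg-config` under the module name `arrow`, which is not stated in the text above, and it only reports what it finds.

```shell
# Sketch: check by hand whether pkg-config can see a libarrow install.
# Assumption: the pkg-config module name is "arrow".
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists arrow; then
  arrow_status="found: version $(pkg-config --modversion arrow)"
else
  arrow_status="not found via pkg-config"
fi
echo "libarrow ${arrow_status}"
```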
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or
(3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the
exception of `ARROW_S3`, these are all `ON` by default in the bundled C++
build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this
`OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string
compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other
string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`,
`ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various
compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and
the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to
download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look
for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries
and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a
binary
+ C++ library built for your operating system. You may also set it to some
other string, a related "distro-version" that has binaries built that work for
your OS.
+ If no binary is found, installation will fall back to building C++
dependencies from source.
Review comment:
Could/should we link to the csv that shows what those strings are? Or
give an example?
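
If an example is the way to go, a minimal sketch might look like the following. The `ubuntu-18.04` value is only an assumed illustration of a "distro-version" string, so it would need checking against the binaries that are actually hosted.

```shell
# Sketch only: "ubuntu-18.04" is an assumed, illustrative "distro-version"
# string; check the list of hosted binaries for the values that really exist.
export LIBARROW_BINARY="ubuntu-18.04"

# Show what the install scripts would see in the environment.
echo "LIBARROW_BINARY=${LIBARROW_BINARY}"

# Then install the package from R as usual, for example:
#   Rscript -e 'install.packages("arrow")'
```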
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or
(3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the
exception of `ARROW_S3`, these are all `ON` by default in the bundled C++
build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this
`OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string
compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other
string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`,
`ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various
compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and
the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to
download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look
for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries
and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a
binary
+ C++ library built for your operating system. You may also set it to some
other string, a related "distro-version" that has binaries built that work for
your OS.
+ If no binary is found, installation will fall back to building C++
dependencies from source.
+* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working `arrow` R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
- and resource-intensive, and ensure that you only use a prebuilt binary.
-* `LIBARROW_MINIMAL`: If set to `false`, the build script
- will enable some optional features, including compression libraries, S3
- support, and additional alternative memory allocators. This will increase the
- source build time but results in a more fully functional library.
-* `NOT_CRAN`: If this variable is set to `true`, as the `devtools` package
does,
+ and resource-intensive, and ensure that you only use a prebuilt binary.
+* `LIBARROW_MINIMAL` : If set to `false`, the build script
+ will enable some optional features, including S3
+ support and additional alternative memory allocators. This will increase the
+ source build time but results in a more fully functional library. If set to
+ `true` turns off Parquet, Datasets, compression libraries, and other
optional
+ features.
Review comment:
```suggestion
features (which is only really useful if compiling on a platform that does
not support these features, e.g. Solaris)
```
This might be too much snark, but it _is_ the reason for it.
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on linux) and checks for binaries or downloads
+libarrow from source depending on dependency availability and build
configuration.
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow is
+being built. It builds libarrow for a bundled, static build, and
+mirrors the steps described in the ["Arrow R Developer Guide"
vignette](./setup.html)
+This build script is also what is used to generate the prebuilt binaries.
+
+The actions taken by these scripts to resolve dependencies and install the
+correct components are described below.
+
+## How dependencies are resolved
+
+There are a number of different ways you may have installed the Arrow C++
library:
+
+* a system package
+* a library you've built yourself outside of the context of installing the R
package
+* if you don't already have it, the R package will attempt to resolve it
automatically when it installs.
+
+If you are authorized to install system packages and you're installing a CRAN release,
+you may want to use the official Apache Arrow release packages corresponding to
+the R package version (though there are some drawbacks: see the
+["Troubleshooting" section in the main installation docs](../install.html)).
+See the [Arrow project installation page](https://arrow.apache.org/install/)
+to find pre-compiled binary packages for some common Linux distributions,
+including Debian, Ubuntu, and CentOS.
+
+### libarrow dependencies
+
+You'll need to install `libparquet-dev` on Debian and Ubuntu, or
`parquet-devel` on CentOS.
+This will also automatically install libarrow as a dependency.
+
+### How the R package finds libarrow
+
+The diagram below shows how the R package finds a libarrow installation.
+
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("../img/install_diagram.png")
+```
Review comment:
For lack of a better way to comment: this is a fantastic diagram. And
really shows the power of diagrams cause it would take sooo many words and
still not be as clear in text.
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on linux) and checks for binaries or downloads
+libarrow from source depending on dependency availability and build
configuration.
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow is
+being built. It builds libarrow for a bundled, static build, and
+mirrors the steps described in the ["Arrow R Developer Guide"
vignette](./setup.html)
+This build script is also what is used to generate the prebuilt binaries.
+
+The actions taken by these scripts to resolve dependencies and install the
+correct components are described below.
+
+## How dependencies are resolved
+
+There are a number of different ways you may have installed the Arrow C++
library:
+
+* a system package
+* a library you've built yourself outside of the context of installing the R
package
+* if you don't already have it, the R package will attempt to resolve it
automatically when it installs.
+
+If you are authorized to install system packages and you're installing a CRAN release,
+you may want to use the official Apache Arrow release packages corresponding to
+the R package version (though there are some drawbacks: see the
+["Troubleshooting" section in the main installation docs](../install.html)).
+See the [Arrow project installation page](https://arrow.apache.org/install/)
+to find pre-compiled binary packages for some common Linux distributions,
+including Debian, Ubuntu, and CentOS.
Review comment:
Maybe we should mention something about `apt` and `yum` here to make
clear that that's what we're suggesting?
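
Something like the sketch below could make that concrete. It assumes the Apache Arrow package repository has already been set up as described on the Arrow project installation page, and it only prints the command rather than running it; the package names are the ones given in the "libarrow dependencies" section.

```shell
# Sketch: print (do not run) the system install command for libarrow.
# Assumes the Apache Arrow apt/yum repository is already configured.
if command -v apt >/dev/null 2>&1; then
  install_cmd="sudo apt install -y libparquet-dev"   # Debian / Ubuntu
elif command -v yum >/dev/null 2>&1; then
  install_cmd="sudo yum install -y parquet-devel"    # CentOS
else
  install_cmd=""
fi
echo "${install_cmd:-no supported package manager found}"
```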
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or
(3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the
exception of `ARROW_S3`, these are all `ON` by default in the bundled C++
build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this
`OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string
compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other
string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`,
`ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various
compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and
the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to
download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look
for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries
and
+ the version of the R package you're installing.
Review comment:
This applies here but also to others below: might we want to flag that a
number of these are mostly useful for development? Either in each one (like
this one) that is helpful for developing, or in a sentence at the top,
something like "many of these variables are mainly useful while developing
Arrow and aren't super relevant to folks trying to install Arrow and use it."
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or
(3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the
exception of `ARROW_S3`, these are all `ON` by default in the bundled C++
build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this
`OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string
compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other
string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`,
`ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various
compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and
the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to
download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a binary
+ C++ library built for your operating system. You may also set it to some other string, a related "distro-version" that has binaries built that work for your OS.
+ If no binary is found, installation will fall back to building C++ dependencies from source.
+* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working `arrow` R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
- and resource-intensive, and ensure that you only use a prebuilt binary.
-* `LIBARROW_MINIMAL`: If set to `false`, the build script
- will enable some optional features, including compression libraries, S3
- support, and additional alternative memory allocators. This will increase the
- source build time but results in a more fully functional library.
-* `NOT_CRAN`: If this variable is set to `true`, as the `devtools` package does,
+ and resource-intensive, and ensure that you only use a prebuilt binary.
+* `LIBARROW_MINIMAL` : If set to `false`, the build script
+ will enable some optional features, including S3
+ support and additional alternative memory allocators. This will increase the
+ source build time but results in a more fully functional library. If set to
+ `true` turns off Parquet, Datasets, compression libraries, and other optional
+ features.
+* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
Review comment:
Do we want to mention here as well that we encourage setting this as a
catch-all for installing all the good things (and the good ways) on Linux?
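For illustration, a minimal sketch of what that could look like for a user on Linux. The variable name and its effect are taken from the text above; the commented-out install command is only an assumption about how the package gets installed.

```shell
# Sketch only: opt in to the fuller install path described in the vignette.
# With NOT_CRAN=true the build script sets LIBARROW_BINARY=true and
# LIBARROW_MINIMAL=false, unless those variables are already set.
export NOT_CRAN=true

# Assumption: R is on PATH and the package is installed from a CRAN mirror.
# Rscript -e 'install.packages("arrow")'
```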
##########
File path: r/vignettes/developers/install_details.Rmd
##########
@@ -0,0 +1,90 @@
+# How the R package is installed
+
+In order for the `arrow` R package to work, it needs the Arrow C++ library,
+also known as libarrow. There are a number of scripts that are triggered
+when `R CMD INSTALL .` is run and for Arrow users, these should all just work
+without configuration and pull in the most complete pieces (e.g. official
+binaries that we host).
+
+An overview of these scripts is shown below:
+
+* `configure` and `configure.win` - these scripts are triggered during
+`R CMD INSTALL .` on non-Windows and Windows platforms, respectively. They
+handle finding the libarrow, setting up the build variables necessary, and
+writing the package Makevars file that is used to compile the C++ code in the R
+package.
+
+* `tools/nixlibs.R` - this script is sometimes called by `configure` on Linux
+(or on any non-windows OS with the environment variable
+`FORCE_BUNDLED_BUILD=true`). This sets up the build process for our bundled
+builds (which is the default on linux) and checks for binaries or downloads
+libarrow from source depending on dependency availability and build configuration.
+
+* `inst/build_arrow_static.sh` - called by `tools/nixlibs.R` when libarrow is
+being built. It builds libarrow for a bundled, static build, and
+mirrors the steps described in the ["Arrow R Developer Guide" vignette](./setup.html)
+This build script is also what is used to generate the prebuilt binaries.
Review comment:
```suggestion
This build script is also what is used to generate our prebuilt binaries.
```
Could/should we link to somewhere in our docs where we show how to install
our binaries?
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the exception of `ARROW_S3`, these are all `ON` by default in the bundled C++ build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this `OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`, `ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a binary
+ C++ library built for your operating system. You may also set it to some other string, a related "distro-version" that has binaries built that work for your OS.
+ If no binary is found, installation will fall back to building C++ dependencies from source.
+* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working `arrow` R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
- and resource-intensive, and ensure that you only use a prebuilt binary.
-* `LIBARROW_MINIMAL`: If set to `false`, the build script
- will enable some optional features, including compression libraries, S3
- support, and additional alternative memory allocators. This will increase the
- source build time but results in a more fully functional library.
-* `NOT_CRAN`: If this variable is set to `true`, as the `devtools` package does,
+ and resource-intensive, and ensure that you only use a prebuilt binary.
+* `LIBARROW_MINIMAL` : If set to `false`, the build script
+ will enable some optional features, including S3
+ support and additional alternative memory allocators. This will increase the
+ source build time but results in a more fully functional library. If set to
+ `true` turns off Parquet, Datasets, compression libraries, and other optional
+ features.
+* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
-* `ARROW_R_DEV`: If set to `true`, more verbose messaging will be printed
+* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
- code in the package: see the developer guide vignette.
-* `LIBARROW_DEBUG_DIR`: If the C++ library building from source fails (`cmake`),
+ code in the package: see the developer guide vignette.
+* `LIBARROW_DEBUG_DIR` : If the C++ library building from source fails (`cmake`),
there may be messages telling you to check some log file in the build directory.
However, when the library is built during R package installation,
that location is in a temp directory that is already deleted.
To capture those logs, set this variable to an absolute (not relative) path
- and the log files will be copied there.
+ and the log files will be copied there.
Review comment:
```suggestion
and the log files will be copied there.
```
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the exception of `ARROW_S3`, these are all `ON` by default in the bundled C++ build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this `OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`, `ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a binary
+ C++ library built for your operating system. You may also set it to some other string, a related "distro-version" that has binaries built that work for your OS.
+ If no binary is found, installation will fall back to building C++ dependencies from source.
+* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working `arrow` R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
- and resource-intensive, and ensure that you only use a prebuilt binary.
-* `LIBARROW_MINIMAL`: If set to `false`, the build script
- will enable some optional features, including compression libraries, S3
- support, and additional alternative memory allocators. This will increase the
- source build time but results in a more fully functional library.
-* `NOT_CRAN`: If this variable is set to `true`, as the `devtools` package does,
+ and resource-intensive, and ensure that you only use a prebuilt binary.
+* `LIBARROW_MINIMAL` : If set to `false`, the build script
+ will enable some optional features, including S3
+ support and additional alternative memory allocators. This will increase the
+ source build time but results in a more fully functional library. If set to
+ `true` turns off Parquet, Datasets, compression libraries, and other optional
+ features.
+* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
-* `ARROW_R_DEV`: If set to `true`, more verbose messaging will be printed
+* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
- code in the package: see the developer guide vignette.
-* `LIBARROW_DEBUG_DIR`: If the C++ library building from source fails (`cmake`),
+ code in the package: see the developer guide vignette.
+* `LIBARROW_DEBUG_DIR` : If the C++ library building from source fails (`cmake`),
Review comment:
😱 TIL this exists. I have fought so much with trying to grab those files
without this 🤦
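For anyone else who missed it, a rough sketch of how it might be used. The directory name is hypothetical; per the text above, the path has to be absolute, and the commented-out install command is an assumption.

```shell
# Hypothetical log directory; LIBARROW_DEBUG_DIR must be an absolute path.
export LIBARROW_DEBUG_DIR="/tmp/arrow-build-logs"
mkdir -p "$LIBARROW_DEBUG_DIR"

# Assumption: R is on PATH. If the source build fails, the log files should
# be copied into $LIBARROW_DEBUG_DIR rather than lost with the temp directory.
# Rscript -e 'install.packages("arrow")'
```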
##########
File path: r/vignettes/install.Rmd
##########
@@ -379,63 +312,86 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
the conflicting `zstd`.
See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
-## Summary of build environment variables
-
-Some features are optional when you build Arrow from source. With the exception of `ARROW_S3`, these are all `ON` by default in the bundled C++ build, but you can set them to `OFF` to disable them.
-
-* `ARROW_S3`: If set to `ON` S3 support will be built as long as the
- dependencies are met; if they are not met, the build script will turn this `OFF`
-* `ARROW_JEMALLOC` for the `jemalloc` memory allocator
-* `ARROW_MIMALLOC` for the `mimalloc` memmory allocator
-* `ARROW_PARQUET`
-* `ARROW_DATASET`
-* `ARROW_JSON` for the JSON parsing library
-* `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string compute functions
-* `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other string compute functions
-* `ARROW_JSON` for JSON parsing
-* `ARROW_WITH_BROTLI`, `ARROW_WITH_BZ2`, `ARROW_WITH_LZ4`, `ARROW_WITH_SNAPPY`, `ARROW_WITH_ZLIB`, and `ARROW_WITH_ZSTD` for various compression algorithms
-
-
-There are a number of other variables that affect the `configure` script and the bundled build script.
-By default, these are all unset. All boolean variables are case-insensitive.
-
-* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script
- won't look for Arrow libraries on your system and instead will look to download/build them.
- Use this if you have a version mismatch between installed system libraries
- and the version of the R package you're installing.
-* `LIBARROW_BINARY`: If set to `true`, the script will try to download a binary
- C++ library built for your operating system.
- You may also set it to some other string,
- a related "distro-version" that has binaries built that work for your OS.
- If no binary is found, installation will fall back to building C++
- dependencies from source.
-* `LIBARROW_BUILD`: If set to `false`, the build script
+# Summary of build environment variables
+
+## libarrow configuration
+
+Some features are optional when you build Arrow from source - you can configure
+whether these components are built via the use of environment variables. The
+names of the environment variables which control these features and their
+default values are shown below.
+
+| Name | Description | Default Value |
+| ---| --- | :-: |
+| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
+| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
+| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
+| `ARROW_PARQUET` | | `ON` |
+| `ARROW_DATASET` | | `ON` |
+| `ARROW_JSON` | The JSON parsing library | `ON` |
+| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
+| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
+| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
+| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
+| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
+| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
+| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
+
+## R package configuration
+
+There are a number of other variables that affect the `configure` script and
+the bundled build script. All boolean variables are case-insensitive.
+
+| Name | Description | Default |
+| --- | --- | :-: |
+| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
+| `LIBARROW_BUILD` | Allow building from source | `true` |
+| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | `true` |
+| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
+| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
+| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
+| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
+| `CMAKE` | Alternative CMake path | (unset) |
+
+See below for more in-depth explanations of these environment variables.
+
+* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
+Arrow libraries on your system and instead will look to download/build them.
+ Use this if you have a version mismatch between installed system libraries and
+ the version of the R package you're installing.
+* `LIBARROW_BINARY` : If set to `true`, the script will try to download a binary
+ C++ library built for your operating system. You may also set it to some other string, a related "distro-version" that has binaries built that work for your OS.
+ If no binary is found, installation will fall back to building C++ dependencies from source.
+* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working `arrow` R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
- and resource-intensive, and ensure that you only use a prebuilt binary.
-* `LIBARROW_MINIMAL`: If set to `false`, the build script
- will enable some optional features, including compression libraries, S3
- support, and additional alternative memory allocators. This will increase the
- source build time but results in a more fully functional library.
-* `NOT_CRAN`: If this variable is set to `true`, as the `devtools` package does,
+ and resource-intensive, and ensure that you only use a prebuilt binary.
+* `LIBARROW_MINIMAL` : If set to `false`, the build script
+ will enable some optional features, including S3
+ support and additional alternative memory allocators. This will increase the
+ source build time but results in a more fully functional library. If set to
+ `true` turns off Parquet, Datasets, compression libraries, and other optional
+ features.
+* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
-* `ARROW_R_DEV`: If set to `true`, more verbose messaging will be printed
+* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
- code in the package: see the developer guide vignette.
-* `LIBARROW_DEBUG_DIR`: If the C++ library building from source fails (`cmake`),
+ code in the package: see the developer guide vignette.
Review comment:
```suggestion
code in the package: see the developer guide vignette.
```