nealrichardson commented on a change in pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#discussion_r591999165
##########
File path: r/vignettes/install.Rmd
##########
@@ -166,10 +166,30 @@ run `install.packages("arrow")` or `R CMD INSTALL` but not when running `R CMD c
 unless you've set the `NOT_CRAN=true` environment variable.
 
 For the mechanics of how all this works, see the R package `configure` script,
-which calls `tools/linuxlibs.R`.
+which calls `tools/nixlibs.R`.
 If the C++ library is built from source, `inst/build_arrow_static.sh` is executed.
 This build script is also what is used to generate the prebuilt binaries.
+
+# Using `remotes::install_github(...)`
+
+If you need an Arrow installation from a specific repository or at a specific ref, `remotes::install_github()` should work on most platforms (with the notable exception of windows). This method is helpful if you need a full install of arrow that is separate from another install (e.g. we use this in [arrowbench](https://github.com/ursacomputing/arrowbench) to install development versions of arrow isolated from the system install). However there are some caveats to be aware of:
+
+* Setting the environment variable `FORCE_TOOLS_LIBS_SCRIPT` to `true` will avoid linking to any arrow libraries already installed and attempt to build arrow from the same source at the repository+ref given.
+* If you are using the `FORCE_TOOLS_LIBS_SCRIPT` you must also set `build = FALSE` in the `remotes::install_github()` call. This is similar to checking out the repository and calling `R CMD INSTALL .` in the `arrow/r` directory (as opposed to first calling `R CMD BUILD .` and then installing the tar.gz file that produces, which is the default for `remotes::install_github()`). If you have arrow installed already, you may want to change your Makevars `CPPFLAGS` and `LDFLAGS` to `""` in order to prevent the installation process from attempting to link to already installed system versions of arrow. One way to do this temporarily is wrapping your `remotes::install_github()` call like so: `withr::with_makevars(list(CPPFLAGS = "", LDFLAGS = ""), remotes::install_github(...))`.
+* Specify `subdir = "r"` to get the R package (or use `/r` after `username/repo` e.g. `apache/arrow/r`).
+* On macOS you may need to also specify the environment variable `SDKROOT` to an appropriate location (typically something like `/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk`). This is most easily and reliably done using `xcrun --show-sdk-path` (to set the environment variable inside of R you can `Sys.setenv(SDKROOT=system("xcrun --show-sdk-path", intern = TRUE))`). Setting the `SDKROOT` variable is necessary on modern (at least >= 10.15) macOS SDKs. This allows the build system to find the appropriate standard libraries and headers when it is compiling them.

Review comment:
Sounds to me like the actual fix for this problem is to have the `build_bzip2` CMake macro pass in `SDKROOT`, like the `build_jemalloc` one does in the link you shared there. If not that, then TBH I'd rather turn off bzip2 in our build script than document this: it makes it look like it's hard to install arrow, and it really shouldn't be.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
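
For reference, the install flow that the proposed vignette text describes can be sketched in R roughly as follows. This is a minimal sketch, not code from the pull request: `arrow_install_env()` and `install_arrow_from_github()` are hypothetical helper names, and it assumes the `remotes` and `withr` packages are installed. The environment variables, the `build = FALSE` argument, the emptied `CPPFLAGS`/`LDFLAGS`, and the `apache/arrow/r` path all come from the quoted text.

```r
# Hypothetical helper: collect the environment variables named in the
# proposed vignette text. Not part of the arrow package.
arrow_install_env <- function(is_macos = Sys.info()[["sysname"]] == "Darwin") {
  # Build from the repository source instead of linking to an installed Arrow.
  env <- c(FORCE_TOOLS_LIBS_SCRIPT = "true")
  if (is_macos) {
    # Ask Xcode for the SDK location rather than hard-coding the path.
    env <- c(env, SDKROOT = system("xcrun --show-sdk-path", intern = TRUE))
  }
  env
}

# Hypothetical wrapper: install the R package from a GitHub repo (and,
# optionally, a ref passed through `...`) with the settings above applied
# only for the duration of the call.
install_arrow_from_github <- function(repo = "apache/arrow/r", ...) {
  withr::with_envvar(
    arrow_install_env(),
    withr::with_makevars(
      # Empty flags so the build does not pick up a system Arrow install.
      c(CPPFLAGS = "", LDFLAGS = ""),
      # build = FALSE is required when FORCE_TOOLS_LIBS_SCRIPT is set.
      remotes::install_github(repo, build = FALSE, ...)
    )
  )
}
```

Calling `install_arrow_from_github(ref = "master")` would then run the install; the `ref` value here is only an illustration. Scoping the variables with `withr` keeps them from leaking into the rest of the session, which is the same reason the quoted text suggests `withr::with_makevars()`.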