jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702120622
##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
message("Please restart R to use the 'arrow' package.")
}
}
+
+
+#' Create an install package with all thirdparty dependencies
Review comment:
```suggestion
#' Create a source bundle that includes all thirdparty dependencies
```
##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
message("Please restart R to use the 'arrow' package.")
}
}
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the
version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package _or_ run
+#'
`source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without
internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file
+#' * `install.packages("my_arrow_pkg.tar.gz", dependencies = c("Depends",
"Imports", "LinkingTo"))`
+#' * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports",
"LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(dest_file = NULL, source_file
= NULL) {
+ if (is.null(source_file)) {
+ pkg_download_dir <- tempfile()
+ dir.create(pkg_download_dir)
+ on.exit(unlink(pkg_download_dir, recursive = TRUE), add = TRUE)
+ downloaded <- utils::download.packages("arrow", destdir =
pkg_download_dir, type = "source")
Review comment:
This is very minor, but do we want a message here saying that we are
downloading the file?
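
A minimal sketch of what such a message could look like. The wrapper `download_arrow_source()` and its `downloader` argument are hypothetical (not part of the PR); the argument is there only so the sketch can be exercised without network access.

```r
# Hypothetical helper: announce the download, then fetch the source package.
download_arrow_source <- function(destdir, quietly = FALSE,
                                  downloader = utils::download.packages) {
  if (!quietly) {
    message("Downloading the arrow source package to ", destdir)
  }
  downloaded <- downloader("arrow", destdir = destdir, type = "source")
  # download.packages() returns a two-column matrix: package name, file path.
  if (nrow(downloaded) == 0) {
    stop("Failed to download the arrow source package")
  }
  downloaded[1, 2, drop = TRUE]
}
```

Using `message()` rather than `cat()` keeps the notice on stderr, so callers can silence it with `suppressMessages()`.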
##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
message("Please restart R to use the 'arrow' package.")
}
}
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the
version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.
Review comment:
```suggestion
#' @param source_file File path for the input tar.gz package. Defaults to
#' downloading the package from CRAN (or whichever repository is listed first
#' in `getOption("repos")`).
```
In adding this clarification, I realized that if someone has set RStudio
Package Manager as their first repo, this might do funny things. They would be
getting a binary, which should have *most* of everything built already, so the
next steps would either be ignored or won't work. Maybe we "just" need to
document that here and tell people who are doing that to use the binary
they get from there.
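
One way to surface this, as a rough sketch: look at the first entry of `getOption("repos")` before downloading. The helper `describe_first_repo()` is hypothetical, and the `__linux__` check is only a heuristic I'm assuming for Package Manager URLs that serve Linux binaries; the PR does not do this.

```r
# Hypothetical helper: report which repo would be used first and whether it
# looks like a binary-serving Package Manager URL (assumed heuristic).
describe_first_repo <- function(repos = getOption("repos")) {
  first <- unname(repos[1])
  list(
    url = first,
    looks_binary = grepl("__linux__", first, fixed = TRUE)
  )
}

info <- describe_first_repo(c(CRAN = "https://cloud.r-project.org"))
if (info$looks_binary) {
  message("First repo may serve binaries; consider using that binary directly.")
}
```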
##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,37 @@ satisfy C++ dependencies.
> Note that, unlike packages like `tensorflow`, `blogdown`, and others that
> require external dependencies, you do not need to run `install_arrow()`
> after a successful `arrow` installation.
+The `install-arrow.R` file also includes the
`create_package_with_all_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed.
+This function provides a way to download them in advance.
+Doing so may be useful when installing Arrow on a computer without internet
access.
+Note that Arrow _can_ be installed on a computer without internet access, but
Review comment:
```suggestion
Note that Arrow _can_ be installed on a computer without internet access
without doing this, but
```
##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
message("Please restart R to use the 'arrow' package.")
}
}
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the
version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package _or_ run
+#'
`source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without
internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file
+#' * `install.packages("my_arrow_pkg.tar.gz", dependencies = c("Depends",
"Imports", "LinkingTo"))`
+#' * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports",
"LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(dest_file = NULL, source_file
= NULL) {
Review comment:
I'm fine with the order these are in. Generally I like inputs before
outputs like Neal mentioned, but you're right that for most people
`source_file` will be left blank.
##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
)
}
-with_s3_support <- function(env_vars) {
- arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" ||
tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+ # Because these are done as environment variables (as opposed to build
flags),
+ # setting these to "OFF" overrides any previous setting. We don't need to
+ # check the existing value.
+ turn_off <- c(
+ "ARROW_MIMALLOC" = "OFF",
+ "ARROW_JEMALLOC" = "OFF",
+ "ARROW_PARQUET" = "OFF", # depends on thrift
+ "ARROW_DATASET" = "OFF", # depends on parquet
+ "ARROW_S3" = "OFF",
+ "ARROW_WITH_BROTLI" = "OFF",
+ "ARROW_WITH_BZ2" = "OFF",
+ "ARROW_WITH_LZ4" = "OFF",
+ "ARROW_WITH_SNAPPY" = "OFF",
+ "ARROW_WITH_ZLIB" = "OFF",
+ "ARROW_WITH_ZSTD" = "OFF",
+ "ARROW_WITH_RE2" = "OFF",
+ "ARROW_WITH_UTF8PROC" = "OFF",
+ # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+ # that setting will *not* be honored by build_arrow_static.sh until
+ # ARROW-13768 is resolved.
Review comment:
```suggestion
```
ARROW-13768 is resolved, so we can remove this, yeah?
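
For anyone following the hunk: the "setting these to OFF overrides any previous setting" behavior comes from how `turn_off_thirdparty_features()` applies the overrides with `replace()`, which overwrites existing names and appends missing ones. A toy sketch with made-up values (the vectors here are illustrative, not the real environment):

```r
# Toy stand-in for the env_var_list that nixlibs.R passes around.
env_var_list <- c(ARROW_S3 = "ON", EXTRA_CMAKE_FLAGS = "")

# Overrides: one name already exists, one is new.
turn_off <- c(ARROW_S3 = "OFF", ARROW_JSON = "OFF")

# replace() assigns by name, so ARROW_S3 is overwritten and ARROW_JSON is
# appended at the end; untouched entries keep their values and positions.
updated <- replace(env_var_list, names(turn_off), turn_off)
print(updated)
```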
##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
)
}
-with_s3_support <- function(env_vars) {
- arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" ||
tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+ # Because these are done as environment variables (as opposed to build
flags),
+ # setting these to "OFF" overrides any previous setting. We don't need to
+ # check the existing value.
+ turn_off <- c(
+ "ARROW_MIMALLOC" = "OFF",
+ "ARROW_JEMALLOC" = "OFF",
+ "ARROW_PARQUET" = "OFF", # depends on thrift
+ "ARROW_DATASET" = "OFF", # depends on parquet
+ "ARROW_S3" = "OFF",
+ "ARROW_WITH_BROTLI" = "OFF",
+ "ARROW_WITH_BZ2" = "OFF",
+ "ARROW_WITH_LZ4" = "OFF",
+ "ARROW_WITH_SNAPPY" = "OFF",
+ "ARROW_WITH_ZLIB" = "OFF",
+ "ARROW_WITH_ZSTD" = "OFF",
+ "ARROW_WITH_RE2" = "OFF",
+ "ARROW_WITH_UTF8PROC" = "OFF",
+ # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+ # that setting will *not* be honored by build_arrow_static.sh until
+ # ARROW-13768 is resolved.
+ "ARROW_JSON" = "OFF",
+ # The syntax to turn off XSIMD is different.
+ # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+ "EXTRA_CMAKE_FLAGS" = paste(
+ env_var_list[["EXTRA_CMAKE_FLAGS"]],
+ "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+ )
+ )
+ # Create a new env_var_list, with the values of turn_off set.
+ # replace() also adds new values if they didn't exist before
+ replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+ # This function does *not* check if existing *_SOURCE_URL variables are set.
+ # The directory tools/thirdparty_dependencies is created by
+ # create_package_with_all_dependencies() and saved in the tar file.
+ files <- list.files(thirdparty_dependency_dir, full.names = FALSE)
+ url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+ # Special handling for the aws dependencies, which have extra `-`
+ aws <- grepl("^aws", files)
+ url_env_varname[aws] <- sub(
+ "AWS_SDK_CPP", "AWSSDK",
+ gsub(
+ "-", "_",
+ sub(
+ "(AWS.*)-.*", "ARROW_\\1_URL",
+ toupper(files[aws])
+ )
+ )
+ )
+ full_filenames <- file.path(normalizePath(thirdparty_dependency_dir), files)
+
+ env_var_list <- replace(env_var_list, url_env_varname, full_filenames)
+ if (!quietly) {
+ env_var_list <- replace(env_var_list, "ARROW_VERBOSE_THIRDPARTY_BUILD",
"ON")
+ }
+ env_var_list
+}
+
+with_mimalloc <- function(env_var_list) {
+ arrow_mimalloc <- env_is("ARROW_MIMALLOC", "on") ||
env_is("LIBARROW_MINIMAL", "false")
+ if (arrow_mimalloc) {
Review comment:
```suggestion
  # but if ARROW_MIMALLOC=OFF explicitly, we are definitely off, so override
  if (env_is("ARROW_MIMALLOC", "off")) {
    arrow_mimalloc <- FALSE
  }
  if (arrow_mimalloc) {
```
This wasn't in the original, but like S3 below it, we want to be able to do
`LIBARROW_MINIMAL=FALSE ARROW_MIMALLOC=OFF` and have everything on but mimalloc
off. And while we're moving this code around, we might as well fix this too.
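
To make the intended precedence concrete, here is a small sketch. The `env_is()` below is my own stand-in for the helper of the same name in `nixlibs.R` (assumed to do a case-insensitive comparison), and `mimalloc_requested()` is a hypothetical name used only for illustration.

```r
# Assumed stand-in for the nixlibs.R helper: case-insensitive env var check.
env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)

# Hypothetical wrapper showing the precedence: an explicit
# ARROW_MIMALLOC=OFF wins over LIBARROW_MINIMAL=false.
mimalloc_requested <- function() {
  requested <- env_is("ARROW_MIMALLOC", "on") ||
    env_is("LIBARROW_MINIMAL", "false")
  if (env_is("ARROW_MIMALLOC", "off")) {
    requested <- FALSE
  }
  requested
}

Sys.setenv(LIBARROW_MINIMAL = "FALSE", ARROW_MIMALLOC = "OFF")
mimalloc_requested()  # explicit OFF overrides the non-minimal default
```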