[ https://issues.apache.org/jira/browse/ARROW-12981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360167#comment-17360167 ]
Karl Dunkle Werner commented on ARROW-12981:
--------------------------------------------

Adding new system software sources is a hard sell, so I think having a sysadmin install dependencies isn't a great solution for me until the dependencies are in the official Debian/RHEL software sources.

> [R] Install source package from CRAN alone
> ------------------------------------------
>
>                 Key: ARROW-12981
>                 URL: https://issues.apache.org/jira/browse/ARROW-12981
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Packaging, R
>    Affects Versions: 4.0.1
>         Environment: Linux
>            Reporter: Karl Dunkle Werner
>            Priority: Major
>
> Hello,
> I would like to install {{Arrow}} on Linux using only CRAN, without
> downloading additional files from Github, Apache, or Ursa Labs. I understand
> this is a big ask, and might not be a priority for you all. Feel free to
> close if you feel that this is out of scope.
> Why is a CRAN-only installation useful?
> # It's common for organizations to set up firewalls that prevent arbitrary
> downloads, but allow access to their own internal CRAN mirror.
> ** Sometimes these firewalls also allow requests to Github, but often not.
> # On a broader level, my favorite thing about R is CRAN, the CRAN
> maintainers, and their
> [policy|https://cran.r-project.org/web/packages/policies.html#Source-packages]
> that "Source packages may not contain any form of binary executable code."
> By distributing most of the Arrow code separately (either as source C++ or a
> compiled library), automated code archives and other source-based tools
> become much less useful.
> Of course, {{arrow}} isn't the only R package to depend on external libraries
> or distribute code separately. If a CRAN-only approach isn't viable, it would
> still be useful to have an all-offline method. I'm also having trouble
> getting an offline install to work, even with a local copy of the Arrow repo.
> (See the bottom of the script below.)
>
> What does installing offline look like now?
> Here's a bash script that approximates installing behind a firewall.
> {code:sh}
> git clone --depth 1 g...@github.com:apache/arrow.git test_arrow
> cd test_arrow
> wget 'https://cran.r-project.org/src/contrib/arrow_4.0.1.tar.gz'
> # Set up a temporary R library (optional)
> mkdir test_r_lib
> export R_LIBS_USER=test_r_lib
> export ARROW_R_DEV=true
> export LIBARROW_MINIMAL=false
> export LIBARROW_DOWNLOAD=false
> export LIBARROW_BINARY=false
> export LIBARROW_BUILD=true
> # These are all of the direct dependencies, including Suggests
> # This isn't required if the packages are already installed
> Rscript -e "install.packages(c('assertthat', 'bit64', 'purrr', 'R6', 'rlang',
> 'tidyselect', 'vctrs', 'cpp11', 'decor', 'distro', 'dplyr', 'hms', 'knitr',
> 'lubridate', 'pkgload', 'reticulate', 'rmarkdown', 'stringr', 'testthat',
> 'tibble', 'withr'))"
> # Disable your internet connection here.
> # Now try to install the R package we downloaded with wget.
> # This is an approximation of being behind a firewall.
> Rscript -e 'install.packages("arrow_4.0.1.tar.gz", repos=NULL)'
> # It successfully installs the R component, but not the C++ library,
> # even with LIBARROW_BUILD=true
> Rscript -e "arrow::arrow_available()"
> # [1] FALSE
> # As mentioned in the installation vignette,
> # we can R CMD INSTALL in the git repo.
> R CMD INSTALL r
> # This will try to build the C++ library, but fails when mimalloc and
> # jemalloc can't be downloaded from Github.
> # (Seems not to be affected by LIBARROW_DOWNLOAD=false).
> # When C++ compilation fails, the R component still installs.
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)