This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 5cd1df6  ARROW-902: [C++] Script for downloading all thirdparty build dependencies and configuration for offline builds
5cd1df6 is described below

commit 5cd1df6bc7c79f079e1a8c1a7a6f1e36c91a8687
Author: Wes McKinney <[email protected]>
AuthorDate: Sat Jun 23 04:13:31 2018 -0400

    ARROW-902: [C++] Script for downloading all thirdparty build dependencies and configuration for offline builds
    
    This patch provides a script for downloading all the third party dependencies to a particular location, and new environment variables that can be used to direct the build system to use local files for ExternalProject builds instead of accessing the internet.
    
    Author: Wes McKinney <[email protected]>
    
    Closes #2141 from wesm/ARROW-902 and squashes the following commits:
    
    4583759a <Wes McKinney> Fix Orc download URL
    01edf60a <Wes McKinney> Add offline build how-to in thirdparty/README.md
    a38249e7 <Wes McKinney> Use -c with wget
    4e7c819d <Wes McKinney> Read toolchain versions from thirdparty/versions.txt
    9d4478b5 <Wes McKinney> First draft of dependency download, offline build
---
 cpp/README.md                               |  52 +++----
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 213 ++++++++++++++++++++--------
 cpp/thirdparty/README.md                    |  89 ++++++++++++
 cpp/thirdparty/download_dependencies.sh     |  82 +++++++++++
 cpp/thirdparty/versions.txt                 |  34 +++++
 5 files changed, 381 insertions(+), 89 deletions(-)
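The override that this patch adds to ThirdpartyToolchain.cmake follows one rule per library: use `ARROW_<LIB>_URL` if it is set, otherwise fall back to the upstream URL built from the pinned version. A minimal shell sketch of that rule, for illustration only: `resolve_source_url` is a hypothetical helper that is not part of the patch, and treating an empty variable as unset is a simplification made here.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the patch): print the source location the
# build would use for one library.
#   $1: library tag as used in the variable name, e.g. GTEST
#   $2: default upstream URL
resolve_source_url() {
  local lib="$1"
  local default_url="$2"
  local override
  # printenv exits non-zero when the variable is not in the environment.
  override=$(printenv "ARROW_${lib}_URL" || true)
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    printf '%s\n' "$default_url"
  fi
}

GTEST_VERSION=1.8.0
resolve_source_url GTEST \
  "https://github.com/google/googletest/archive/release-${GTEST_VERSION}.tar.gz"
```

With `ARROW_GTEST_URL` exported, the call prints the local path instead of the GitHub URL.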

diff --git a/cpp/README.md b/cpp/README.md
index 3091588..e7ddb0a 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -21,10 +21,10 @@
 
 ## System setup
 
-Arrow uses CMake as a build configuration system. Currently, it supports in-source and
-out-of-source builds with the latter one being preferred.
+Arrow uses CMake as a build configuration system. Currently, it supports
+in-source and out-of-source builds with the latter one being preferred.
 
-Build Arrow requires:
+Building Arrow requires:
 
 * A C++11-enabled compiler. On Linux, gcc 4.8 and higher should be sufficient.
 * CMake
@@ -108,11 +108,11 @@ ASAN, and `ARROW_USE_ASAN` is mutually-exclusive with the valgrind option
 
 ### Building/Running fuzzers
 
-Fuzzers can help finding unhandled exceptions and problems with untrusted input that
-may lead to crashes, security issues and undefined behavior. They do this by
-generating random input data and observing the behavior of the executed code. To build
-the fuzzer code, LLVM is required (GCC-based compilers won't work). You can build them
-using the following code:
+Fuzzers can help find unhandled exceptions and problems with untrusted input
+that may lead to crashes, security issues and undefined behavior. They do this
+by generating random input data and observing the behavior of the executed
+code. To build the fuzzer code, LLVM is required (GCC-based compilers won't
+work). You can build them using the following code:
 
     cmake -DARROW_FUZZING=ON -DARROW_USE_ASAN=ON ..
 
@@ -156,29 +156,18 @@ There are some problems that may occur during the compilation process:
 - libfuzzer was not distributed with your LLVM: `ld: file not found: .../libLLVMFuzzer.a`
 - your LLVM is too old: `clang: error: unsupported argument 'fuzzer' to option 'fsanitize='`
 
-### Third-party environment variables
-
-To set up your own specific build toolchain, here are the relevant environment
-variables
-
-* Boost: `BOOST_ROOT`
-* Googletest: `GTEST_HOME` (only required to build the unit tests)
-* gflags: `GFLAGS_HOME` (only required to build the unit tests)
-* Google Benchmark: `GBENCHMARK_HOME` (only required if building benchmarks)
-* Flatbuffers: `FLATBUFFERS_HOME` (only required for the IPC extensions)
-* Hadoop: `HADOOP_HOME` (only required for the HDFS I/O extensions)
-* jemalloc: `JEMALLOC_HOME`
-* brotli: `BROTLI_HOME`, can be disabled with `-DARROW_WITH_BROTLI=off`
-* lz4: `LZ4_HOME`, can be disabled with `-DARROW_WITH_LZ4=off`
-* snappy: `SNAPPY_HOME`, can be disabled with `-DARROW_WITH_SNAPPY=off`
-* zlib: `ZLIB_HOME`, can be disabled with `-DARROW_WITH_ZLIB=off`
-* zstd: `ZSTD_HOME`, can be disabled with `-DARROW_WITH_ZSTD=off`
-
-If you have all of your toolchain libraries installed at the same prefix, you
-can use the environment variable `$ARROW_BUILD_TOOLCHAIN` to automatically set
-all of these variables. Note that `ARROW_BUILD_TOOLCHAIN` will not set
-`BOOST_ROOT`, so if you have custom Boost installation, you must set this
-environment variable separately.
+### Third-party dependencies and configuration
+
+Arrow depends on a number of third-party libraries. We support these in a few
+ways:
+
+* Building dependencies from source by downloading archives from the internet
+* Building dependencies from source using local archives (to allow offline
+  builds)
+* Building with locally-installed libraries
+
+See [thirdparty/README.md][5] for details about these options and how to
+configure your build toolchain.
 
 ### Building Python integration library (optional)
 
@@ -382,3 +371,4 @@ both of these options would be used rarely. Current known uses-cases when they a
 [2]: https://github.com/apache/arrow/blob/master/cpp/apidoc/Windows.md
 [3]: https://google.github.io/styleguide/cppguide.html
 [4]: https://github.com/include-what-you-use/include-what-you-use
+[5]: https://github.com/apache/arrow/blob/master/cpp/thirdparty/README.md
\ No newline at end of file
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 4dfe043..563a314 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -17,44 +17,9 @@
 
 
 # ----------------------------------------------------------------------
-# Thirdparty toolchain
+# Thirdparty versions, environment variables, source URLs
 
 set(THIRDPARTY_DIR "${CMAKE_SOURCE_DIR}/thirdparty")
-set(GFLAGS_VERSION "2.2.0")
-set(GTEST_VERSION "1.8.0")
-set(GBENCHMARK_VERSION "1.4.1")
-set(FLATBUFFERS_VERSION "1.9.0")
-set(JEMALLOC_VERSION "17c897976c60b0e6e4f4a365c751027244dada7a")
-set(SNAPPY_VERSION "1.1.3")
-set(BROTLI_VERSION "v0.6.0")
-set(LZ4_VERSION "1.7.5")
-set(ZSTD_VERSION "1.2.0")
-set(PROTOBUF_VERSION "2.6.0")
-set(GRPC_VERSION "94582910ad7f82ad447ecc72e6548cb669e4f7a9") # v1.6.5
-set(ORC_VERSION "cf00b67795717ab3eb04e950780ed6d104109017")
-
-string(TOUPPER ${CMAKE_BUILD_TYPE} UPPERCASE_BUILD_TYPE)
-
-set(EP_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${UPPERCASE_BUILD_TYPE}}")
-set(EP_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_FLAGS_${UPPERCASE_BUILD_TYPE}}")
-
-if (NOT ARROW_VERBOSE_THIRDPARTY_BUILD)
-  set(EP_LOG_OPTIONS
-    LOG_CONFIGURE 1
-    LOG_BUILD 1
-    LOG_INSTALL 1
-    LOG_DOWNLOAD 1)
-  set(Boost_DEBUG FALSE)
-else()
-  set(EP_LOG_OPTIONS)
-  set(Boost_DEBUG TRUE)
-endif()
-
-if (NOT MSVC)
-  # Set -fPIC on all external projects
-  set(EP_CXX_FLAGS "${EP_CXX_FLAGS} -fPIC")
-  set(EP_C_FLAGS "${EP_C_FLAGS} -fPIC")
-endif()
 
 if (NOT "$ENV{ARROW_BUILD_TOOLCHAIN}" STREQUAL "")
   set(FLATBUFFERS_HOME "$ENV{ARROW_BUILD_TOOLCHAIN}")
@@ -114,6 +79,145 @@ if (DEFINED ENV{PROTOBUF_HOME})
   set(PROTOBUF_HOME "$ENV{PROTOBUF_HOME}")
 endif()
 
+# ----------------------------------------------------------------------
+# Versions and URLs for toolchain builds, which also can be used to configure
+# offline builds
+
+# Read toolchain versions from cpp/thirdparty/versions.txt
+file(STRINGS "${THIRDPARTY_DIR}/versions.txt" TOOLCHAIN_VERSIONS_TXT)
+foreach(_VERSION_ENTRY ${TOOLCHAIN_VERSIONS_TXT})
+  # Exclude comments
+  if(_VERSION_ENTRY MATCHES "#.*")
+    continue()
+  endif()
+
+  string(REGEX MATCH "^[^=]*" _LIB_NAME ${_VERSION_ENTRY})
+  string(REPLACE "${_LIB_NAME}=" "" _LIB_VERSION ${_VERSION_ENTRY})
+
+  # Skip blank or malformed lines
+  if("${_LIB_VERSION}" STREQUAL "")
+    continue()
+  endif()
+
+  # For debugging
+  message(STATUS "${_LIB_NAME}: ${_LIB_VERSION}")
+
+  set(${_LIB_NAME} "${_LIB_VERSION}")
+endforeach()
+
+if (DEFINED ENV{ARROW_BOOST_URL})
+  set(BOOST_SOURCE_URL "$ENV{ARROW_BOOST_URL}")
+else()
+  string(REPLACE "." "_" BOOST_VERSION_UNDERSCORES ${BOOST_VERSION})
+  set(BOOST_SOURCE_URL
+    "https://dl.bintray.com/boostorg/release/${BOOST_VERSION}/source/boost_${BOOST_VERSION_UNDERSCORES}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_GTEST_URL})
+  set(GTEST_SOURCE_URL "$ENV{ARROW_GTEST_URL}")
+else()
+  set(GTEST_SOURCE_URL "https://github.com/google/googletest/archive/release-${GTEST_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_GFLAGS_URL})
+  set(GFLAGS_SOURCE_URL "$ENV{ARROW_GFLAGS_URL}")
+else()
+  set(GFLAGS_SOURCE_URL "https://github.com/gflags/gflags/archive/v${GFLAGS_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_GBENCHMARK_URL})
+  set(GBENCHMARK_SOURCE_URL "$ENV{ARROW_GBENCHMARK_URL}")
+else()
+  set(GBENCHMARK_SOURCE_URL "https://github.com/google/benchmark/archive/v${GBENCHMARK_VERSION}.tar.gz")
+endif()
+
+set(RAPIDJSON_SOURCE_MD5 "badd12c511e081fec6c89c43a7027bce")
+if (DEFINED ENV{ARROW_RAPIDJSON_URL})
+  set(RAPIDJSON_SOURCE_URL "$ENV{ARROW_RAPIDJSON_URL}")
+else()
+  set(RAPIDJSON_SOURCE_URL "https://github.com/miloyip/rapidjson/archive/v${RAPIDJSON_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_FLATBUFFERS_URL})
+  set(FLATBUFFERS_SOURCE_URL "$ENV{ARROW_FLATBUFFERS_URL}")
+else()
+  set(FLATBUFFERS_SOURCE_URL "https://github.com/google/flatbuffers/archive/v${FLATBUFFERS_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_SNAPPY_URL})
+  set(SNAPPY_SOURCE_URL "$ENV{ARROW_SNAPPY_URL}")
+else()
+  set(SNAPPY_SOURCE_URL "https://github.com/google/snappy/releases/download/${SNAPPY_VERSION}/snappy-${SNAPPY_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_BROTLI_URL})
+  set(BROTLI_SOURCE_URL "$ENV{ARROW_BROTLI_URL}")
+else()
+  set(BROTLI_SOURCE_URL "https://github.com/google/brotli/archive/${BROTLI_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_LZ4_URL})
+  set(LZ4_SOURCE_URL "$ENV{ARROW_LZ4_URL}")
+else()
+  set(LZ4_SOURCE_URL "https://github.com/lz4/lz4/archive/v${LZ4_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_ZLIB_URL})
+  set(ZLIB_SOURCE_URL "$ENV{ARROW_ZLIB_URL}")
+else()
+  set(ZLIB_SOURCE_URL "http://zlib.net/fossils/zlib-${ZLIB_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_ZSTD_URL})
+  set(ZSTD_SOURCE_URL "$ENV{ARROW_ZSTD_URL}")
+else()
+  set(ZSTD_SOURCE_URL "https://github.com/facebook/zstd/archive/v${ZSTD_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_PROTOBUF_URL})
+  set(PROTOBUF_SOURCE_URL "$ENV{ARROW_PROTOBUF_URL}")
+else()
+  set(PROTOBUF_SOURCE_URL "https://github.com/google/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-${PROTOBUF_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_GRPC_URL})
+  set(GRPC_SOURCE_URL "$ENV{ARROW_GRPC_URL}")
+else()
+  set(GRPC_SOURCE_URL "https://github.com/grpc/grpc/archive/v${GRPC_VERSION}.tar.gz")
+endif()
+
+if (DEFINED ENV{ARROW_ORC_URL})
+  set(ORC_SOURCE_URL "$ENV{ARROW_ORC_URL}")
+else()
+  set(ORC_SOURCE_URL "https://github.com/apache/orc/archive/rel/release-${ORC_VERSION}.tar.gz")
+endif()
+
+# ----------------------------------------------------------------------
+# ExternalProject options
+
+string(TOUPPER ${CMAKE_BUILD_TYPE} UPPERCASE_BUILD_TYPE)
+
+set(EP_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${UPPERCASE_BUILD_TYPE}}")
+set(EP_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_FLAGS_${UPPERCASE_BUILD_TYPE}}")
+
+if (NOT ARROW_VERBOSE_THIRDPARTY_BUILD)
+  set(EP_LOG_OPTIONS
+    LOG_CONFIGURE 1
+    LOG_BUILD 1
+    LOG_INSTALL 1
+    LOG_DOWNLOAD 1)
+  set(Boost_DEBUG FALSE)
+else()
+  set(EP_LOG_OPTIONS)
+  set(Boost_DEBUG TRUE)
+endif()
+
+if (NOT MSVC)
+  # Set -fPIC on all external projects
+  set(EP_CXX_FLAGS "${EP_CXX_FLAGS} -fPIC")
+  set(EP_C_FLAGS "${EP_C_FLAGS} -fPIC")
+endif()
+
 # Ensure that a default make is set
 if ("${MAKE}" STREQUAL "")
     if (NOT MSVC)
@@ -146,10 +250,6 @@ set(Boost_ADDITIONAL_VERSIONS
   "1.62.0" "1.61"
   "1.61.0" "1.62"
   "1.60.0" "1.60")
-list(GET Boost_ADDITIONAL_VERSIONS 2 BOOST_LATEST_VERSION)
-string(REPLACE "." "_" BOOST_LATEST_VERSION_IN_PATH ${BOOST_LATEST_VERSION})
-set(BOOST_LATEST_URL
-  "https://dl.bintray.com/boostorg/release/${BOOST_LATEST_VERSION}/source/boost_${BOOST_LATEST_VERSION_IN_PATH}.tar.gz")
 
 if (ARROW_BOOST_VENDORED)
   set(BOOST_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/boost_ep-prefix/src/boost_ep")
@@ -185,7 +285,7 @@ if (ARROW_BOOST_VENDORED)
       "cxxflags=-fPIC")
   endif()
   ExternalProject_Add(boost_ep
-    URL ${BOOST_LATEST_URL}
+    URL ${BOOST_SOURCE_URL}
     BUILD_BYPRODUCTS ${BOOST_BUILD_PRODUCTS}
     BUILD_IN_SOURCE 1
     CONFIGURE_COMMAND ${BOOST_CONFIGURE_COMMAND}
@@ -288,7 +388,7 @@ if(ARROW_BUILD_TESTS OR ARROW_BUILD_BENCHMARKS)
     endif()
 
     ExternalProject_Add(googletest_ep
-      URL "https://github.com/google/googletest/archive/release-${GTEST_VERSION}.tar.gz"
+      URL ${GTEST_SOURCE_URL}
       BUILD_BYPRODUCTS ${GTEST_STATIC_LIB} ${GTEST_MAIN_STATIC_LIB}
       CMAKE_ARGS ${GTEST_CMAKE_ARGS}
       ${EP_LOG_OPTIONS})
@@ -314,7 +414,6 @@ if(ARROW_BUILD_TESTS OR ARROW_BUILD_BENCHMARKS)
   if("${GFLAGS_HOME}" STREQUAL "")
     set(GFLAGS_CMAKE_CXX_FLAGS ${EP_CXX_FLAGS})
 
-    set(GFLAGS_URL "https://github.com/gflags/gflags/archive/v${GFLAGS_VERSION}.tar.gz")
     set(GFLAGS_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/gflags_ep-prefix/src/gflags_ep")
     set(GFLAGS_HOME "${GFLAGS_PREFIX}")
     set(GFLAGS_INCLUDE_DIR "${GFLAGS_PREFIX}/include")
@@ -337,7 +436,7 @@ if(ARROW_BUILD_TESTS OR ARROW_BUILD_BENCHMARKS)
                           -DCMAKE_CXX_FLAGS=${GFLAGS_CMAKE_CXX_FLAGS})
 
     ExternalProject_Add(gflags_ep
-      URL ${GFLAGS_URL}
+      URL ${GFLAGS_SOURCE_URL}
       ${EP_LOG_OPTIONS}
       BUILD_IN_SOURCE 1
       BUILD_BYPRODUCTS "${GFLAGS_STATIC_LIB}"
@@ -389,7 +488,7 @@ if(ARROW_BUILD_BENCHMARKS)
     endif()
 
     ExternalProject_Add(gbenchmark_ep
-      URL "https://github.com/google/benchmark/archive/v${GBENCHMARK_VERSION}.tar.gz"
+      URL ${GBENCHMARK_SOURCE_URL}
       BUILD_BYPRODUCTS "${GBENCHMARK_STATIC_LIB}"
       CMAKE_ARGS ${GBENCHMARK_CMAKE_ARGS}
       ${EP_LOG_OPTIONS})
@@ -414,8 +513,8 @@ if (ARROW_IPC)
   if("${RAPIDJSON_HOME}" STREQUAL "")
     ExternalProject_Add(rapidjson_ep
       PREFIX "${CMAKE_BINARY_DIR}"
-      URL "https://github.com/miloyip/rapidjson/archive/v1.1.0.tar.gz"
-      URL_MD5 "badd12c511e081fec6c89c43a7027bce"
+      URL ${RAPIDJSON_SOURCE_URL}
+      URL_MD5 ${RAPIDJSON_SOURCE_MD5}
       CONFIGURE_COMMAND ""
       BUILD_COMMAND ""
       BUILD_IN_SOURCE 1
@@ -446,7 +545,7 @@ if (ARROW_IPC)
     endif()
     # We always need to do release builds, otherwise flatc will not be installed.
     ExternalProject_Add(flatbuffers_ep
-      URL "https://github.com/google/flatbuffers/archive/v${FLATBUFFERS_VERSION}.tar.gz"
+      URL ${FLATBUFFERS_SOURCE_URL}
       CMAKE_ARGS
       "-DCMAKE_CXX_FLAGS=${FLATBUFFERS_CMAKE_CXX_FLAGS}"
       "-DCMAKE_INSTALL_PREFIX:PATH=${FLATBUFFERS_PREFIX}"
@@ -580,7 +679,7 @@ if (ARROW_WITH_ZLIB)
                         -DBUILD_SHARED_LIBS=OFF)
 
     ExternalProject_Add(zlib_ep
-      URL "http://zlib.net/fossils/zlib-1.2.8.tar.gz"
+      URL ${ZLIB_SOURCE_URL}
       ${EP_LOG_OPTIONS}
       BUILD_BYPRODUCTS "${ZLIB_STATIC_LIB}"
       CMAKE_ARGS ${ZLIB_CMAKE_ARGS})
@@ -613,7 +712,6 @@ if (ARROW_WITH_SNAPPY)
       set(SNAPPY_STATIC_LIB_NAME snappy)
     endif()
     set(SNAPPY_STATIC_LIB "${SNAPPY_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}${SNAPPY_STATIC_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}")
-    set(SNAPPY_SRC_URL "https://github.com/google/snappy/releases/download/${SNAPPY_VERSION}/snappy-${SNAPPY_VERSION}.tar.gz")
 
     if (${UPPERCASE_BUILD_TYPE} EQUAL "RELEASE")
       if (APPLE)
@@ -642,7 +740,7 @@ if (ARROW_WITH_SNAPPY)
         BUILD_IN_SOURCE 1
         BUILD_COMMAND ${MAKE}
         INSTALL_DIR ${SNAPPY_PREFIX}
-        URL ${SNAPPY_SRC_URL}
+        URL ${SNAPPY_SOURCE_URL}
         CMAKE_ARGS ${SNAPPY_CMAKE_ARGS}
         BUILD_BYPRODUCTS "${SNAPPY_STATIC_LIB}")
     else()
@@ -652,7 +750,7 @@ if (ARROW_WITH_SNAPPY)
         BUILD_IN_SOURCE 1
         BUILD_COMMAND ${MAKE}
         INSTALL_DIR ${SNAPPY_PREFIX}
-        URL ${SNAPPY_SRC_URL}
+        URL ${SNAPPY_SOURCE_URL}
         BUILD_BYPRODUCTS "${SNAPPY_STATIC_LIB}")
     endif()
     set(SNAPPY_VENDORED 1)
@@ -696,7 +794,7 @@ if (ARROW_WITH_BROTLI)
                           -DBUILD_SHARED_LIBS=OFF)
 
     ExternalProject_Add(brotli_ep
-      URL "https://github.com/google/brotli/archive/${BROTLI_VERSION}.tar.gz"
+      URL ${BROTLI_SOURCE_URL}
       BUILD_BYPRODUCTS "${BROTLI_STATIC_LIBRARY_ENC}" "${BROTLI_STATIC_LIBRARY_DEC}" "${BROTLI_STATIC_LIBRARY_COMMON}"
       ${BROTLI_BUILD_BYPRODUCTS}
       ${EP_LOG_OPTIONS}
@@ -758,7 +856,7 @@ if (ARROW_WITH_LZ4)
     endif()
 
     ExternalProject_Add(lz4_ep
-        URL "https://github.com/lz4/lz4/archive/v${LZ4_VERSION}.tar.gz"
+        URL ${LZ4_SOURCE_URL}
         ${EP_LOG_OPTIONS}
         UPDATE_COMMAND ""
         ${LZ4_PATCH_COMMAND}
@@ -811,7 +909,7 @@ if (ARROW_WITH_ZSTD)
     endif()
 
     ExternalProject_Add(zstd_ep
-        URL "https://github.com/facebook/zstd/archive/v${ZSTD_VERSION}.tar.gz"
+        URL ${ZSTD_SOURCE_URL}
         ${EP_LOG_OPTIONS}
         UPDATE_COMMAND ""
         ${ZSTD_PATCH_COMMAND}
@@ -891,12 +989,11 @@ if (ARROW_ORC)
     set (PROTOBUF_HOME "${PROTOBUF_PREFIX}")
     set (PROTOBUF_INCLUDE_DIR "${PROTOBUF_PREFIX}/include")
     set (PROTOBUF_STATIC_LIB "${PROTOBUF_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}protobuf${CMAKE_STATIC_LIBRARY_SUFFIX}")
-    set (PROTOBUF_SRC_URL "https://github.com/google/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-${PROTOBUF_VERSION}.tar.gz")
 
     ExternalProject_Add(protobuf_ep
       CONFIGURE_COMMAND "./configure" "--disable-shared" "--prefix=${PROTOBUF_PREFIX}" "CXXFLAGS=${EP_CXX_FLAGS}"
       BUILD_IN_SOURCE 1
-      URL ${PROTOBUF_SRC_URL}
+      URL ${PROTOBUF_SOURCE_URL}
       BUILD_BYPRODUCTS "${PROTOBUF_STATIC_LIB}"
       ${EP_LOG_OPTIONS})
 
@@ -952,7 +1049,7 @@ if (ARROW_ORC)
                       -DZLIB_HOME=${ZLIB_HOME})
 
   ExternalProject_Add(orc_ep
-    URL "https://github.com/apache/orc/archive/${ORC_VERSION}.tar.gz"
+    URL ${ORC_SOURCE_URL}
     BUILD_BYPRODUCTS ${ORC_STATIC_LIB}
     CMAKE_ARGS ${ORC_CMAKE_ARGS}
     ${EP_LOG_OPTIONS})
diff --git a/cpp/thirdparty/README.md b/cpp/thirdparty/README.md
new file mode 100644
index 0000000..f4f89f5
--- /dev/null
+++ b/cpp/thirdparty/README.md
@@ -0,0 +1,89 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Arrow C++ Thirdparty Dependencies
+
+The version numbers for our third-party dependencies are listed in
+`thirdparty/versions.txt`. This is used by the CMake build system as well as
+the dependency downloader script (see below), which can be used to set up
+offline builds.
+
+## Configuring your own build toolchain
+
+To set up your own specific build toolchain, here are the relevant environment
+variables
+
+* Boost: `BOOST_ROOT`
+* Googletest: `GTEST_HOME` (only required to build the unit tests)
+* gflags: `GFLAGS_HOME` (only required to build the unit tests)
+* Google Benchmark: `GBENCHMARK_HOME` (only required if building benchmarks)
+* Flatbuffers: `FLATBUFFERS_HOME` (only required for -DARROW_IPC=on, which is
+  the default)
+* Hadoop: `HADOOP_HOME` (only required for the HDFS I/O extensions)
+* jemalloc: `JEMALLOC_HOME`
+* brotli: `BROTLI_HOME`, can be disabled with `-DARROW_WITH_BROTLI=off`
+* lz4: `LZ4_HOME`, can be disabled with `-DARROW_WITH_LZ4=off`
+* snappy: `SNAPPY_HOME`, can be disabled with `-DARROW_WITH_SNAPPY=off`
+* zlib: `ZLIB_HOME`, can be disabled with `-DARROW_WITH_ZLIB=off`
+* zstd: `ZSTD_HOME`, can be disabled with `-DARROW_WITH_ZSTD=off`
+
+If you have all of your toolchain libraries installed at the same prefix, you
+can use the environment variable `$ARROW_BUILD_TOOLCHAIN` to automatically set
+all of these variables. Note that `ARROW_BUILD_TOOLCHAIN` will not set
+`BOOST_ROOT`, so if you have a custom Boost installation, you must set this
+environment variable separately.
+
+## Configuring for offline builds
+
+If you do not use the above variables to direct the Arrow build system to
+preinstalled dependencies, they will be built automatically by the build
+system. The source archive for each dependency will be downloaded via the
+internet, which can cause issues in environments with limited access to the
+internet.
+
+To enable offline builds, you can download the source artifacts yourself and
+use environment variables of the form `ARROW_$LIBRARY_URL` to direct the build
+system to read from a local file rather than accessing the internet.
+
+To make this easier for you, we have prepared a script
+`thirdparty/download_dependencies.sh` which will download the correct version
+of each dependency to a directory of your choosing. It will print a list of
+bash-style environment variable statements at the end to use for your build
+script:
+
+```shell
+$ ./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty-deps
+# some output omitted
+
+# Environment variables for offline Arrow build
+export ARROW_BOOST_URL=$HOME/arrow-thirdparty-deps/boost.tar.gz
+export ARROW_GTEST_URL=$HOME/arrow-thirdparty-deps/gtest.tar.gz
+export ARROW_GFLAGS_URL=$HOME/arrow-thirdparty-deps/gflags.tar.gz
+export ARROW_GBENCHMARK_URL=$HOME/arrow-thirdparty-deps/gbenchmark.tar.gz
+export ARROW_FLATBUFFERS_URL=$HOME/arrow-thirdparty-deps/flatbuffers.tar.gz
+export ARROW_RAPIDJSON_URL=$HOME/arrow-thirdparty-deps/rapidjson.tar.gz
+export ARROW_SNAPPY_URL=$HOME/arrow-thirdparty-deps/snappy.tar.gz
+export ARROW_BROTLI_URL=$HOME/arrow-thirdparty-deps/brotli.tar.gz
+export ARROW_LZ4_URL=$HOME/arrow-thirdparty-deps/lz4.tar.gz
+export ARROW_ZLIB_URL=$HOME/arrow-thirdparty-deps/zlib.tar.gz
+export ARROW_ZSTD_URL=$HOME/arrow-thirdparty-deps/zstd.tar.gz
+export ARROW_PROTOBUF_URL=$HOME/arrow-thirdparty-deps/protobuf.tar.gz
+export ARROW_GRPC_URL=$HOME/arrow-thirdparty-deps/grpc.tar.gz
+export ARROW_ORC_URL=$HOME/arrow-thirdparty-deps/orc.tar.gz
+```
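The export list shown in the README follows a fixed pattern: one `ARROW_<LIB>_URL` per archive, pointing at `<destination>/<lib>.tar.gz`. A small sketch of generating it from a list of names, assuming that pattern holds for every library; `print_offline_env` is a hypothetical helper and not something the patch provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print export statements for the given destination
# directory and archive base names (lowercase, as used for the .tar.gz files).
print_offline_env() {
  local dest="$1"
  shift
  local lib upper
  echo "# Environment variables for offline Arrow build"
  for lib in "$@"; do
    # The variable name uses the upper-cased archive base name.
    upper=$(printf '%s' "$lib" | tr '[:lower:]' '[:upper:]')
    echo "export ARROW_${upper}_URL=${dest}/${lib}.tar.gz"
  done
}

print_offline_env "$HOME/arrow-thirdparty-deps" boost gtest gflags zlib orc
```

Redirecting the output to a file and sourcing that file from a build script is one way to reuse it across machines.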
diff --git a/cpp/thirdparty/download_dependencies.sh b/cpp/thirdparty/download_dependencies.sh
new file mode 100755
index 0000000..2d8bee4
--- /dev/null
+++ b/cpp/thirdparty/download_dependencies.sh
@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This script downloads all the thirdparty dependencies as a series of tarballs
+# that can be used for offline builds, etc.
+
+set -e
+
+SOURCE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+if [ "$#" -ne 1 ]; then
+  echo "Usage: $0 <destination-directory>"
+  exit 1
+fi
+
+_DST=$1
+
+# To change toolchain versions, edit versions.txt
+source $SOURCE_DIR/versions.txt
+
+BOOST_UNDERSCORE_VERSION=`echo $BOOST_VERSION | sed 's/\./_/g'`
+wget -c -O $_DST/boost.tar.gz https://dl.bintray.com/boostorg/release/$BOOST_VERSION/source/boost_$BOOST_UNDERSCORE_VERSION.tar.gz
+
+wget -c -O $_DST/gtest.tar.gz https://github.com/google/googletest/archive/release-$GTEST_VERSION.tar.gz
+
+wget -c -O $_DST/gflags.tar.gz https://github.com/gflags/gflags/archive/v$GFLAGS_VERSION.tar.gz
+
+wget -c -O $_DST/gbenchmark.tar.gz https://github.com/google/benchmark/archive/v$GBENCHMARK_VERSION.tar.gz
+
+wget -c -O $_DST/flatbuffers.tar.gz https://github.com/google/flatbuffers/archive/v$FLATBUFFERS_VERSION.tar.gz
+
+wget -c -O $_DST/rapidjson.tar.gz https://github.com/miloyip/rapidjson/archive/v$RAPIDJSON_VERSION.tar.gz
+
+wget -c -O $_DST/snappy.tar.gz https://github.com/google/snappy/releases/download/$SNAPPY_VERSION/snappy-$SNAPPY_VERSION.tar.gz
+
+wget -c -O $_DST/brotli.tar.gz https://github.com/google/brotli/archive/$BROTLI_VERSION.tar.gz
+
+wget -c -O $_DST/lz4.tar.gz https://github.com/lz4/lz4/archive/v$LZ4_VERSION.tar.gz
+
+wget -c -O $_DST/zlib.tar.gz http://zlib.net/fossils/zlib-$ZLIB_VERSION.tar.gz
+
+wget -c -O $_DST/zstd.tar.gz https://github.com/facebook/zstd/archive/v$ZSTD_VERSION.tar.gz
+
+wget -c -O $_DST/protobuf.tar.gz https://github.com/google/protobuf/releases/download/v$PROTOBUF_VERSION/protobuf-$PROTOBUF_VERSION.tar.gz
+
+wget -c -O $_DST/grpc.tar.gz https://github.com/grpc/grpc/archive/v$GRPC_VERSION.tar.gz
+
+wget -c -O $_DST/orc.tar.gz https://github.com/apache/orc/archive/rel/release-$ORC_VERSION.tar.gz
+
+echo "
+# Environment variables for offline Arrow build
+export ARROW_BOOST_URL=$_DST/boost.tar.gz
+export ARROW_GTEST_URL=$_DST/gtest.tar.gz
+export ARROW_GFLAGS_URL=$_DST/gflags.tar.gz
+export ARROW_GBENCHMARK_URL=$_DST/gbenchmark.tar.gz
+export ARROW_FLATBUFFERS_URL=$_DST/flatbuffers.tar.gz
+export ARROW_RAPIDJSON_URL=$_DST/rapidjson.tar.gz
+export ARROW_SNAPPY_URL=$_DST/snappy.tar.gz
+export ARROW_BROTLI_URL=$_DST/brotli.tar.gz
+export ARROW_LZ4_URL=$_DST/lz4.tar.gz
+export ARROW_ZLIB_URL=$_DST/zlib.tar.gz
+export ARROW_ZSTD_URL=$_DST/zstd.tar.gz
+export ARROW_PROTOBUF_URL=$_DST/protobuf.tar.gz
+export ARROW_GRPC_URL=$_DST/grpc.tar.gz
+export ARROW_ORC_URL=$_DST/orc.tar.gz
+"
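Before starting an offline build it may be worth confirming that every `ARROW_<LIB>_URL` in the environment points at a file that exists locally. The sketch below is one way to do that; `missing_offline_archives` is a hypothetical helper, not part of the script above, and it also reports variables that still hold remote URLs, since those are not local files either.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of every ARROW_<LIB>_URL environment
# variable whose value is not an existing regular file.
missing_offline_archives() {
  # env prints NAME=value pairs; read splits each pair at the first '='.
  env | grep '^ARROW_[A-Z0-9]*_URL=' | while IFS='=' read -r name path; do
    [ -f "$path" ] || printf '%s\n' "$name"
  done
}

missing=$(missing_offline_archives)
if [ -n "$missing" ]; then
  echo "Not pointing at local files:"
  echo "$missing"
fi
```

Running this after sourcing the export statements printed by the download script should produce no output when all archives were fetched.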
diff --git a/cpp/thirdparty/versions.txt b/cpp/thirdparty/versions.txt
new file mode 100644
index 0000000..554c719
--- /dev/null
+++ b/cpp/thirdparty/versions.txt
@@ -0,0 +1,34 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Toolchain library versions
+
+BOOST_VERSION=1.67.0
+GTEST_VERSION=1.8.0
+GFLAGS_VERSION=2.2.0
+GBENCHMARK_VERSION=1.4.1
+FLATBUFFERS_VERSION=1.9.0
+RAPIDJSON_VERSION=1.1.0
+JEMALLOC_VERSION=17c897976c60b0e6e4f4a365c751027244dada7a
+SNAPPY_VERSION=1.1.3
+BROTLI_VERSION=v0.6.0
+LZ4_VERSION=1.7.5
+ZLIB_VERSION=1.2.8
+ZSTD_VERSION=1.2.0
+PROTOBUF_VERSION=2.6.0
+GRPC_VERSION=1.12.1
+ORC_VERSION=1.5.1
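
Because `versions.txt` is plain `NAME=value` lines with `#` comments, it can be read by both CMake and bash. The download script uses `source`; the sketch below shows an alternative that reads a single value without executing the file, then derives the underscore form used in the Boost archive name. `read_version` is a hypothetical helper, and the temporary file stands in for the real `versions.txt`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value assigned to NAME in a NAME=value file.
#   $1: path to the versions file
#   $2: variable name, e.g. BOOST_VERSION (assumed to be a plain identifier)
read_version() {
  local file="$1"
  local name="$2"
  # Drop comment lines, then print what follows "NAME=" on the matching line.
  sed -n -e '/^[[:space:]]*#/d' -e "s/^${name}=//p" "$file"
}

# Stand-in for cpp/thirdparty/versions.txt
versions_file=$(mktemp)
cat > "$versions_file" <<'EOF'
# Toolchain library versions

BOOST_VERSION=1.67.0
ORC_VERSION=1.5.1
EOF

boost_version=$(read_version "$versions_file" BOOST_VERSION)
# Same transformation the download script applies: dots become underscores.
boost_underscores=$(printf '%s' "$boost_version" | sed 's/\./_/g')
echo "boost_${boost_underscores}.tar.gz"   # prints boost_1_67_0.tar.gz
rm -f "$versions_file"
```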
