[arrow] branch master updated: ARROW-5579: [Java] shade flatbuffer dependency

2019-06-14 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 720be32  ARROW-5579: [Java] shade flatbuffer dependency
720be32 is described below

commit 720be32a0bb5e968b1d5f2753f03697074514a89
Author: tianchen 
AuthorDate: Fri Jun 14 23:18:17 2019 -0700

ARROW-5579: [Java] shade flatbuffer dependency

Related to [ARROW-5579](https://issues.apache.org/jira/browse/ARROW-5579).

After some discussion with the Flatbuffers maintainer, it appears that FB 
generated code is only guaranteed to be compatible with the runtime library 
version that exactly matches the flatc used to compile it.

This makes depending on flatbuffers in a library (like arrow) quite risky: 
if an app depends on any other version of FB, either directly or 
transitively, the versions are likely to clash at some point, leading to 
undefined behaviour at runtime.

Shading the dependency looks to me like the best way to avoid this.

Author: tianchen 

Closes #4540 from tianchen92/ARROW-5579 and squashes the following commits:

fbd0c7176  ARROW-5579:  shade flatbuffer dependency
---
 java/flight/pom.xml |  3 +++
 java/format/pom.xml | 20 ++++++++++++++++++++
 java/vector/pom.xml | 25 +++++++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/java/flight/pom.xml b/java/flight/pom.xml
index 7d01a6e..3745207 100644
--- a/java/flight/pom.xml
+++ b/java/flight/pom.xml
@@ -185,6 +185,9 @@
   com.google.protobuf:*
   com.google.guava:*
 
+
+  com.google.flatbuffers:*
+
   
   
 
diff --git a/java/format/pom.xml b/java/format/pom.xml
index 8997eb1..c6159b3 100644
--- a/java/format/pom.xml
+++ b/java/format/pom.xml
@@ -159,6 +159,26 @@
 true
   
 
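+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-shade-plugin</artifactId>
+        <version>3.2.1</version>
+        <executions>
+          <execution>
+            <phase>package</phase>
+            <goals>
+              <goal>shade</goal>
+            </goals>
+            <configuration>
+              <artifactSet>
+                <includes>
+                  <include>com.google.flatbuffers:*</include>
+                </includes>
+              </artifactSet>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>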
   
 
 
diff --git a/java/vector/pom.xml b/java/vector/pom.xml
index b882e3e..7f194e3 100644
--- a/java/vector/pom.xml
+++ b/java/vector/pom.xml
@@ -133,6 +133,31 @@
   
 
   
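+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-shade-plugin</artifactId>
+        <version>3.2.1</version>
+        <executions>
+          <execution>
+            <phase>package</phase>
+            <goals>
+              <goal>shade</goal>
+            </goals>
+            <configuration>
+              <artifactSet>
+                <includes>
+                  <include>com.google.flatbuffers:*</include>
+                  <include>io.netty:*</include>
+                  <include>com.fasterxml.jackson.core:*</include>
+                  <include>org.slf4j:slf4j-api</include>
+                  <include>org.apache.arrow:arrow-memory</include>
+                  <include>org.apache.arrow:arrow-format</include>
+                </includes>
+              </artifactSet>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>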
 
 
   



[arrow] branch master updated: ARROW-5615: [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R literal

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e06f20  ARROW-5615: [C++] gcc 5.4.0 doesn't want to parse inline 
C++11 string R literal
2e06f20 is described below

commit 2e06f2000f42d33a1c3b137e761622dd36f66192
Author: Wes McKinney 
AuthorDate: Fri Jun 14 20:34:36 2019 -0500

ARROW-5615: [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R 
literal

I ran into this while trying to tinker with ARROW-5474 (checking minimum 
Boost version).

This occurs for me on master with

```
docker-compose build cpp-ubuntu-xenial
docker-compose run cpp-ubuntu-xenial
```

Error looks like

```
/arrow/cpp/src/arrow/ipc/json-simple-test.cc:543:9: error: missing 
terminating " character [-Werror]
   ASSERT_OK(ArrayFromJSON(type, R"delim(
 ^
/arrow/cpp/src/arrow/ipc/json-simple-test.cc:543:2: error: missing 
terminating " character
   ASSERT_OK(ArrayFromJSON(type, R"delim(
  ^
/arrow/cpp/src/arrow/ipc/json-simple-test.cc:550:1: error: stray '\' in 
program
 )delim",
```

I'm perplexed about why this is a problem and why it has not been 
encountered by others.

Author: Wes McKinney 

Closes #4579 from wesm/ARROW-5615 and squashes the following commits:

a4f08fa81  Make parquet-schema-test robust to stripping 
whitespace
f7ed973bc  gcc 5.4.0 doesn't want to parse inline C++11 
string R literal
---
 cpp/CMakeLists.txt|  7 -
 cpp/src/arrow/ipc/json-simple-test.cc | 23 +++--
 cpp/src/parquet/schema-test.cc| 48 +--
 cpp/src/parquet/schema.cc | 20 +++
 4 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index 501c541..5d9daf8 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -270,7 +270,12 @@ if(NOT ARROW_BUILD_TESTS)
   set(NO_TESTS 1)
 else()
   add_custom_target(all-tests)
-  add_custom_target(unittest ctest -L unittest)
+  add_custom_target(unittest
+ctest
+-j4
+-L
+unittest
+--output-on-failure)
   add_dependencies(unittest all-tests)
 endif()
 
diff --git a/cpp/src/arrow/ipc/json-simple-test.cc 
b/cpp/src/arrow/ipc/json-simple-test.cc
index f1d487f..772557b 100644
--- a/cpp/src/arrow/ipc/json-simple-test.cc
+++ b/cpp/src/arrow/ipc/json-simple-test.cc
@@ -540,13 +540,15 @@ TEST(TestMap, IntegerToInteger) {
   auto type = map(int16(), int16());
   std::shared_ptr<Array> expected, actual;
 
-  ASSERT_OK(ArrayFromJSON(type, R"([
+  const char* input = R"(
+[
 [[0, 1], [1, 1], [2, 2], [3, 3], [4, 5], [5, 8]],
 null,
 [[0, null], [1, null], [2, 0], [3, 1], [4, null], [5, 2]],
 []
-  ])",
-  &actual));
+  ]
+)";
+  ASSERT_OK(ArrayFromJSON(type, input, &actual));
 
   std::unique_ptr<ArrayBuilder> builder;
   ASSERT_OK(MakeBuilder(default_memory_pool(), type, &builder));
@@ -569,12 +571,15 @@ TEST(TestMap, IntegerToInteger) {
 
 TEST(TestMap, StringToInteger) {
   auto type = map(utf8(), int32());
-  auto actual = ArrayFromJSON(type, R"([
+  const char* input = R"(
+[
 [["joe", 0], ["mark", null]],
 null,
 [["cap", 8]],
 []
-  ])");
+  ]
+)";
+  auto actual = ArrayFromJSON(type, input);
   std::vector offsets = {0, 2, 2, 3, 3};
   auto expected_keys = ArrayFromJSON(utf8(), R"(["joe", "mark", "cap"])");
   auto expected_values = ArrayFromJSON(int32(), "[0, null, 8]");
@@ -610,7 +615,8 @@ TEST(TestMap, IntegerMapToStringList) {
   auto type = map(map(int16(), int16()), list(utf8()));
   std::shared_ptr<Array> expected, actual;
 
-  ASSERT_OK(ArrayFromJSON(type, R"([
+  const char* input = R"(
+[
 [
   [
 [],
@@ -626,8 +632,9 @@ TEST(TestMap, IntegerMapToStringList) {
   ]
 ],
 null
-  ])",
-  &actual));
+  ]
+)";
+  ASSERT_OK(ArrayFromJSON(type, input, &actual));
 
   std::unique_ptr<ArrayBuilder> builder;
   ASSERT_OK(MakeBuilder(default_memory_pool(), type, &builder));
diff --git a/cpp/src/parquet/schema-test.cc b/cpp/src/parquet/schema-test.cc
index cdaa099..6a580d7 100644
--- a/cpp/src/parquet/schema-test.cc
+++ b/cpp/src/parquet/schema-test.cc
@@ -585,17 +585,16 @@ TEST(TestColumnDescriptor, TestAttrs) {
   ASSERT_EQ(Type::BYTE_ARRAY, descr.physical_type());
 
   ASSERT_EQ(-1, descr.type_length());
-  ASSERT_EQ(
-  R"(column descriptor = {
-  name: name
-  path: 
-  physical_type: BYTE_ARRAY
-  logical_type: UTF8
-  logical_annotation: String
-  max_definition_level: 4
-  max_repetition_level: 1
-})",
-  descr.ToString());
+  const char* expected_descr = R"(column descriptor = {
+  name: name,
+  path: ,
+  physical_type: BYTE_ARRAY,
+  logical_type: UTF8,
+  lo

[arrow] branch master updated: ARROW-5616: [C++][Python] Fix -Wwrite-strings warning when building against Python 2.7 headers

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 571afd6  ARROW-5616: [C++][Python] Fix -Wwrite-strings warning when 
building against Python 2.7 headers
571afd6 is described below

commit 571afd65e4bab6f7bf06106c0da668582d02836a
Author: Wes McKinney 
AuthorDate: Fri Jun 14 19:59:14 2019 -0500

ARROW-5616: [C++][Python] Fix -Wwrite-strings warning when building against 
Python 2.7 headers

In Python 3, `PyObject_CallMethod` takes `const char*` arguments, while in 
Python 2.7 they are `char*`, so this warning only occurs there

Author: Wes McKinney 

Closes #4581 from wesm/ARROW-5616 and squashes the following commits:

5b8035cf7  List comprehension leaves a dangling reference in 
Python 2
78733618f  Fix another API callsite
19a78d0a7  Python 2.7 builds have -Wwrite-strings
---
 cpp/src/arrow/python/common.h   | 11 +++
 cpp/src/arrow/python/extension_type.cc  |  7 ---
 cpp/src/arrow/python/io.cc  | 11 ---
 python/pyarrow/tests/test_extension_type.py |  2 +-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/cpp/src/arrow/python/common.h b/cpp/src/arrow/python/common.h
index a759d39..a10e3bb 100644
--- a/cpp/src/arrow/python/common.h
+++ b/cpp/src/arrow/python/common.h
@@ -259,6 +259,17 @@ class ARROW_PYTHON_EXPORT PyBuffer : public Buffer {
   Py_buffer py_buf_;
 };
 
+// This is annoying: because C++11 does not allow implicit conversion of string
+// literals to non-const char*, we need to go through some gymnastics to use
+// PyObject_CallMethod without a lot of pain (its arguments are non-const
+// char*)
+template <typename... ArgTypes>
+static inline PyObject* cpp_PyObject_CallMethod(PyObject* obj, const char* method_name,
+                                                const char* argspec, ArgTypes... args) {
+  return PyObject_CallMethod(obj, const_cast<char*>(method_name),
+                             const_cast<char*>(argspec), args...);
+}
+
 }  // namespace py
 }  // namespace arrow
 
diff --git a/cpp/src/arrow/python/extension_type.cc 
b/cpp/src/arrow/python/extension_type.cc
index b130030..b9bd8b0 100644
--- a/cpp/src/arrow/python/extension_type.cc
+++ b/cpp/src/arrow/python/extension_type.cc
@@ -33,7 +33,8 @@ namespace {
 
 // Serialize a Python ExtensionType instance
 Status SerializeExtInstance(PyObject* type_instance, std::string* out) {
-  OwnedRef res(PyObject_CallMethod(type_instance, "__arrow_ext_serialize__", 
nullptr));
+  OwnedRef res(
+  cpp_PyObject_CallMethod(type_instance, "__arrow_ext_serialize__", 
nullptr));
   if (!res) {
 return ConvertPyError();
   }
@@ -61,8 +62,8 @@ PyObject* DeserializeExtInstance(PyObject* type_class,
 return nullptr;
   }
 
-  return PyObject_CallMethod(type_class, "__arrow_ext_deserialize__", "OO",
- storage_ref.obj(), data_ref.obj());
+  return cpp_PyObject_CallMethod(type_class, "__arrow_ext_deserialize__", "OO",
+ storage_ref.obj(), data_ref.obj());
 }
 
 }  // namespace
diff --git a/cpp/src/arrow/python/io.cc b/cpp/src/arrow/python/io.cc
index fd16f67..8a4823b 100644
--- a/cpp/src/arrow/python/io.cc
+++ b/cpp/src/arrow/python/io.cc
@@ -36,17 +36,6 @@ namespace py {
 // --
 // Python file
 
-// This is annoying: because C++11 does not allow implicit conversion of string
-// literals to non-const char*, we need to go through some gymnastics to use
-// PyObject_CallMethod without a lot of pain (its arguments are non-const
-// char*)
-template <typename... ArgTypes>
-static inline PyObject* cpp_PyObject_CallMethod(PyObject* obj, const char* method_name,
-                                                const char* argspec, ArgTypes... args) {
-  return PyObject_CallMethod(obj, const_cast<char*>(method_name),
-                             const_cast<char*>(argspec), args...);
-}
-
 // A common interface to a Python file-like object. Must acquire GIL before
 // calling any methods
 class PythonFile {
diff --git a/python/pyarrow/tests/test_extension_type.py 
b/python/pyarrow/tests/test_extension_type.py
index d688d3c..fb949ca 100644
--- a/python/pyarrow/tests/test_extension_type.py
+++ b/python/pyarrow/tests/test_extension_type.py
@@ -114,7 +114,7 @@ def test_ext_array_lifetime():
     storage = pa.array([b"foo", b"bar"], type=pa.binary(3))
     arr = pa.ExtensionArray.from_storage(ty, storage)
 
-    refs = [weakref.ref(obj) for obj in (ty, arr, storage)]
+    refs = [weakref.ref(ty), weakref.ref(arr), weakref.ref(storage)]
     del ty, storage, arr
     for ref in refs:
         assert ref() is None
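
Editor's illustration (not part of the commit): the test change above avoids a loop variable that stays bound to the last object. In Python 2 a list comprehension leaks its loop variable into the enclosing scope, so `obj` kept `storage` alive after `del`. The sketch below reproduces the same effect with a plain `for` loop, which leaves its variable bound in Python 3 as well. The class `Payload` and both function names are hypothetical, and the result relies on CPython's reference counting.

```python
import gc
import weakref


class Payload:
    """Plain object that supports weak references (hypothetical helper)."""


def loop_variable_keeps_object_alive():
    item = Payload()
    refs = []
    for obj in (item,):
        refs.append(weakref.ref(obj))
    del item
    gc.collect()
    # `obj` is still bound to the Payload, so the weak reference resolves.
    return refs[0]() is not None


def explicit_refs_release_object():
    item = Payload()
    # No loop variable is involved, mirroring the fixed test.
    refs = [weakref.ref(item)]
    del item
    gc.collect()
    # Nothing else references the Payload, so it has been collected.
    return refs[0]() is None
```

Both functions are expected to return `True` on CPython, which is why the fixed test builds the list of weak references explicitly.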



[arrow] branch master updated: ARROW-5341: [C++][Documentation] developers/cpp.rst should mention documentation warnings

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c5271d  ARROW-5341: [C++][Documentation] developers/cpp.rst should 
mention documentation warnings
8c5271d is described below

commit 8c5271d96649bf68e17d678ceb1057834a9095b3
Author: Benjamin Kietzman 
AuthorDate: Fri Jun 14 16:48:10 2019 -0500

ARROW-5341: [C++][Documentation] developers/cpp.rst should mention 
documentation warnings

Add a section detailing that documentation warnings will break the build at 
level `CHECKIN` with the clang compiler. The relevant clang documentation is 
linked so that readers can look up what might provoke a doc warning

Author: Benjamin Kietzman 
Author: Wes McKinney 

Closes #4578 from bkietz/5341-Add-instructions-about-fixing-and-testin and 
squashes the following commits:

f0c3ed34d  Consolidate developer docs related to Doxygen 
comments in a new subsection
7e155a27a  add section describing documentation warnings
---
 docs/README.md |  4 ++--
 docs/source/developers/cpp.rst | 45 +++---
 2 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index aa0a231..2130426 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -23,8 +23,8 @@ This directory contains source files for building the main 
project
 documentation. This includes the [Arrow columnar format specification][2].
 
 Instructions for building the documentation site are found in
-[docs/source/building.rst][1]. The build depends on the API
+[docs/source/developers/documentation.rst][1]. The build depends on the API
 documentation for some of the project subcomponents.
 
 [1]: 
https://github.com/apache/arrow/blob/master/docs/source/developers/documentation.rst
-[2]: https://github.com/apache/arrow/tree/master/docs/source/format
\ No newline at end of file
+[2]: https://github.com/apache/arrow/tree/master/docs/source/format
diff --git a/docs/source/developers/cpp.rst b/docs/source/developers/cpp.rst
index fbc483c..568e5c8 100644
--- a/docs/source/developers/cpp.rst
+++ b/docs/source/developers/cpp.rst
@@ -355,8 +355,6 @@ This project follows `Google's C++ Style Guide
 `_ with minor exceptions:
 
 * We relax the line length restriction to 90 characters.
-* We use doxygen style comments ("///") in header files for comments that we
-  wish to show up in API documentation
 * We use the ``NULLPTR`` macro in header files (instead of ``nullptr``) defined
   in ``src/arrow/util/macros.h`` to support building C++/CLI (ARROW-1134)
 
@@ -368,7 +366,10 @@ codebase is subjected to a number of code style and code 
cleanliness checks.
 In order to have a passing CI build, your modified git branch must pass the
 following checks:
 
-* C++ builds without compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN``
+* C++ builds with the project's active version of ``clang`` without
+  compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN``. Note that
+  there are classes of warnings (such as `-Wdocumentation`, see more
+  on this below) that are not caught by `gcc`.
 * C++ unit test suite with valgrind enabled, use ``-DARROW_TEST_MEMCHECK=ON``
   when invoking CMake
 * Passes cpplint checks, checked with ``make lint``
@@ -400,6 +401,31 @@ target that is executable from the root of the repository:
 See :ref:`integration` for more information about the project's
 ``docker-compose`` configuration.
 
+API Documentation
+~~~~~~~~~~~~~~~~~
+
+We use Doxygen style comments (``///``) in header files for comments
+that we wish to show up in API documentation for classes and
+functions.
+
+When using ``clang`` and building with
+``-DBUILD_WARNING_LEVEL=CHECKIN``, the ``-Wdocumentation`` flag is
+used which checks for some common documnetation inconsistencies, like
+documenting some, but not all function parameters with ``\param``. See
+the `LLVM documentation warnings section
+`_
+for more about this.
+
+While we publish the API documentation as part of the main Sphinx-based
+documentation site, you can also build the C++ API documentation anytime using
+Doxygen. Run the following command from the ``cpp/apidoc`` directory:
+
+.. code-block:: shell
+
+   doxygen Doxyfile
+
+This requires `Doxygen `_ to be installed.
+
 Modular Build Targets
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -432,19 +458,6 @@ Parquet libraries, its tests, and its dependencies, you 
can run:
 If you omit an explicit target when invoking ``make``, all targets will be
 built.
 
-Building API Documentation
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-While we publish the API documentation as part of the main Sphinx-based
-documentation site, you can also buil

[arrow] branch master updated: ARROW-5576: [C++] Query ASF mirror system for URL and use when downloading Thrift

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 38b019d  ARROW-5576: [C++] Query ASF mirror system for URL and use 
when downloading Thrift
38b019d is described below

commit 38b019df4e7a7da45ee0002c3161d87426561eac
Author: Wes McKinney 
AuthorDate: Fri Jun 14 13:27:58 2019 -0500

ARROW-5576: [C++] Query ASF mirror system for URL and use when downloading 
Thrift

This also allows CHECKSUM values to be put in cpp/thirdparty/versions.txt 
for security purposes. Apache Thrift is still using MD5 for some reason, so we 
will need to fix that once they get their next release out (hopefully with 
SHA256 checksums)

Author: Wes McKinney 

Closes #4558 from wesm/ARROW-5576 and squashes the following commits:

4700af407  Disable log suppression in thrift_ep
bfbfbec8d  Query ASF mirror system for URL and use when 
downloading Thrift
---
 cpp/build-support/get_apache_mirror.py  | 31 +++
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 39 -
 cpp/thirdparty/versions.txt |  1 +
 3 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/cpp/build-support/get_apache_mirror.py 
b/cpp/build-support/get_apache_mirror.py
new file mode 100644
index 000..07186e0
--- /dev/null
+++ b/cpp/build-support/get_apache_mirror.py
@@ -0,0 +1,31 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This script queries the ASF mirror system to obtain a suggested
+# mirror for downloading dependencies, e.g. in CMake
+
+import json
+try:
+    from urllib2 import urlopen
+except ImportError:
+    # py3
+    from urllib.request import urlopen
+
+suggested_mirror = urlopen('https://www.apache.org/dyn/'
+                           'closer.cgi?as_json=1').read()
+print(json.loads(suggested_mirror)['preferred'])
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 89f5200..90c6d5e 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -34,6 +34,21 @@ else()
 endif()
 
 # --
+# We should not use the Apache dist server for build dependencies
+
+set(APACHE_MIRROR "")
+
+macro(get_apache_mirror)
+  if(APACHE_MIRROR STREQUAL "")
+exec_program(${PYTHON_EXECUTABLE}
+ ARGS
+ ${CMAKE_SOURCE_DIR}/build-support/get_apache_mirror.py
+ OUTPUT_VARIABLE
+ APACHE_MIRROR)
+  endif()
+endmacro()
+
+# --
 # Resolve the dependencies
 
 # TODO: add uriparser here when it gets a conda package
@@ -200,7 +215,9 @@ endif()
 file(STRINGS "${THIRDPARTY_DIR}/versions.txt" TOOLCHAIN_VERSIONS_TXT)
 foreach(_VERSION_ENTRY ${TOOLCHAIN_VERSIONS_TXT})
   # Exclude comments
-  if(NOT _VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_VERSION=")
+  if(NOT
+ ((_VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_VERSION=")
+  OR (_VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_CHECKSUM=")))
 continue()
   endif()
 
@@ -344,10 +361,7 @@ endif()
 if(DEFINED ENV{ARROW_THRIFT_URL})
   set(THRIFT_SOURCE_URL "$ENV{ARROW_THRIFT_URL}")
 else()
-  set(
-    THRIFT_SOURCE_URL
-    "https://archive.apache.org/dist/thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz"
-    )
+  set(THRIFT_SOURCE_URL "FROM-APACHE-MIRROR")
 endif()
 
 if(DEFINED ENV{ARROW_URIPARSER_URL})
@@ -996,11 +1010,24 @@ macro(build_thrift)
   ${THRIFT_CMAKE_ARGS})
   endif()
 
+  if("${THRIFT_SOURCE_URL}" STREQUAL "FROM-APACHE-MIRROR")
+    get_apache_mirror()
+    set(THRIFT_SOURCE_URL
+        "${APACHE_MIRROR}/thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz")
+  endif()
+
+  message("Downloading Apache Thrift from ${THRIFT_SOURCE_URL}")
+
   externalproject_add(thrift_ep
   URL ${THRIFT_SOURCE_URL}
+  URL_HASH "MD5=${THRIFT_MD5_CHECKSUM}"
   BUILD_BY
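
Editor's illustration (not part of the commit): the flow in this change is to read the `preferred` entry from the mirror system's JSON reply, build the Thrift tarball URL from it, and check the download against a checksum. A minimal offline sketch of those two steps follows. The function names, the sample mirror host, and the version string used in the usage note are assumptions for illustration only; stripping a trailing slash from the mirror URL is also an addition that the CMake code does not perform.

```python
import hashlib
import json


def thrift_url_from_mirror_response(response_text, thrift_version):
    """Build the tarball URL from the JSON text returned by the mirror query."""
    mirror = json.loads(response_text)["preferred"]
    # Assumption: drop a trailing slash so the joined URL has no "//".
    mirror = mirror.rstrip("/")
    return "{0}/thrift/{1}/thrift-{1}.tar.gz".format(mirror, thrift_version)


def md5_matches(payload, expected_hex):
    """Return True when the MD5 digest of `payload` (bytes) equals `expected_hex`."""
    return hashlib.md5(payload).hexdigest() == expected_hex.lower()
```

For example, a reply of `{"preferred": "https://mirror.example.org/apache/"}` with a hypothetical version `0.12.0` yields a URL ending in `/thrift/0.12.0/thrift-0.12.0.tar.gz`. MD5 is used here only because the commit message notes that Thrift still publishes MD5 checksums.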

[arrow] branch master updated: ARROW-3686: [Python] support masked arrays in pa.array

2019-06-14 Thread fsaintjacques
This is an automated email from the ASF dual-hosted git repository.

fsaintjacques pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 663b27b  ARROW-3686: [Python] support masked arrays in pa.array
663b27b is described below

commit 663b27ba694565bc4a3569a6b51679afe7cb4846
Author: Joris Van den Bossche 
AuthorDate: Fri Jun 14 13:57:17 2019 -0400

ARROW-3686: [Python] support masked arrays in pa.array

https://issues.apache.org/jira/browse/ARROW-3686

Author: Joris Van den Bossche 

Closes #4534 from jorisvandenbossche/ARROW-3686-masked-array and squashes 
the following commits:

424885f29  pin type + use isinstance
f431c5e87  Merge remote-tracking branch 
'upstream/master' into ARROW-3686-masked-array
e3e22b536  support masked arrays in pa.array
---
 python/pyarrow/array.pxi   |  8 
 python/pyarrow/tests/test_array.py | 11 +++
 2 files changed, 19 insertions(+)

diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 607d7ae..97ffb66 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -170,6 +170,14 @@ def array(object obj, type=None, mask=None, size=None, 
from_pandas=None,
         if is_pandas_object and from_pandas is None:
             c_from_pandas = True
 
+        if isinstance(values, np.ma.MaskedArray):
+            if mask is not None:
+                raise ValueError("Cannot pass a numpy masked array and "
+                                 "specify a mask at the same time")
+            else:
+                mask = values.mask
+                values = values.data
+
+
 if pandas_api.is_categorical(values):
 return DictionaryArray.from_arrays(
 values.codes, values.categories.values,
diff --git a/python/pyarrow/tests/test_array.py 
b/python/pyarrow/tests/test_array.py
index f4fc23c..531b835 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -1116,6 +1116,17 @@ def test_array_from_numpy_unicode():
 assert arrow_arr.equals(expected)
 
 
+def test_array_from_masked():
+    ma = np.ma.array([1, 2, 3, 4], dtype='int64',
+                     mask=[False, False, True, False])
+    result = pa.array(ma)
+    expected = pa.array([1, 2, None, 4], type='int64')
+    assert expected.equals(result)
+
+    with pytest.raises(ValueError, match="Cannot pass a numpy masked array"):
+        pa.array(ma, mask=np.array([True, False, False, False]))
+
+
 def test_buffers_primitive():
 a = pa.array([1, 2, None, 4], type=pa.int16())
 buffers = a.buffers()
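
Editor's illustration (not part of the commit): the new branch in `pa.array` splits a masked input into its data and its mask, and rejects a call that supplies a second mask. The sketch below mirrors that logic with the standard library only. `MaskedValues` is a hypothetical stand-in for `numpy.ma.MaskedArray` exposing the same `.data` and `.mask` attributes, and `split_masked` and `apply_mask` are illustrative names, not pyarrow functions.

```python
class MaskedValues:
    """Hypothetical stand-in for numpy.ma.MaskedArray (.data and .mask only)."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)


def split_masked(values, mask=None):
    """Return (values, mask), unpacking a masked input as the commit does."""
    if isinstance(values, MaskedValues):
        if mask is not None:
            raise ValueError("Cannot pass a numpy masked array and "
                             "specify a mask at the same time")
        mask = values.mask
        values = values.data
    return values, mask


def apply_mask(values, mask):
    """Replace masked entries (mask is True) with None, like Arrow nulls."""
    if mask is None:
        return list(values)
    return [None if hidden else value for value, hidden in zip(values, mask)]
```

With data `[1, 2, 3, 4]` and mask `[False, False, True, False]`, `apply_mask(*split_masked(...))` gives `[1, 2, None, 4]`, matching the expectation in `test_array_from_masked`.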



[arrow] branch master updated: ARROW-5612: [Python][Doc] Add prominent note that date_as_object option changed with Arrow 0.13

2019-06-14 Thread fsaintjacques
This is an automated email from the ASF dual-hosted git repository.

fsaintjacques pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 02fd62d  ARROW-5612: [Python][Doc] Add prominent note that 
date_as_object option changed with Arrow 0.13
02fd62d is described below

commit 02fd62d8f46412d8b0399f49adf8e2053946152e
Author: Miguel Cabrera 
AuthorDate: Fri Jun 14 12:51:59 2019 -0400

ARROW-5612: [Python][Doc] Add prominent note that date_as_object option 
changed with Arrow 0.13

Adds a few small notes to the pandas integration documentation. 
It relates to #4363
Not sure if the wording is correct.

Author: Miguel Cabrera 
Author: Miguel Cabrera 
Author: Wes McKinney 

Closes #4381 from mfcabrera/improve-pandas-doc and squashes the following 
commits:

b6ed4ed62  Add notes about date_as_object default value change
f65178f9e  Small fix s/was/is/
16b553084  Document to_pandas behaviour before 0.13 and add 
extra pd related info
---
 docs/source/python/pandas.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/python/pandas.rst b/docs/source/python/pandas.rst
index dbc5e77..aafbf57 100644
--- a/docs/source/python/pandas.rst
+++ b/docs/source/python/pandas.rst
@@ -184,6 +184,12 @@ If you want to use NumPy's ``datetime64`` dtype instead, 
pass
s2 = pd.Series(arr.to_pandas(date_as_object=False))
s2.dtype
 
+.. warning::
+
+   As of Arrow ``0.13`` the parameter ``date_as_object`` is ``True``
+   by default. Older versions must pass ``date_as_object=True`` to
+   obtain this behavior
+
 Time types
 ~~~~~~~~~~
 



[arrow] branch master updated: ARROW-5604: [Go] improve coverage of TypeTraits

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a5fa7bb  ARROW-5604: [Go] improve coverage of TypeTraits
a5fa7bb is described below

commit a5fa7bb4a5785db627cd5f4d8996fd8afc3f2e42
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 18:31:21 2019 +0200

ARROW-5604: [Go] improve coverage of TypeTraits

Author: Sebastien Binet 

Closes #4571 from sbinet/issue-5604 and squashes the following commits:

2252eec70  ARROW-5604:  improve coverage of TypeTraits
---
 go/arrow/Makefile |   2 +-
 go/arrow/doc.go   |   2 +-
 go/arrow/type_traits_interval.go  |   4 +-
 go/arrow/type_traits_numeric.gen.go   |   7 +-
 go/arrow/type_traits_numeric.gen.go.tmpl  |  11 +-
 go/arrow/type_traits_numeric.gen_test.go  | 570 ++
 go/arrow/type_traits_numeric.gen_test.go.tmpl |  61 +++
 go/arrow/type_traits_test.go  | 156 +++
 8 files changed, 803 insertions(+), 10 deletions(-)

diff --git a/go/arrow/Makefile b/go/arrow/Makefile
index bd77836..9c4a232 100644
--- a/go/arrow/Makefile
+++ b/go/arrow/Makefile
@@ -30,7 +30,7 @@ assembly:
@$(MAKE) -C math assembly
 
 generate: bin/tmpl
-   bin/tmpl -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl 
array/numeric.gen.go.tmpl array/numericbuilder.gen_test.go.tmpl  
array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl
+   bin/tmpl -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl 
type_traits_numeric.gen_test.go.tmpl array/numeric.gen.go.tmpl 
array/numericbuilder.gen_test.go.tmpl  array/numericbuilder.gen.go.tmpl 
array/bufferbuilder_numeric.gen.go.tmpl
bin/tmpl -i -data=datatype_numeric.gen.go.tmpldata 
datatype_numeric.gen.go.tmpl
@$(MAKE) -C math generate
 
diff --git a/go/arrow/doc.go b/go/arrow/doc.go
index a91e62c..10ddda9 100644
--- a/go/arrow/doc.go
+++ b/go/arrow/doc.go
@@ -31,7 +31,7 @@ array is valid (not null). If the array has no null entries, 
it is possible to o
 */
 package arrow
 
-//go:generate go run _tools/tmpl/main.go -i -data=numeric.tmpldata 
type_traits_numeric.gen.go.tmpl array/numeric.gen.go.tmpl 
array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl
+//go:generate go run _tools/tmpl/main.go -i -data=numeric.tmpldata 
type_traits_numeric.gen.go.tmpl type_traits_numeric.gen_test.go.tmpl 
array/numeric.gen.go.tmpl array/numericbuilder.gen.go.tmpl 
array/bufferbuilder_numeric.gen.go.tmpl
 //go:generate go run _tools/tmpl/main.go -i 
-data=datatype_numeric.gen.go.tmpldata datatype_numeric.gen.go.tmpl 
tensor/numeric.gen.go.tmpl tensor/numeric.gen_test.go.tmpl
 //go:generate go run ./gen-flatbuffers.go
 
diff --git a/go/arrow/type_traits_interval.go b/go/arrow/type_traits_interval.go
index 8ddaa51..fcff1e6 100644
--- a/go/arrow/type_traits_interval.go
+++ b/go/arrow/type_traits_interval.go
@@ -89,8 +89,8 @@ func (daytimeTraits) BytesRequired(n int) int { return 
DayTimeIntervalSizeBytes
 
 // PutValue
 func (daytimeTraits) PutValue(b []byte, v DayTimeInterval) {
-   binary.LittleEndian.PutUint32(b, uint32(v.Days))
-   binary.LittleEndian.PutUint32(b, uint32(v.Milliseconds))
+   binary.LittleEndian.PutUint32(b[0:4], uint32(v.Days))
+   binary.LittleEndian.PutUint32(b[4:8], uint32(v.Milliseconds))
 }
 
 // CastFromBytes reinterprets the slice b to a slice of type DayTimeInterval.
diff --git a/go/arrow/type_traits_numeric.gen.go 
b/go/arrow/type_traits_numeric.gen.go
index c8c063a..f98f494 100644
--- a/go/arrow/type_traits_numeric.gen.go
+++ b/go/arrow/type_traits_numeric.gen.go
@@ -16,10 +16,11 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-package arrow
+package arrow // import "github.com/apache/arrow/go/arrow"
 
 import (
"encoding/binary"
+   "math"
"reflect"
"unsafe"
 )
@@ -153,7 +154,7 @@ func (float64Traits) BytesRequired(n int) int { return 
Float64SizeBytes * n }
 
 // PutValue
 func (float64Traits) PutValue(b []byte, v float64) {
-   binary.LittleEndian.PutUint64(b, uint64(v))
+   binary.LittleEndian.PutUint64(b, math.Float64bits(v))
 }
 
 // CastFromBytes reinterprets the slice b to a slice of type float64.
@@ -297,7 +298,7 @@ func (float32Traits) BytesRequired(n int) int { return 
Float32SizeBytes * n }
 
 // PutValue
 func (float32Traits) PutValue(b []byte, v float32) {
-   binary.LittleEndian.PutUint32(b, uint32(v))
+   binary.LittleEndian.PutUint32(b, math.Float32bits(v))
 }
 
 // CastFromBytes reinterprets the slice b to a slice of type float32.
diff --git a/go/arrow/type_traits_numeric.gen.go.tmpl 
b/go/arrow/type_traits_numeric.gen.go.tmpl
index 362d2d8..c4a25ee 100644
--- a/go/arrow/type_traits_nu
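
Editor's illustration (not part of the commit): two `PutValue` fixes appear in the diff above. The float traits previously converted the value numerically (`uint64(v)`) instead of copying its IEEE 754 bit pattern (`math.Float64bits(v)`), and the day-time interval wrote both fields at offset 0. The sketch below shows the same distinctions using Python's `struct` module as an analogue of the Go code. All function names are hypothetical, and the numeric-conversion variant is only meaningful for non-negative inputs.

```python
import struct


def put_float64_numeric(value):
    # Analogue of uint64(v): the fractional part is discarded and the
    # integer is stored, so the bytes no longer decode to the float.
    return struct.pack("<Q", int(value))


def put_float64_bits(value):
    # Analogue of math.Float64bits(v): the IEEE 754 bit pattern is stored
    # in little-endian order and round-trips exactly.
    return struct.pack("<d", value)


def put_day_time_interval(days, milliseconds):
    # Two little-endian 32-bit fields at offsets 0 and 4, matching the
    # fixed b[0:4] / b[4:8] slices.
    buf = bytearray(8)
    struct.pack_into("<i", buf, 0, days)
    struct.pack_into("<i", buf, 4, milliseconds)
    return bytes(buf)
```

Decoding `put_float64_bits(1.5)` with `struct.unpack("<d", ...)` returns `1.5`, whereas `put_float64_numeric(1.5)` stores the integer 1 and does not.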

[arrow] branch master updated: ARROW-5591: [Go] implement read/write IPC for Duration & Intervals

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new dee0c1f  ARROW-5591: [Go] implement read/write IPC for Duration & 
Intervals
dee0c1f is described below

commit dee0c1f0d404192d3ba222fc4be7aee88ad3c16b
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 18:25:07 2019 +0200

ARROW-5591: [Go] implement read/write IPC for Duration & Intervals

Author: Sebastien Binet 

Closes #4564 from sbinet/issue-5591 and squashes the following commits:

c2e638b28  go/arrow/ipc: implement read/write IPC for 
Duration
375844faf  ARROW-5591:  implement read/write IPC for 
Duration & Intervals
---
 go/arrow/internal/arrdata/arrdata.go | 155 +++
 go/arrow/ipc/file_reader.go  |   4 +-
 go/arrow/ipc/metadata.go |  53 
 3 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/go/arrow/internal/arrdata/arrdata.go 
b/go/arrow/internal/arrdata/arrdata.go
index e76d68a..aeb7ee5 100644
--- a/go/arrow/internal/arrdata/arrdata.go
+++ b/go/arrow/internal/arrdata/arrdata.go
@@ -40,6 +40,8 @@ func init() {
Records["fixed_size_lists"] = makeFixedSizeListsRecords()
Records["fixed_width_types"] = makeFixedWidthTypesRecords()
Records["fixed_size_binaries"] = makeFixedSizeBinariesRecords()
+   Records["intervals"] = makeIntervalsRecords()
+   Records["durations"] = makeDurationsRecords()
 
for k := range Records {
RecordNames = append(RecordNames, k)
@@ -474,6 +476,105 @@ func makeFixedSizeBinariesRecords() []array.Record {
return recs
 }
 
+func makeIntervalsRecords() []array.Record {
+   mem := memory.NewGoAllocator()
+
+   schema := arrow.NewSchema(
+   []arrow.Field{
+   arrow.Field{Name: "months", Type: arrow.FixedWidthTypes.MonthInterval, Nullable: true},
+   arrow.Field{Name: "days", Type: arrow.FixedWidthTypes.DayTimeInterval, Nullable: true},
+   }, nil,
+   )
+
+   mask := []bool{true, false, false, true, true}
+   chunks := [][]array.Interface{
+   []array.Interface{
+   arrayOf(mem, []arrow.MonthInterval{1, 2, 3, 4, 5}, mask),
+   arrayOf(mem, []arrow.DayTimeInterval{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}}, mask),
+   },
+   []array.Interface{
+   arrayOf(mem, []arrow.MonthInterval{11, 12, 13, 14, 15}, mask),
+   arrayOf(mem, []arrow.DayTimeInterval{{11, 11}, {12, 12}, {13, 13}, {14, 14}, {15, 15}}, mask),
+   },
+   []array.Interface{
+   arrayOf(mem, []arrow.MonthInterval{21, 22, 23, 24, 25}, mask),
+   arrayOf(mem, []arrow.DayTimeInterval{{21, 21}, {22, 22}, {23, 23}, {24, 24}, {25, 25}}, mask),
+   },
+   }
+
+   defer func() {
+   for _, chunk := range chunks {
+   for _, col := range chunk {
+   col.Release()
+   }
+   }
+   }()
+
+   recs := make([]array.Record, len(chunks))
+   for i, chunk := range chunks {
+   recs[i] = array.NewRecord(schema, chunk, -1)
+   }
+
+   return recs
+}
+
+type (
+   duration_s  arrow.Duration
+   duration_ms arrow.Duration
+   duration_us arrow.Duration
+   duration_ns arrow.Duration
+)
+
+func makeDurationsRecords() []array.Record {
+   mem := memory.NewGoAllocator()
+
+   schema := arrow.NewSchema(
+   []arrow.Field{
+   arrow.Field{Name: "durations-s", Type: &arrow.DurationType{Unit: arrow.Second}, Nullable: true},
+   arrow.Field{Name: "durations-ms", Type: &arrow.DurationType{Unit: arrow.Millisecond}, Nullable: true},
+   arrow.Field{Name: "durations-us", Type: &arrow.DurationType{Unit: arrow.Microsecond}, Nullable: true},
+   arrow.Field{Name: "durations-ns", Type: &arrow.DurationType{Unit: arrow.Nanosecond}, Nullable: true},
+   }, nil,
+   )
+
+   mask := []bool{true, false, false, true, true}
+   chunks := [][]array.Interface{
+   []array.Interface{
+   arrayOf(mem, []duration_s{1, 2, 3, 4, 5}, mask),
+   arrayOf(mem, []duration_ms{1, 2, 3, 4, 5}, mask),
+   arrayOf(mem, []duration_us{1, 2, 3, 4, 5}, mask),
+   arrayOf(mem, []duration_ns{1, 2, 3, 4, 5}, mask),
+   },
+   []array.Interface{
+   arrayOf(mem, []duration_s{11, 12, 13, 14, 15}, mask),
+   arrayOf(mem, []duration_ms{11, 12, 13, 14, 15}, mask),
+ 

[arrow] branch master updated: ARROW-5592: [Go] implement Duration array

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 462cbe8  ARROW-5592: [Go] implement Duration array
462cbe8 is described below

commit 462cbe8715df8b1ad3a3d2c2418de9433d4b00d3
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 18:21:59 2019 +0200

ARROW-5592: [Go] implement Duration array

Author: Sebastien Binet 

Closes #4563 from sbinet/issue-5592 and squashes the following commits:

de7bbd32e  ARROW-5592:  implement Duration array
---
 go/arrow/array/array.go|   2 +-
 go/arrow/array/array_test.go   |   2 +-
 go/arrow/array/compare.go  |   6 +
 go/arrow/array/numeric.gen.go  |  57 ++
 go/arrow/array/numericbuilder.gen.go   | 138 +
 go/arrow/array/numericbuilder.gen_test.go  | 815 +
 go/arrow/array/numericbuilder.gen_test.go.tmpl |  43 ++
 go/arrow/datatype_fixedwidth.go|  77 ++-
 go/arrow/numeric.tmpldata  |  12 +
 go/arrow/type_traits_numeric.gen.go|  49 ++
 10 files changed, 1171 insertions(+), 30 deletions(-)

diff --git a/go/arrow/array/array.go b/go/arrow/array/array.go
index c13dd07..1912f3e 100644
--- a/go/arrow/array/array.go
+++ b/go/arrow/array/array.go
@@ -194,7 +194,7 @@ func init() {
arrow.MAP:   unsupportedArrayType,
arrow.EXTENSION: unsupportedArrayType,
arrow.FIXED_SIZE_LIST:   func(data *Data) Interface { return NewFixedSizeListData(data) },
-   arrow.DURATION:  unsupportedArrayType,
+   arrow.DURATION:  func(data *Data) Interface { return NewDurationData(data) },
 
// invalid data types to fill out array size 2⁵-1
31: invalidDataType,
diff --git a/go/arrow/array/array_test.go b/go/arrow/array/array_test.go
index 884bb8d..724f3b4 100644
--- a/go/arrow/array/array_test.go
+++ b/go/arrow/array/array_test.go
@@ -80,13 +80,13 @@ func TestMakeFromData(t *testing.T) {
array.NewData(&testDataType{arrow.INT64}, 0, make([]*memory.Buffer, 4), nil, 0, 0),
array.NewData(&testDataType{arrow.INT64}, 0, make([]*memory.Buffer, 4), nil, 0, 0),
}},
+   {name: "duration", d: &testDataType{arrow.DURATION}},
 
// unsupported types
{name: "union", d: &testDataType{arrow.UNION}, expPanic: true, expError: "unsupported data type: UNION"},
{name: "dictionary", d: &testDataType{arrow.DICTIONARY}, expPanic: true, expError: "unsupported data type: DICTIONARY"},
{name: "map", d: &testDataType{arrow.Type(27)}, expPanic: true, expError: "unsupported data type: MAP"},
{name: "extension", d: &testDataType{arrow.Type(28)}, expPanic: true, expError: "unsupported data type: EXTENSION"},
-   {name: "duration", d: &testDataType{arrow.Type(30)}, expPanic: true, expError: "unsupported data type: DURATION"},
 
// invalid types
{name: "invalid(-1)", d: &testDataType{arrow.Type(-1)}, expPanic: true, expError: "invalid data type: Type(-1)"},
diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go
index 0ea0b61..c6665c9 100644
--- a/go/arrow/array/compare.go
+++ b/go/arrow/array/compare.go
@@ -158,6 +158,9 @@ func ArrayEqual(left, right Interface) bool {
case *DayTimeInterval:
r := right.(*DayTimeInterval)
return arrayEqualDayTimeInterval(l, r)
+   case *Duration:
+   r := right.(*Duration)
+   return arrayEqualDuration(l, r)
 
default:
panic(errors.Errorf("arrow/array: unknown array type %T", l))
@@ -341,6 +344,9 @@ func arrayApproxEqual(left, right Interface, opt equalOption) bool {
case *DayTimeInterval:
r := right.(*DayTimeInterval)
return arrayEqualDayTimeInterval(l, r)
+   case *Duration:
+   r := right.(*Duration)
+   return arrayEqualDuration(l, r)
 
default:
panic(errors.Errorf("arrow/array: unknown array type %T", l))
diff --git a/go/arrow/array/numeric.gen.go b/go/arrow/array/numeric.gen.go
index d72d7d0..21c4e4b 100644
--- a/go/arrow/array/numeric.gen.go
+++ b/go/arrow/array/numeric.gen.go
@@ -879,3 +879,60 @@ func arrayEqualDate64(left, right *Date64) bool {
}
return true
 }
+
+// A type which represents an immutable sequence of arrow.Duration values.
+type Duration struct {
+   array
+   values []arrow.Duration
+}
+
+func NewDurationData(data *Data) *Duration {
+   a := &Duration{}
+   a.refCount = 1
+   a.setData(data)
+   return a
+}
+
+func (a *Duration) Val

[arrow] branch master updated: ARROW-3671: [Go] implement MonthInterval and DayTimeInterval

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 634c8d2  ARROW-3671: [Go] implement MonthInterval and DayTimeInterval
634c8d2 is described below

commit 634c8d26eb60d261fe5096a9835bbc386860dcab
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 17:39:06 2019 +0200

ARROW-3671: [Go] implement MonthInterval and DayTimeInterval

Author: Sebastien Binet 

Closes #4562 from sbinet/issue-3671 and squashes the following commits:

9a7d04da6  ARROW-3671:  implement MonthInterval and DayTimeInterval
---
 go/arrow/array/array.go  |   2 +-
 go/arrow/array/compare.go|  12 ++
 go/arrow/array/interval.go   | 434 +++
 go/arrow/array/interval_test.go  | 276 +
 go/arrow/datatype_fixedwidth.go  |  71 +--
 go/arrow/type_traits_interval.go | 125 +++
 6 files changed, 901 insertions(+), 19 deletions(-)

diff --git a/go/arrow/array/array.go b/go/arrow/array/array.go
index 2f8be78..c13dd07 100644
--- a/go/arrow/array/array.go
+++ b/go/arrow/array/array.go
@@ -185,7 +185,7 @@ func init() {
arrow.TIMESTAMP: func(data *Data) Interface { return NewTimestampData(data) },
arrow.TIME32:    func(data *Data) Interface { return NewTime32Data(data) },
arrow.TIME64:    func(data *Data) Interface { return NewTime64Data(data) },
-   arrow.INTERVAL:  unsupportedArrayType,
+   arrow.INTERVAL:  func(data *Data) Interface { return NewIntervalData(data) },
arrow.DECIMAL:   unsupportedArrayType,
arrow.LIST:      func(data *Data) Interface { return NewListData(data) },
arrow.STRUCT:    func(data *Data) Interface { return NewStructData(data) },
diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go
index da8f5ab..0ea0b61 100644
--- a/go/arrow/array/compare.go
+++ b/go/arrow/array/compare.go
@@ -152,6 +152,12 @@ func ArrayEqual(left, right Interface) bool {
case *Struct:
r := right.(*Struct)
return arrayEqualStruct(l, r)
+   case *MonthInterval:
+   r := right.(*MonthInterval)
+   return arrayEqualMonthInterval(l, r)
+   case *DayTimeInterval:
+   r := right.(*DayTimeInterval)
+   return arrayEqualDayTimeInterval(l, r)
 
default:
panic(errors.Errorf("arrow/array: unknown array type %T", l))
@@ -329,6 +335,12 @@ func arrayApproxEqual(left, right Interface, opt equalOption) bool {
case *Struct:
r := right.(*Struct)
return arrayApproxEqualStruct(l, r, opt)
+   case *MonthInterval:
+   r := right.(*MonthInterval)
+   return arrayEqualMonthInterval(l, r)
+   case *DayTimeInterval:
+   r := right.(*DayTimeInterval)
+   return arrayEqualDayTimeInterval(l, r)
 
default:
panic(errors.Errorf("arrow/array: unknown array type %T", l))
diff --git a/go/arrow/array/interval.go b/go/arrow/array/interval.go
new file mode 100644
index 0000000..21efd6e
--- /dev/null
+++ b/go/arrow/array/interval.go
@@ -0,0 +1,434 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package array // import "github.com/apache/arrow/go/arrow/array"
+
+import (
+   "fmt"
+   "strings"
+   "sync/atomic"
+
+   "github.com/apache/arrow/go/arrow"
+   "github.com/apache/arrow/go/arrow/internal/bitutil"
+   "github.com/apache/arrow/go/arrow/internal/debug"
+   "github.com/apache/arrow/go/arrow/memory"
+   "github.com/pkg/errors"
+)
+
+func NewIntervalData(data *Data) Interface {
+   switch data.dtype.(type) {
+   case *arrow.MonthIntervalType:
+   return NewMonthIntervalData(data)
+   case *arrow.DayTimeIntervalType:
+   return NewDayTimeIntervalData(data)
+   default:
+   panic(errors.Errorf("arrow/array: unknown inte

[arrow] branch master updated: ARROW-2981: [C++] improve clang-tidy usability

2019-06-14 Thread fsaintjacques
This is an automated email from the ASF dual-hosted git repository.

fsaintjacques pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 72b5531  ARROW-2981: [C++] improve clang-tidy usability
72b5531 is described below

commit 72b553147e4bd47e100fbfd58ed49041561b7bc4
Author: Benjamin Kietzman 
AuthorDate: Fri Jun 14 11:14:10 2019 -0400

ARROW-2981: [C++] improve clang-tidy usability

- adds a docker-compose service for running clang-tidy
- docker-compose runs as root, so the files touched by clang-tidy and 
clang-format were owned by root. They are now passed back to the user
- clang-format is run after clang-tidy because the latter munges formatting

I ran clang-tidy then cleaned up the build errors [in this branch](https://github.com/apache/arrow/compare/f92830fa751d791854e9bac9c34755dd730ec375...bkietz:clang-tidy-run-example) to give an idea of what things are changed and what things can go wrong.

Author: Benjamin Kietzman 

Closes #4293 from bkietz/2981-Support-scripts-documentation-for-runnin and 
squashes the following commits:

63ac52c8d  refactor clang-tidy: don't modify sources
31e475598  run code-modifying linters with lint_user
fa5af80dd  add description of producing HeaderFilterRegex
3ecc91d0f  built-in clang-format didn't match ninja format
2193dab2e  mention clang-tidy in integration.rst
0724b5554  update clang-tidy's header regex to sort-of match lint_exclusions.txt
df3a12148  clang-tidy can run clang-format automatically
ed2b2311e  maintain ownership when running clang-{format,tidy}
a87dd0464  adding docker-compose endpoint for clang-tidy
---
 .clang-tidy   |  4 +++-
 cpp/build-support/run_clang_tidy.py   | 12 +---
 .clang-tidy => dev/lint/run_clang_tidy.sh | 22 +-
 docker-compose.yml| 10 ++
 docs/source/developers/integration.rst|  3 ++-
 5 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/.clang-tidy b/.clang-tidy
index b05faa4..0874ab0 100644
--- a/.clang-tidy
+++ b/.clang-tidy
@@ -16,7 +16,9 @@
 # under the License.
 ---
 Checks:  'clang-diagnostic-*,clang-analyzer-*,-clang-analyzer-alpha*,google-*,modernize-*,readability-*'
-HeaderFilterRegex: 'arrow/.*'
+# produce HeaderFilterRegex from cpp/build-support/lint_exclusions.txt with:
+# echo -n '^('; sed -e 's/*/\.*/g' cpp/build-support/lint_exclusions.txt | tr '\n' '|'; echo ')$'
+HeaderFilterRegex: '^(.*codegen.*|.*_generated.*|.*windows_compatibility.h|.*pyarrow_api.h|.*pyarrow_lib.h|.*python/config.h|.*python/platform.h|.*thirdparty/ae/.*|.*vendored/.*|.*RcppExports.cpp.*|)$'
 AnalyzeTemporaryDtors: true
 CheckOptions:
   - key: google-readability-braces-around-statements.ShortStatementLines
diff --git a/cpp/build-support/run_clang_tidy.py b/cpp/build-support/run_clang_tidy.py
index 57a3e91..857fc26 100755
--- a/cpp/build-support/run_clang_tidy.py
+++ b/cpp/build-support/run_clang_tidy.py
@@ -94,8 +94,13 @@ if __name__ == "__main__":
 help="If specified, only print errors")
 arguments = parser.parse_args()
 
+exclude_globs = []
+if arguments.exclude_globs:
+for line in open(arguments.exclude_globs):
+exclude_globs.append(line.strip())
+
 linted_filenames = []
-for path in lintutils.get_sources(arguments.source_dir):
+for path in lintutils.get_sources(arguments.source_dir, exclude_globs):
 linted_filenames.append(path)
 
 if not arguments.quiet:
@@ -111,8 +116,9 @@ if __name__ == "__main__":
 cmd.append('-fix')
 results = lintutils.run_parallel(
 [cmd + some for some in lintutils.chunk(linted_filenames, 16)])
-for result in results:
-result.check_returncode()
+for returncode, stdout, stderr in results:
+if returncode != 0:
+sys.exit(returncode)
 
 else:
 _check_all(cmd, linted_filenames)
diff --git a/.clang-tidy b/dev/lint/run_clang_tidy.sh
old mode 100644
new mode 100755
similarity index 56%
copy from .clang-tidy
copy to dev/lint/run_clang_tidy.sh
index b05faa4..8068e2c
--- a/.clang-tidy
+++ b/dev/lint/run_clang_tidy.sh
@@ -1,3 +1,4 @@
+#!/bin/bash
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
@@ -14,16 +15,11 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.

-Checks:  'clang-diagnostic-*,clang-analyzer-*,-clang-analyzer-alpha*,google-*,modernize-*,readability-*'
-HeaderFilterRegex: 'arrow/.*'
-AnalyzeTemporaryDtors: true
-CheckOptions:
-  - key: 
google-readability-brac
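
The comment added to `.clang-tidy` in this commit derives `HeaderFilterRegex` from `cpp/build-support/lint_exclusions.txt` with a `sed`/`tr` pipeline. The same transformation can be sketched in Go; this is only an illustration, the function name `headerFilterRegex` is hypothetical, and the glob lines in `main` are assumed from the regex shown in the diff rather than read from the real exclusions file:

```go
package main

import (
	"fmt"
	"strings"
)

// headerFilterRegex mimics the shell pipeline from the .clang-tidy comment:
// every "*" becomes ".*", each line is followed by "|" (tr turns every
// newline, including the last one, into "|"), and the result is wrapped
// in "^(" ... ")$".
func headerFilterRegex(globs []string) string {
	var sb strings.Builder
	sb.WriteString("^(")
	for _, g := range globs {
		sb.WriteString(strings.ReplaceAll(g, "*", ".*"))
		sb.WriteString("|")
	}
	sb.WriteString(")$")
	return sb.String()
}

func main() {
	// Assumed sample lines; the real file has more entries.
	globs := []string{"*codegen*", "*_generated*", "*vendored/*"}
	fmt.Println(headerFilterRegex(globs))
	// prints: ^(.*codegen.*|.*_generated.*|.*vendored/.*|)$
}
```

The trailing `|` before `)$` is kept deliberately so the output has the same shape as the value committed in the diff.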

[arrow] branch master updated: ARROW-5601: [C++][Gandiva] fail if the output type is not supported

2019-06-14 Thread ravindra
This is an automated email from the ASF dual-hosted git repository.

ravindra pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f8cd263  ARROW-5601: [C++][Gandiva] fail if the output type is not supported
f8cd263 is described below

commit f8cd2639b2f36b3d84dceaead8a1d0b3ed493c2c
Author: Pindikura Ravindra 
AuthorDate: Fri Jun 14 19:41:01 2019 +0530

ARROW-5601: [C++][Gandiva] fail if the output type is not supported

Author: Pindikura Ravindra 

Closes #4569 from pravindra/arrow-5601 and squashes the following commits:

bee31332  ARROW-5601:  fail if the output type is not supported
---
 cpp/src/gandiva/llvm_generator.cc  |  9 +++--
 cpp/src/gandiva/tests/utf8_test.cc | 17 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/cpp/src/gandiva/llvm_generator.cc b/cpp/src/gandiva/llvm_generator.cc
index 28887c9..867f07b 100644
--- a/cpp/src/gandiva/llvm_generator.cc
+++ b/cpp/src/gandiva/llvm_generator.cc
@@ -318,11 +318,16 @@ Status LLVMGenerator::CodeGenExprValue(DexPtr value_expr, FieldDescriptorPtr out
 
   // save the value in the output vector.
   builder->SetInsertPoint(loop_body_tail);
-  if (output->Type()->id() == arrow::Type::BOOL) {
+  auto output_type_id = output->Type()->id();
+  if (output_type_id == arrow::Type::BOOL) {
 SetPackedBitValue(output_ref, loop_var, output_value->data());
-  } else {
+  } else if (arrow::is_primitive(output_type_id) ||
+ output_type_id == arrow::Type::DECIMAL) {
 llvm::Value* slot_offset = builder->CreateGEP(output_ref, loop_var);
 builder->CreateStore(output_value->data(), slot_offset);
+  } else {
+return Status::NotImplemented("output type ", output->Type()->ToString(),
+  " not supported");
   }
   ADD_TRACE("saving result " + output->Name() + " value %T", output_value->data());
 
diff --git a/cpp/src/gandiva/tests/utf8_test.cc b/cpp/src/gandiva/tests/utf8_test.cc
index 8129169..6df4da6 100644
--- a/cpp/src/gandiva/tests/utf8_test.cc
+++ b/cpp/src/gandiva/tests/utf8_test.cc
@@ -504,4 +504,21 @@ TEST_F(TestUtf8, TestIsNull) {
 outputs[1]);  // isnotnull
 }
 
+TEST_F(TestUtf8, TestVarlenOutput) {
+  // schema for input fields
+  auto field_a = field("a", utf8());
+  auto schema = arrow::schema({field_a});
+
+  // build expressions.
+  auto expr = TreeExprBuilder::MakeExpression(TreeExprBuilder::MakeField(field_a),
+  field("res", utf8()));
+
+  // Build a projector for the expressions.
+  std::shared_ptr<Projector> projector;
+
+  // assert that it fails gracefully.
+  ASSERT_RAISES(NotImplemented,
+Projector::Make(schema, {expr}, TestConfiguration(), &projector));
+}
+
 }  // namespace gandiva



[arrow] branch master updated: ARROW-5582: [Go] implement RecordEqual

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 40632c8  ARROW-5582: [Go] implement RecordEqual
40632c8 is described below

commit 40632c847c93291ed025b8e539677882e0a69d35
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 15:53:42 2019 +0200

ARROW-5582: [Go] implement RecordEqual

Author: Sebastien Binet 

Closes #4561 from sbinet/issue-5582 and squashes the following commits:

751ba393c  go/arrow/array: add RecordApproxEqual
a67379b60  ARROW-5582:  implement RecordEqual
---
 go/arrow/array/compare.go  | 41 ++
 go/arrow/array/compare_test.go | 50 ++
 2 files changed, 91 insertions(+)

diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go
index 9fa13a1..da8f5ab 100644
--- a/go/arrow/array/compare.go
+++ b/go/arrow/array/compare.go
@@ -24,6 +24,47 @@ import (
"github.com/pkg/errors"
 )
 
+// RecordEqual reports whether the two provided records are equal.
+func RecordEqual(left, right Record) bool {
+   switch {
+   case left.NumCols() != right.NumCols():
+   return false
+   case left.NumRows() != right.NumRows():
+   return false
+   }
+
+   for i := range left.Columns() {
+   lc := left.Column(i)
+   rc := right.Column(i)
+   if !ArrayEqual(lc, rc) {
+   return false
+   }
+   }
+   return true
+}
+
+// RecordApproxEqual reports whether the two provided records are approximately equal.
+// For non-floating point columns, it is equivalent to RecordEqual.
+func RecordApproxEqual(left, right Record, opts ...EqualOption) bool {
+   switch {
+   case left.NumCols() != right.NumCols():
+   return false
+   case left.NumRows() != right.NumRows():
+   return false
+   }
+
+   opt := newEqualOption(opts...)
+
+   for i := range left.Columns() {
+   lc := left.Column(i)
+   rc := right.Column(i)
+   if !arrayApproxEqual(lc, rc, opt) {
+   return false
+   }
+   }
+   return true
+}
+
 // ArrayEqual reports whether the two provided arrays are equal.
 func ArrayEqual(left, right Interface) bool {
switch {
diff --git a/go/arrow/array/compare_test.go b/go/arrow/array/compare_test.go
index 9985f51..e9927f0 100644
--- a/go/arrow/array/compare_test.go
+++ b/go/arrow/array/compare_test.go
@@ -479,3 +479,53 @@ func TestArrayEqualDifferentMaskedValues(t *testing.T) {
t.Errorf("%v must be equal to %v", a1, a2)
}
 }
+
+func TestRecordEqual(t *testing.T) {
+   for name, recs := range arrdata.Records {
+   t.Run(name, func(t *testing.T) {
+   rec0 := recs[0]
+   rec1 := recs[1]
+   if !array.RecordEqual(rec0, rec0) {
+   t.Fatalf("identical records should compare equal:\nrecord:\n%v", rec0)
+   }
+
+   if array.RecordEqual(rec0, rec1) {
+   t.Fatalf("non-identical records should not compare equal:\nrec0:\n%v\nrec1:\n%v", rec0, rec1)
+   }
+
+   sub00 := rec0.NewSlice(0, recs[0].NumRows()-1)
+   defer sub00.Release()
+   sub01 := rec0.NewSlice(1, recs[0].NumRows())
+   defer sub01.Release()
+
+   if array.RecordEqual(sub00, sub01) {
+   t.Fatalf("non-identical records should not compare equal:\nsub0:\n%v\nsub1:\n%v", sub00, sub01)
+   }
+   })
+   }
+}
+
+func TestRecordApproxEqual(t *testing.T) {
+   for name, recs := range arrdata.Records {
+   t.Run(name, func(t *testing.T) {
+   rec0 := recs[0]
+   rec1 := recs[1]
+   if !array.RecordApproxEqual(rec0, rec0) {
+   t.Fatalf("identical records should compare equal:\nrecord:\n%v", rec0)
+   }
+
+   if array.RecordApproxEqual(rec0, rec1) {
+   t.Fatalf("non-identical records should not compare equal:\nrec0:\n%v\nrec1:\n%v", rec0, rec1)
+   }
+
+   sub00 := rec0.NewSlice(0, recs[0].NumRows()-1)
+   defer sub00.Release()
+   sub01 := rec0.NewSlice(1, recs[0].NumRows())
+   defer sub01.Release()
+
+   if array.RecordApproxEqual(sub00, sub01) {
+   t.Fatalf("non-identical records should not 
compare equal:\nsub0:\n%v\nsub1
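
The `RecordEqual` function added by this commit checks the column and row counts first and then compares the records column by column. A self-contained sketch of that pattern follows; it does not use the Arrow package, and the `column` and `record` types, the helper names, and the rule that masked (null) slots are ignored are all assumptions made for illustration:

```go
package main

import "fmt"

// column is a hypothetical stand-in for an Arrow array:
// values plus a validity mask of the same length.
type column struct {
	values []int64
	valid  []bool
}

// record is a hypothetical stand-in for array.Record.
type record struct{ cols []column }

func (r record) numCols() int { return len(r.cols) }

func (r record) numRows() int {
	if len(r.cols) == 0 {
		return 0
	}
	return len(r.cols[0].values)
}

// columnEqual compares validity first and values only for valid slots.
func columnEqual(l, r column) bool {
	if len(l.values) != len(r.values) {
		return false
	}
	for i := range l.values {
		if l.valid[i] != r.valid[i] {
			return false
		}
		if l.valid[i] && l.values[i] != r.values[i] {
			return false
		}
	}
	return true
}

// recordEqual follows the shape of the committed RecordEqual:
// cheap shape checks first, then a per-column comparison.
func recordEqual(left, right record) bool {
	switch {
	case left.numCols() != right.numCols():
		return false
	case left.numRows() != right.numRows():
		return false
	}
	for i := range left.cols {
		if !columnEqual(left.cols[i], right.cols[i]) {
			return false
		}
	}
	return true
}

func main() {
	mask := []bool{true, false, true}
	a := record{cols: []column{{values: []int64{1, 2, 3}, valid: mask}}}
	b := record{cols: []column{{values: []int64{1, 99, 3}, valid: mask}}}
	c := record{cols: []column{{values: []int64{1, 2, 4}, valid: mask}}}

	fmt.Println(recordEqual(a, b)) // true: the differing slot is masked out
	fmt.Println(recordEqual(a, c)) // false: a valid slot differs
}
```

Checking the shape before the data keeps the common mismatch case cheap, which is presumably why the committed code is ordered the same way.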

[arrow] branch master updated: ARROW-5600: [R] R package namespace cleanup

2019-06-14 Thread romainfrancois
This is an automated email from the ASF dual-hosted git repository.

romainfrancois pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ef96c8  ARROW-5600: [R] R package namespace cleanup
2ef96c8 is described below

commit 2ef96c8623cbad1770f82e97df733bd881ab967b
Author: Romain Francois 
AuthorDate: Fri Jun 14 15:45:30 2019 +0200

ARROW-5600: [R] R package namespace cleanup

This is instead of https://github.com/apache/arrow/pull/4491, without the 
function naming change that we wanted to think about more intentionally.

It also removes a few lingering references to `tibble` in the package, 
which were still passing in tests because tibble is in Suggests and the test 
hosts install all of the Suggests packages.

@romainfrancois

Author: Romain Francois 
Author: Neal Richardson 

Closes #4566 from nealrichardson/clean-imports and squashes the following 
commits:

e0cf0051  not importing glue::glue
002c0f01  no need for glue either at this point
25d4873b  Prune unused Imports; fix a couple of lingering tibble references
---
 r/DESCRIPTION| 4 +---
 r/NAMESPACE  | 1 -
 r/R/R6.R | 2 +-
 r/R/RecordBatch.R| 2 +-
 r/R/Table.R  | 2 +-
 r/R/arrow-package.R  | 1 -
 r/R/read_table.R | 4 ++--
 r/man/read_table.Rd  | 2 +-
 r/tests/testthat/{test-arrow-csv-.R => test-arrow-csv.R} | 0
 9 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 103a63b..9bec314 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -27,11 +27,8 @@ Imports:
 rlang,
 purrr,
 assertthat,
-glue,
 R6,
-vctrs (>= 0.1.0),
 fs,
-crayon,
 bit64
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 6.1.1
@@ -43,6 +40,7 @@ Suggests:
 roxygen2,
 testthat,
 lubridate,
+vctrs,
 hms
 Collate:
 'enums.R'
diff --git a/r/NAMESPACE b/r/NAMESPACE
index 3f91568..78cdfd5 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -173,7 +173,6 @@ importFrom(Rcpp,sourceCpp)
 importFrom(assertthat,assert_that)
 importFrom(bit64,print.integer64)
 importFrom(bit64,str.integer64)
-importFrom(glue,glue)
 importFrom(purrr,map)
 importFrom(purrr,map2)
 importFrom(purrr,map_int)
diff --git a/r/R/R6.R b/r/R/R6.R
index e343116..41169f3 100644
--- a/r/R/R6.R
+++ b/r/R/R6.R
@@ -26,7 +26,7 @@
   self$`.:xp:.` <- xp
 },
 print = function(...){
-  cat(crayon::silver(glue::glue("{cl}", cl = class(self)[[1]])), "\n")
+  cat(class(self)[[1]], "\n")
   if(!is.null(self$ToString)){
 cat(self$ToString(), "\n")
   }
diff --git a/r/R/RecordBatch.R b/r/R/RecordBatch.R
index d60c823..8c90254 100644
--- a/r/R/RecordBatch.R
+++ b/r/R/RecordBatch.R
@@ -97,7 +97,7 @@
 #' @return a [arrow::RecordBatch][arrow__RecordBatch]
 #' @export
 record_batch <- function(..., schema = NULL){
-  arrays <- tibble::lst(...)
+  arrays <- list2(...)
   stopifnot(length(arrays) > 0)
   shared_ptr(`arrow::RecordBatch`, RecordBatch__from_arrays(schema, arrays))
 }
diff --git a/r/R/Table.R b/r/R/Table.R
index 6d50394..d1e4b18 100644
--- a/r/R/Table.R
+++ b/r/R/Table.R
@@ -60,7 +60,7 @@
 #'
 #' @export
 table <- function(..., schema = NULL){
-  dots <- tibble::lst(...)
+  dots <- list2(...)
   stopifnot(length(dots) > 0)
   shared_ptr(`arrow::Table`, Table__from_dots(dots, schema))
 }
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 41cbc2a..faaaf2a 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -16,7 +16,6 @@
 # under the License.
 
 #' @importFrom R6 R6Class
-#' @importFrom glue glue
 #' @importFrom purrr map map_int map2
 #' @importFrom assertthat assert_that
 #' @importFrom rlang list2 %||% is_false abort dots_n warn
diff --git a/r/R/read_table.R b/r/R/read_table.R
index 57ef5ec..ff2c5dd 100644
--- a/r/R/read_table.R
+++ b/r/R/read_table.R
@@ -36,7 +36,7 @@
 #' @return
 #'
 #'  - `read_table` returns an [arrow::Table][arrow__Table]
-#'  - `read_arrow` returns a [tibble::tibble()]
+#'  - `read_arrow` returns a `data.frame`
 #'
 #' @details
 #'
@@ -84,5 +84,5 @@ read_table.fs_path <- function(stream) {
 #' @rdname read_table
 #' @export
 read_arrow <- function(stream){
-  as_tibble(read_table(stream))
+  as.data.frame(read_table(stream))
 }
diff --git a/r/man/read_table.Rd b/r/man/read_table.Rd
index 3231b26..c5863c1 100644
--- a/r/man/read_table.Rd
+++ b/r/man/read_table.Rd
@@ -27,7 +27,7 @@ to process it.
 \value{
 \itemize{
 \item \code{read_table} returns an \link[=arrow__Table]{arrow::Table}
-\item \code{read_arrow} returns a \code{\link[tibble:tibble]{tibble::t

[arrow] branch master updated: ARROW-5342: [Format] Formalize "extension types" in Arrow protocol metadata

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 6fb850c  ARROW-5342: [Format] Formalize "extension types" in Arrow protocol metadata
6fb850c is described below

commit 6fb850cf57fd6227573cca6d43a46e1d5d2b0a66
Author: Wes McKinney 
AuthorDate: Fri Jun 14 08:19:38 2019 -0500

ARROW-5342: [Format] Formalize "extension types" in Arrow protocol metadata

This patch proposes a language-independent scheme for annotating built-in 
Arrow types with a custom type name and serialized representation, per previous 
discussions on the mailing list.

I am starting a mailing list discussion to hold a vote about this and see 
if there are other ideas about how to proceed.

Author: Wes McKinney 

Closes #4332 from wesm/ARROW-5342 and squashes the following commits:

ff7ca2c37  Fix formatting issue and missing backtick
4d0317482  Add language to formalize extension type machinery. Change C++ metadata key names to use ARROW: prefix
---
 cpp/src/arrow/extension_type-test.cc   |  4 +-
 cpp/src/arrow/ipc/metadata-internal.cc |  8 ++--
 docs/source/format/Metadata.rst| 77 --
 3 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/cpp/src/arrow/extension_type-test.cc b/cpp/src/arrow/extension_type-test.cc
index 90f96cd..6b632a9 100644
--- a/cpp/src/arrow/extension_type-test.cc
+++ b/cpp/src/arrow/extension_type-test.cc
@@ -279,8 +279,8 @@ TEST_F(TestExtensionType, UnrecognizedExtension) {
 
   ASSERT_OK(UnregisterExtensionType("uuid"));
   auto ext_metadata =
-  key_value_metadata({{"arrow_extension_name", "uuid"},
-  {"arrow_extension_data", "uuid-type-unique-code"}});
+  key_value_metadata({{"ARROW:extension:name", "uuid"},
+  {"ARROW:extension:metadata", "uuid-type-unique-code"}});
   auto ext_field = field("f0", fixed_size_binary(16), true, ext_metadata);
   auto batch_no_ext = RecordBatch::Make(schema({ext_field}), 4, {storage_arr});
 
diff --git a/cpp/src/arrow/ipc/metadata-internal.cc b/cpp/src/arrow/ipc/metadata-internal.cc
index 1d0ac8a..46f3366 100644
--- a/cpp/src/arrow/ipc/metadata-internal.cc
+++ b/cpp/src/arrow/ipc/metadata-internal.cc
@@ -62,8 +62,8 @@ using Offset = flatbuffers::Offset;
 using FBString = flatbuffers::Offset;
 using KVVector = flatbuffers::Vector;
 
-static const char kExtensionTypeKeyName[] = "arrow_extension_name";
-static const char kExtensionDataKeyName[] = "arrow_extension_data";
+static const char kExtensionTypeKeyName[] = "ARROW:extension:name";
+static const char kExtensionMetadataKeyName[] = "ARROW:extension:metadata";
 
 MetadataVersion GetMetadataVersion(flatbuf::MetadataVersion version) {
   switch (version) {
@@ -370,7 +370,7 @@ static Status TypeFromFlatbuffer(const flatbuf::Field* field,
   return Status::OK();
 }
 std::string type_name = field_metadata->value(name_index);
-int data_index = field_metadata->FindKey(kExtensionDataKeyName);
+int data_index = field_metadata->FindKey(kExtensionMetadataKeyName);
 std::string type_data = data_index == -1 ? "" : field_metadata->value(data_index);
 
 std::shared_ptr type = GetExtensionType(type_name);
@@ -674,7 +674,7 @@ class FieldToFlatbufferVisitor {
   Status Visit(const ExtensionType& type) {
 RETURN_NOT_OK(VisitType(*type.storage_type()));
 extra_type_metadata_[kExtensionTypeKeyName] = type.extension_name();
-extra_type_metadata_[kExtensionDataKeyName] = type.Serialize();
+extra_type_metadata_[kExtensionMetadataKeyName] = type.Serialize();
 return Status::OK();
   }
 
diff --git a/docs/source/format/Metadata.rst b/docs/source/format/Metadata.rst
index b6c2a5f..f4be82b 100644
--- a/docs/source/format/Metadata.rst
+++ b/docs/source/format/Metadata.rst
@@ -29,9 +29,6 @@ systems to communicate the
 * "Data headers" indicating the physical locations of memory buffers sufficient
   to reconstruct a Arrow data structures without copying memory.
 
-Canonical implementation
-
-
 We are using `Flatbuffers`_ for low-overhead reading and writing of the Arrow
 metadata. See ``Message.fbs``.
 
@@ -65,8 +62,8 @@ the columns. The Flatbuffers IDL for a field is: ::
 The ``type`` is the logical type of the field. Nested types, such as List,
 Struct, and Union, have a sequence of child fields.
 
-Record data headers

+Record Batch Data Headers
+-
 
 A record batch is a collection of top-level named, equal length Arrow arrays
 (or vectors). If one of the arrays contains nested data, its child arrays are
@@ -193,12 +190,74 @@ categories:
 Refer to `Schema.fbs`_ for up-to-date descriptions of each built-in
 logical type.
 
+Custom Application Metadata
+---
+
+We provi

[arrow] branch master updated: ARROW-5603: [Python] Register custom pytest markers to avoid warnings

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 1423df1  ARROW-5603: [Python] Register custom pytest markers to avoid 
warnings
1423df1 is described below

commit 1423df1a83173cbd9f76f81274acabdf9259cb5a
Author: Joris Van den Bossche 
AuthorDate: Fri Jun 14 08:04:49 2019 -0500

ARROW-5603: [Python] Register custom pytest markers to avoid warnings

https://issues.apache.org/jira/browse/ARROW-5603

Author: Joris Van den Bossche 

Closes #4570 from jorisvandenbossche/ARROW-5603-pytest-markers and squashes 
the following commits:

72feab5b1  ARROW-5603:  register custom pytest 
markers to avoid warnings
---
 python/pyarrow/tests/conftest.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyarrow/tests/conftest.py b/python/pyarrow/tests/conftest.py
index 8a6304e..4907557 100644
--- a/python/pyarrow/tests/conftest.py
+++ b/python/pyarrow/tests/conftest.py
@@ -105,7 +105,10 @@ except ImportError:
 
 
 def pytest_configure(config):
-pass
+for mark in groups:
+config.addinivalue_line(
+"markers", mark,
+)
 
 
 def pytest_addoption(parser):
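As a standalone illustration of the change above, here is a minimal conftest.py-style sketch. The marker names in `groups` are placeholders chosen for the example (the real list is defined elsewhere in conftest.py and is not shown in this diff); `config.addinivalue_line` is the pytest hook API the commit itself uses.

```python
# Hypothetical group names; the real conftest.py defines its own `groups` list.
groups = ["hdfs", "parquet", "plasma"]


def pytest_configure(config):
    # Registering each name under the "markers" ini key tells pytest the
    # marker is intentional, which avoids unknown-marker warnings.
    for mark in groups:
        config.addinivalue_line("markers", mark)
```

With markers registered this way, a test decorated with `@pytest.mark.parquet` should no longer trigger a warning about an unregistered marker.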



[arrow] branch master updated: ARROW-5584: [Java] Add import for link reference in FieldReader javadoc

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 4cb827f  ARROW-5584: [Java] Add import for link reference in 
FieldReader javadoc
4cb827f is described below

commit 4cb827feb55610fdbb6f73e126ecedcc8be07192
Author: tianchen 
AuthorDate: Fri Jun 14 08:01:04 2019 -0500

ARROW-5584: [Java] Add import for link reference in FieldReader javadoc

see [ARROW-5584](https://issues.apache.org/jira/browse/ARROW-5584).

Author: tianchen 

Closes #4546 from tianchen92/ARROW-5584 and squashes the following commits:

33924aedc  ARROW-5584: Add import for link reference in 
FieldReader javadoc
---
 .../main/java/org/apache/arrow/vector/complex/reader/FieldReader.java   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java b/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java
index d16992f..8825bc3 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java
@@ -26,7 +26,7 @@ import org.apache.arrow.vector.complex.reader.BaseReader.StructReader;
 
 /**
  * Composite of all Reader types (e.g. {@link StructReader}, {@link ScalarReader}, etc).  Each reader type
- * is in essence a way of iterating over a {@link ValueVector}.
+ * is in essence a way of iterating over a {@link org.apache.arrow.vector.ValueVector}.
  */
 public interface FieldReader extends StructReader, ListReader, ScalarReader, RepeatedStructReader, RepeatedListReader {
 }



[arrow] branch master updated: ARROW-5545: [C++][Docs] Clarify expectation of UTC values for timestamps with time zones

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 10a571b  ARROW-5545: [C++][Docs] Clarify expectation of UTC values for 
timestamps with time zones
10a571b is described below

commit 10a571b4334da95f00411a64959170fd08e0dba1
Author: TP Boudreau 
AuthorDate: Fri Jun 14 07:59:40 2019 -0500

ARROW-5545: [C++][Docs] Clarify expectation of UTC values for timestamps 
with time zones

Doxygen comments only.  No code changes.

Author: TP Boudreau 

Closes #4555 from tpboudreau/ARROW-5545 and squashes the following commits:

305b0d84d  Add comments for three temporal DataTypes
---
 cpp/src/arrow/type.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/cpp/src/arrow/type.h b/cpp/src/arrow/type.h
index 98d2e4e..b581739 100644
--- a/cpp/src/arrow/type.h
+++ b/cpp/src/arrow/type.h
@@ -769,6 +769,8 @@ class ARROW_EXPORT TimeType : public TemporalType, public ParametricType {
   TimeUnit::type unit_;
 };
 
+/// Concrete type class for 32-bit time data (as number of seconds or milliseconds
+/// since midnight)
 class ARROW_EXPORT Time32Type : public TimeType {
  public:
   static constexpr Type::type type_id = Type::TIME32;
@@ -783,6 +785,8 @@ class ARROW_EXPORT Time32Type : public TimeType {
   std::string name() const override { return "time32"; }
 };
 
+/// Concrete type class for 64-bit time data (as number of microseconds or nanoseconds
+/// since midnight)
 class ARROW_EXPORT Time64Type : public TimeType {
  public:
   static constexpr Type::type type_id = Type::TIME64;
@@ -797,6 +801,38 @@ class ARROW_EXPORT Time64Type : public TimeType {
   std::string name() const override { return "time64"; }
 };
 
+/// \brief Concrete type class for datetime data (as number of seconds, milliseconds,
+/// microseconds or nanoseconds since UNIX epoch)
+///
+/// If supplied, the timezone string should take either the form (i) "Area/Location",
+/// with values drawn from the names in the IANA Time Zone Database (such as
+/// "Europe/Zurich"); or (ii) "(+|-)HH:MM" indicating an absolute offset from GMT
+/// (such as "-08:00").  To indicate a native UTC timestamp, one of the strings "UTC",
+/// "Etc/UTC" or "+00:00" should be used.
+///
+/// If any non-empty string is supplied as the timezone for a TimestampType, then the
+/// Arrow field containing that timestamp type (and by extension the column associated
+/// with such a field) is considered "timezone-aware".  The integer arrays that comprise
+/// a timezone-aware column must contain UTC normalized datetime values, regardless of
+/// the contents of their timezone string.  More precisely, (i) the producer of a
+/// timezone-aware column must populate its constituent arrays with valid UTC values
+/// (performing offset conversions from non-UTC values if necessary); and (ii) the
+/// consumer of a timezone-aware column may assume that the column's values are directly
+/// comparable (that is, with no offset adjustment required) to the values of any other
+/// timezone-aware column or to any other valid UTC datetime value (provided all values
+/// are expressed in the same units).
+///
+/// If a TimestampType is constructed without a timezone (or, equivalently, if the
+/// timezone supplied is an empty string) then the resulting Arrow field (column) is
+/// considered "timezone-naive".  The producer of a timezone-naive column may populate
+/// its constituent integer arrays with datetime values from any timezone; the consumer
+/// of a timezone-naive column should make no assumptions about the interoperability or
+/// comparability of the values of such a column with those of any other timestamp
+/// column or datetime value.
+///
+/// If a timezone-aware field contains a recognized timezone, its values may be
+/// localized to that locale upon display; the values of timezone-naive fields must
+/// always be displayed "as is", with no localization performed on them.
 class ARROW_EXPORT TimestampType : public TemporalType, public ParametricType {
  public:
   using Unit = TimeUnit;
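The producer-side rule for timezone-aware columns described in the comment above (store UTC-normalized values, whatever the timezone string says) can be illustrated with a short Python sketch. It uses only the standard library; the helper name to_utc_epoch_seconds and the sample dates are assumptions made for the example, not part of Arrow.

```python
from datetime import datetime, timedelta, timezone


def to_utc_epoch_seconds(wall_clock, offset):
    """Interpret a naive wall-clock datetime at the given fixed offset and
    return the UTC-normalized number of seconds since the UNIX epoch."""
    return int(wall_clock.replace(tzinfo=offset).timestamp())


# Two producers in different zones describing the same instant (12:00 UTC).
minus_eight = timezone(timedelta(hours=-8))  # timezone string "-08:00"
plus_one = timezone(timedelta(hours=1))      # timezone string "+01:00"

a = to_utc_epoch_seconds(datetime(2019, 1, 1, 4, 0, 0), minus_eight)
b = to_utc_epoch_seconds(datetime(2019, 1, 1, 13, 0, 0), plus_one)

# Because both values are UTC-normalized, a consumer can compare them
# directly, with no offset adjustment.
assert a == b
```

A timezone-naive column offers no such guarantee: the raw integers 04:00 and 13:00 would be stored as-is, and comparing them would say nothing about which instant came first.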



[arrow] branch master updated: ARROW-5517: [C++] Only check header basename for 'internal' when collecting public headers

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 03c0828  ARROW-5517: [C++] Only check header basename for 'internal' 
when collecting public headers
03c0828 is described below

commit 03c08285c692862a72cc794d9fe05961ec0ceb8e
Author: Benjamin Kietzman 
AuthorDate: Fri Jun 14 07:55:38 2019 -0500

ARROW-5517: [C++] Only check header basename for 'internal' when collecting 
public headers

Author: Benjamin Kietzman 

Closes #4551 from bkietz/5517-Header-collection-CMake-logic-should-onl and 
squashes the following commits:

140b95b81  only check header basename for 'internal' 
when collecting public headers
---
 cpp/cmake_modules/BuildUtils.cmake | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/cpp/cmake_modules/BuildUtils.cmake b/cpp/cmake_modules/BuildUtils.cmake
index 781cedc..293a7ef 100644
--- a/cpp/cmake_modules/BuildUtils.cmake
+++ b/cpp/cmake_modules/BuildUtils.cmake
@@ -720,9 +720,11 @@ function(ARROW_INSTALL_ALL_HEADERS PATH)
 
   set(PUBLIC_HEADERS)
   foreach(HEADER ${CURRENT_DIRECTORY_HEADERS})
-if(NOT ((HEADER MATCHES "internal")))
-  list(APPEND PUBLIC_HEADERS ${HEADER})
+get_filename_component(HEADER_BASENAME ${HEADER} NAME)
+if(HEADER_BASENAME MATCHES "internal")
+  continue()
 endif()
+list(APPEND PUBLIC_HEADERS ${HEADER})
   endforeach()
   install(FILES ${PUBLIC_HEADERS} DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/${PATH}")
 endfunction()
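To see why matching on the basename matters, here is a small Python sketch of the same filtering idea. It is not the CMake code: the function name public_headers and the sample paths are made up for illustration, and a plain substring test stands in for CMake's regex-based MATCHES (equivalent here because "internal" contains no regex metacharacters).

```python
import posixpath


def public_headers(headers):
    """Keep headers whose file name (not directory path) lacks 'internal'."""
    return [h for h in headers if "internal" not in posixpath.basename(h)]


# Hypothetical checkout located under a directory named "internal-builds".
paths = [
    "/home/internal-builds/arrow/api.h",
    "/home/internal-builds/arrow/util-internal.h",
]

# Matching against the full path would drop both files; matching against
# the basename drops only the header that is actually internal.
print(public_headers(paths))
```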



[arrow] branch master updated: ARROW-840: [Python] Expose extension types

2019-06-14 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new eb5dd50  ARROW-840: [Python] Expose extension types
eb5dd50 is described below

commit eb5dd508ee3f592bf1c2a04cce09ee95e137e89b
Author: Antoine Pitrou 
AuthorDate: Fri Jun 14 07:53:40 2019 -0500

ARROW-840: [Python] Expose extension types

Add infrastructure to consume C++ extension types and extension arrays from 
Python.

Also allow creating Python-specific extension types by subclassing 
`ExtensionType`, and creating extension arrays by passing the type and storage 
array to `ExtensionArray.from_storage`.

Author: Antoine Pitrou 

Closes #4532 from pitrou/ARROW-840-py-ext-types and squashes the following 
commits:

95ca6148e  Add IPC tests
44ac0a156  ARROW-840:  Expose extension types
---
 cpp/src/arrow/array.cc  |  11 +-
 cpp/src/arrow/extension_type.cc |  18 +++
 cpp/src/arrow/extension_type.h  |   9 +-
 cpp/src/arrow/python/CMakeLists.txt |   1 +
 cpp/src/arrow/python/extension_type.cc  | 196 +
 cpp/src/arrow/python/extension_type.h   |  77 ++
 cpp/src/arrow/python/pyarrow.h  |   1 +
 python/pyarrow/__init__.py  |   4 +-
 python/pyarrow/array.pxi|  42 +-
 python/pyarrow/includes/libarrow.pxd|  32 
 python/pyarrow/lib.pxd  |  20 ++-
 python/pyarrow/public-api.pxi   |  12 +-
 python/pyarrow/tests/test_extension_type.py | 219 
 python/pyarrow/types.pxi| 150 +--
 14 files changed, 775 insertions(+), 17 deletions(-)

diff --git a/cpp/src/arrow/array.cc b/cpp/src/arrow/array.cc
index 7a3d36e..9d37b45 100644
--- a/cpp/src/arrow/array.cc
+++ b/cpp/src/arrow/array.cc
@@ -1259,7 +1259,16 @@ struct ValidateVisitor {
 return Status::OK();
   }
 
-  Status Visit(const ExtensionArray& array) { return ValidateArray(*array.storage()); }
+  Status Visit(const ExtensionArray& array) {
+const auto& ext_type = checked_cast(*array.type());
+
+if (!array.storage()->type()->Equals(*ext_type.storage_type())) {
+  return Status::Invalid("Extension array of type '", array.type()->ToString(),
+ "' has storage array of incompatible type '",
+ array.storage()->type()->ToString(), "'");
+}
+return ValidateArray(*array.storage());
+  }
 
  protected:
   template 
diff --git a/cpp/src/arrow/extension_type.cc b/cpp/src/arrow/extension_type.cc
index e104c03..25945f3 100644
--- a/cpp/src/arrow/extension_type.cc
+++ b/cpp/src/arrow/extension_type.cc
@@ -27,10 +27,14 @@
 #include "arrow/array.h"
 #include "arrow/status.h"
 #include "arrow/type.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 
+using internal::checked_cast;
+
 DataTypeLayout ExtensionType::layout() const { return storage_type_->layout(); }
 
 std::string ExtensionType::ToString() const {
@@ -41,7 +45,21 @@ std::string ExtensionType::ToString() const {
 
 std::string ExtensionType::name() const { return "extension"; }
 
+ExtensionArray::ExtensionArray(const std::shared_ptr& data) { SetData(data); }
+
+ExtensionArray::ExtensionArray(const std::shared_ptr& type,
+   const std::shared_ptr& storage) {
+  DCHECK_EQ(type->id(), Type::EXTENSION);
+  DCHECK(
+  storage->type()->Equals(*checked_cast(*type).storage_type()));
+  auto data = storage->data()->Copy();
+  // XXX This pointer is reverted below in SetData()...
+  data->type = type;
+  SetData(data);
+}
+
 void ExtensionArray::SetData(const std::shared_ptr& data) {
+  DCHECK_EQ(data->type->id(), Type::EXTENSION);
   this->Array::SetData(data);
 
   auto storage_data = data->Copy();
diff --git a/cpp/src/arrow/extension_type.h b/cpp/src/arrow/extension_type.h
index b3df2b3..6a1ca0b 100644
--- a/cpp/src/arrow/extension_type.h
+++ b/cpp/src/arrow/extension_type.h
@@ -84,7 +84,14 @@ class ARROW_EXPORT ExtensionType : public DataType {
 /// \brief Base array class for user-defined extension types
 class ARROW_EXPORT ExtensionArray : public Array {
  public:
-  explicit ExtensionArray(const std::shared_ptr& data) { SetData(data); }
+  /// \brief Construct an ExtensionArray from an ArrayData.
+  ///
+  /// The ArrayData must have the right ExtensionType.
+  explicit ExtensionArray(const std::shared_ptr& data);
+
+  /// \brief Construct an ExtensionArray from a type and the underlying storage.
+  ExtensionArray(const std::shared_ptr& type,
+ const std::shared_ptr& storage);
 
   /// \brief The physical storage for the extension array
   std::shared_ptr storage() const { return stor

[arrow] branch master updated: ARROW-4974: [Go] implement ArrayApproxEqual

2019-06-14 Thread sbinet
This is an automated email from the ASF dual-hosted git repository.

sbinet pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3cee652  ARROW-4974: [Go] implement ArrayApproxEqual
3cee652 is described below

commit 3cee652fa40718a4fb16e4ecf331daa0ca8c53d5
Author: Sebastien Binet 
AuthorDate: Fri Jun 14 14:35:42 2019 +0200

ARROW-4974: [Go] implement ArrayApproxEqual

Author: Sebastien Binet 

Closes #4556 from sbinet/issue-4974 and squashes the following commits:

f1baaf9dd  ARROW-4974:  implement ArrayApproxEqual
---
 go/arrow/array/compare.go  | 256 +
 go/arrow/array/compare_test.go | 253 
 2 files changed, 509 insertions(+)

diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go
index 60e21fb..9fa13a1 100644
--- a/go/arrow/array/compare.go
+++ b/go/arrow/array/compare.go
@@ -17,7 +17,10 @@
 package array
 
 import (
+   "math"
+
"github.com/apache/arrow/go/arrow"
+   "github.com/apache/arrow/go/arrow/float16"
"github.com/pkg/errors"
 )
 
@@ -124,6 +127,175 @@ func ArraySliceEqual(left Interface, lbeg, lend int64, right Interface, rbeg, re
return ArrayEqual(l, r)
 }
 
+const defaultAbsoluteTolerance = 1e-5
+
+type equalOption struct {
+   atol   float64 // absolute tolerance
+   nansEq bool// whether NaNs are considered equal.
+}
+
+func (eq equalOption) f16(f1, f2 float16.Num) bool {
+   v1 := float64(f1.Float32())
+   v2 := float64(f2.Float32())
+   switch {
+   case eq.nansEq:
+   return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2))
+   default:
+   return math.Abs(v1-v2) <= eq.atol
+   }
+}
+
+func (eq equalOption) f32(f1, f2 float32) bool {
+   v1 := float64(f1)
+   v2 := float64(f2)
+   switch {
+   case eq.nansEq:
+   return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2))
+   default:
+   return math.Abs(v1-v2) <= eq.atol
+   }
+}
+
+func (eq equalOption) f64(v1, v2 float64) bool {
+   switch {
+   case eq.nansEq:
+   return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2))
+   default:
+   return math.Abs(v1-v2) <= eq.atol
+   }
+}
+
+func newEqualOption(opts ...EqualOption) equalOption {
+   eq := equalOption{
+   atol:   defaultAbsoluteTolerance,
+   nansEq: false,
+   }
+   for _, opt := range opts {
+   opt(&eq)
+   }
+
+   return eq
+}
+
+// EqualOption is a functional option type used to configure how Records and Arrays are compared.
+type EqualOption func(*equalOption)
+
+// WithNaNsEqual configures the comparison functions so that NaNs are considered equal.
+func WithNaNsEqual(v bool) EqualOption {
+   return func(o *equalOption) {
+   o.nansEq = v
+   }
+}
+
+// WithAbsTolerance configures the comparison functions so that 2 floating point values
+// v1 and v2 are considered equal if |v1-v2| <= atol.
+func WithAbsTolerance(atol float64) EqualOption {
+   return func(o *equalOption) {
+   o.atol = atol
+   }
+}
+
+// ArrayApproxEqual reports whether the two provided arrays are approximately equal.
+// For non-floating point arrays, it is equivalent to ArrayEqual.
+func ArrayApproxEqual(left, right Interface, opts ...EqualOption) bool {
+   opt := newEqualOption(opts...)
+   return arrayApproxEqual(left, right, opt)
+}
+
+func arrayApproxEqual(left, right Interface, opt equalOption) bool {
+   switch {
+   case !baseArrayEqual(left, right):
+   return false
+   case left.Len() == 0:
+   return true
+   case left.NullN() == left.Len():
+   return true
+   }
+
+   // at this point, we know both arrays have same type, same length, same number of nulls
+   // and nulls at the same place.
+   // compare the values.
+
+   switch l := left.(type) {
+   case *Null:
+   return true
+   case *Boolean:
+   r := right.(*Boolean)
+   return arrayEqualBoolean(l, r)
+   case *FixedSizeBinary:
+   r := right.(*FixedSizeBinary)
+   return arrayEqualFixedSizeBinary(l, r)
+   case *Binary:
+   r := right.(*Binary)
+   return arrayEqualBinary(l, r)
+   case *String:
+   r := right.(*String)
+   return arrayEqualString(l, r)
+   case *Int8:
+   r := right.(*Int8)
+   return arrayEqualInt8(l, r)
+   case *Int16:
+   r := right.(*Int16)
+   return arrayEqualInt16(l, r)
+   case *Int32:
+   r := right.(*Int32)
+   return arrayEqualInt32(l, r)
+   case 
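The comparison rule in the (truncated) diff above, an absolute tolerance plus an optional "NaNs are equal" switch, can be sketched independently of the Go API. The Python below is an illustration under stated assumptions: the function names are invented, plain lists stand in for Arrow arrays, and null handling is left out entirely. Only the default tolerance of 1e-5 and the comparison formula are taken from the diff.

```python
import math

DEFAULT_ABS_TOLERANCE = 1e-5  # same default value as in the Go diff


def floats_approx_equal(v1, v2, atol=DEFAULT_ABS_TOLERANCE, nans_equal=False):
    """Return True when |v1 - v2| <= atol, optionally treating NaN == NaN."""
    if nans_equal and math.isnan(v1) and math.isnan(v2):
        return True
    # Any comparison involving NaN is False, so NaNs differ by default.
    return abs(v1 - v2) <= atol


def values_approx_equal(left, right, **opts):
    """Element-wise approximate equality of two equal-length float sequences."""
    return len(left) == len(right) and all(
        floats_approx_equal(a, b, **opts) for a, b in zip(left, right)
    )
```

Keyword arguments play the role that the functional options WithAbsTolerance and WithNaNsEqual play in the Go code: callers override only the settings they care about and get the defaults otherwise.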

[arrow] branch master updated: ARROW-5565: [Python][Docs] Add instructions how to use gdb to debug C++ libraries when running Python unit tests

2019-06-14 Thread uwe
This is an automated email from the ASF dual-hosted git repository.

uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new c938464  ARROW-5565: [Python][Docs] Add instructions how to use gdb to 
debug C++ libraries when running Python unit tests
c938464 is described below

commit c9384641e44707c41f78703a8be738e77a072896
Author: Wes McKinney 
AuthorDate: Fri Jun 14 13:43:11 2019 +0200

ARROW-5565: [Python][Docs] Add instructions how to use gdb to debug C++ 
libraries when running Python unit tests

Author: Wes McKinney 

Closes #4560 from wesm/ARROW-5565 and squashes the following commits:

325b3670  Add docs section about how to use gdb to debug from 
Python
---
 docs/source/developers/python.rst | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst
index 69bd59d..0242714 100644
--- a/docs/source/developers/python.rst
+++ b/docs/source/developers/python.rst
@@ -341,6 +341,32 @@ environment variable when building pyarrow:
 
export PYARROW_WITH_CUDA=1
 
+Debugging
+-
+
+Since pyarrow depends on the Arrow C++ libraries, debugging can
+frequently involve crossing between Python and C++ shared libraries.
+
+Using gdb on Linux
+~~
+
+To debug the C++ libraries with gdb while running the Python unit
+   test, first start pytest with gdb:
+
+.. code-block:: shell
+
+   gdb --args python -m pytest pyarrow/tests/test_to_run.py -k $TEST_TO_MATCH
+
+To set a breakpoint, use the same gdb syntax that you would when
+debugging a C++ unitttest, for example:
+
+.. code-block:: shell
+
+   (gdb) b src/arrow/python/arrow_to_pandas.cc:1874
+   No source file named src/arrow/python/arrow_to_pandas.cc.
+   Make breakpoint pending on future shared library load? (y or [n]) y
+   Breakpoint 1 (src/arrow/python/arrow_to_pandas.cc:1874) pending.
+
 Building on Windows
 ===
 



[arrow] branch master updated (d20963d -> 6743dc0)

2019-06-14 Thread ravindra
This is an automated email from the ASF dual-hosted git repository.

ravindra pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d20963d  ARROW-1278: [Integration] Adding integration tests for 
fixed_size_list
 add 6743dc0  ARROW-5602: [Java][Gandiva] Add tests for round/cast

No new revisions were added by this update.

Summary of changes:
 cpp/src/gandiva/decimal_ir.cc  |  14 +
 cpp/src/gandiva/precompiled/decimal_wrapper.cc |  20 +-
 cpp/src/gandiva/tests/decimal_test.cc  |   2 +-
 .../gandiva/evaluator/ProjectorDecimalTest.java| 321 -
 .../arrow/memory/AllocationOutcomeDetails.java |   2 +-
 5 files changed, 350 insertions(+), 9 deletions(-)