[jira] [Updated] (ARROW-2087) [Python] Binaries of 3rdparty are not stripped in manylinux1 base image

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2087:
--
Component/s: Python
 Packaging

> [Python] Binaries of 3rdparty are not stripped in manylinux1 base image
> ---
>
> Key: ARROW-2087
> URL: https://issues.apache.org/jira/browse/ARROW-2087
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> CMake pip package: 
> [https://github.com/scikit-build/cmake-python-distributions/issues/32]
> Pandas pip package: [https://github.com/pandas-dev/pandas/issues/19531]
> NumPy pip package: https://github.com/numpy/numpy/issues/10519
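The stripping pass the issue asks for could be sketched as below: walk the image's third-party install prefix, detect ELF binaries by their magic bytes, and invoke `strip` on each. This is a minimal illustrative sketch, not the actual manylinux1 Dockerfile change; the helper names `find_elf_files` and `strip_binaries` are hypothetical.

```python
import os
import subprocess

ELF_MAGIC = b"\x7fELF"

def find_elf_files(root):
    """Collect paths under `root` whose first bytes are the ELF magic."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if f.read(4) == ELF_MAGIC:
                        candidates.append(path)
            except OSError:
                pass  # unreadable entry; skip it
    return candidates

def strip_binaries(root):
    """Run `strip --strip-unneeded` over every ELF file under `root`."""
    for path in find_elf_files(root):
        subprocess.run(["strip", "--strip-unneeded", path], check=False)
```

In the base image this would run once after the third-party toolchain is built, so the size saving is baked into the published image.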



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2114) [Python] Pull latest docker manylinux1 image

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2114:
--
Component/s: Python
 Packaging

> [Python] Pull latest docker manylinux1 image
> 
>
> Key: ARROW-2114
> URL: https://issues.apache.org/jira/browse/ARROW-2114
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Updated] (ARROW-2146) [GLib] Implement Slice for ChunkedArray

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2146:
--
Component/s: GLib

> [GLib] Implement Slice for ChunkedArray
> ---
>
> Key: ARROW-2146
> URL: https://issues.apache.org/jira/browse/ARROW-2146
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add a {{Slice}} API to ChunkedArray.
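The slicing logic behind such an API can be sketched in plain Python (a hypothetical `slice_chunked` helper, not the GLib binding itself): skip whole chunks before the offset, then take pieces until the requested length is covered, preserving chunk boundaries.

```python
def slice_chunked(chunks, offset, length):
    """Slice a list-of-lists "chunked array" to cover
    [offset, offset + length), preserving chunk boundaries."""
    result = []
    remaining = length
    for chunk in chunks:
        if remaining <= 0:
            break
        if offset >= len(chunk):
            offset -= len(chunk)  # the slice starts past this chunk
            continue
        piece = chunk[offset:offset + remaining]
        result.append(piece)
        remaining -= len(piece)
        offset = 0  # subsequent chunks are taken from their start
    return result
```

For example, slicing `[[1, 2, 3], [4, 5], [6, 7, 8]]` at offset 2 with length 4 yields `[[3], [4, 5], [6]]`: no data is copied conceptually, only per-chunk offsets and lengths change.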





[jira] [Updated] (ARROW-2143) [Python] Provide a manylinux1 wheel for cp27m

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2143:
--
Component/s: Python

> [Python] Provide a manylinux1 wheel for cp27m
> -
>
> Key: ARROW-2143
> URL: https://issues.apache.org/jira/browse/ARROW-2143
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only provide wheels for cp27mu; we should also build them for cp27m





[jira] [Updated] (ARROW-2163) Install apt dependencies separate from built-in Travis commands, retry on flakiness

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2163:
--
Component/s: Continuous Integration

> Install apt dependencies separate from built-in Travis commands, retry on 
> flakiness
> ---
>
> Key: ARROW-2163
> URL: https://issues.apache.org/jira/browse/ARROW-2163
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> This would also allow us to run the detect-changes script before installing 
> apt dependencies, so unnecessary builds will terminate faster
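Retrying a flaky `apt` install outside Travis's built-in commands could look like the following sketch (the `run_with_retries` helper is hypothetical, not the actual build-script change):

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, delay=1.0):
    """Run `cmd`, retrying on a non-zero exit code with a growing delay.
    Returns True on success, False if every attempt failed."""
    for i in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return True
        if i < attempts - 1:
            time.sleep(delay * (i + 1))  # simple linear backoff between tries
    return False
```

A build script would call it as `run_with_retries(["sudo", "apt-get", "install", "-y", "..."])` and fail the build only after the last attempt.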





[jira] [Updated] (ARROW-2168) [C++] Build toolchain builds with jemalloc

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2168:
--
Component/s: C++

> [C++] Build toolchain builds with jemalloc
> --
>
> Key: ARROW-2168
> URL: https://issues.apache.org/jira/browse/ARROW-2168
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We have fixed all known problems in the jemalloc 4.x branch and should be 
> able to gradually reactivate it in our builds to get its performance boost.





[jira] [Updated] (ARROW-2190) [GLib] Add add/remove field functions for RecordBatch.

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2190:
--
Component/s: GLib

> [GLib] Add add/remove field functions for RecordBatch.
> --
>
> Key: ARROW-2190
> URL: https://issues.apache.org/jira/browse/ARROW-2190
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add AddColumn and RemoveColumn APIs to RecordBatch.





[jira] [Updated] (ARROW-2179) [C++] arrow/util/io-util.h missing from libarrow-dev

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2179:
--
Component/s: C++

> [C++] arrow/util/io-util.h missing from libarrow-dev
> 
>
> Key: ARROW-2179
> URL: https://issues.apache.org/jira/browse/ARROW-2179
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Rares Vernica
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> {{arrow/util/io-util.h}} is missing from the {{libarrow-dev}} package 
> (ubuntu/trusty): 
> {code:java}
> > ls -1 /usr/include/arrow/util/
> bit-stream-utils.h
> bit-util.h
> bpacking.h
> compiler-util.h
> compression.h
> compression_brotli.h
> compression_lz4.h
> compression_snappy.h
> compression_zlib.h
> compression_zstd.h
> cpu-info.h
> decimal.h
> hash-util.h
> hash.h
> key_value_metadata.h
> logging.h
> macros.h
> parallel.h
> rle-encoding.h
> sse-util.h
> stl.h
> type_traits.h
> variant
> variant.h
> visibility.h
> {code}
> {code:java}
> > apt-cache show libarrow-dev
> Package: libarrow-dev
> Architecture: amd64
> Version: 0.8.0-2
> Multi-Arch: same
> Priority: optional
> Section: libdevel
> Source: apache-arrow
> Maintainer: Kouhei Sutou 
> Installed-Size: 5696
> Depends: libarrow0 (= 0.8.0-2)
> Filename: pool/trusty/universe/a/apache-arrow/libarrow-dev_0.8.0-2_amd64.deb
> Size: 602716
> MD5sum: de5f2bfafd90ff29e4b192f4e5d26605
> SHA1: e3d9146b30f07c07b62f8bdf9f779d0ee5d05a75
> SHA256: 30a89b2ac6845998f22434e660b1a7c9d91dc8b2ba947e1f4333b3cf74c69982
> SHA512: 
> 99f511bee6645a68708848a58b4eba669a2ec8c45fb411c56ed2c920d3ff34552c77821eff7e428c886d16e450bdd25cc4e67597972f77a4255f302a56d1eac8
> Homepage: https://arrow.apache.org/
> Description: Apache Arrow is a data processing library for analysis
>  .
>  This package provides header files.
> Description-md5: e4855d5dbadacb872bf8c4ca67f624e3
> {code}
>  





[jira] [Updated] (ARROW-2185) Remove CI directives from squashed commit messages

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2185:
--
Component/s: Continuous Integration

> Remove CI directives from squashed commit messages
> --
>
> Key: ARROW-2185
> URL: https://issues.apache.org/jira/browse/ARROW-2185
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> In our PR squash tool, we are potentially picking up CI directives like 
> {{[skip appveyor]}} from intermediate commits. We should strip these with a 
> regex and instead use directives in the PR title if we want the commit to 
> master to behave in a certain way.
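The regex scrub described above can be sketched as follows; the directive pattern is illustrative (it covers the common `[skip X]` and `[X skip]` forms, not every service's syntax), and `scrub_ci_directives` is a hypothetical helper name:

```python
import re

# Directives such as "[skip appveyor]" or "[ci skip]" that CI services
# honor anywhere in a commit message (the pattern list is illustrative).
CI_DIRECTIVE = re.compile(r"\[skip\s+[^\]]+\]|\[[^\]]+\s+skip\]", re.IGNORECASE)

def scrub_ci_directives(message):
    """Remove CI directives from a squashed commit message, then collapse
    the double spaces the removal leaves behind."""
    cleaned = CI_DIRECTIVE.sub("", message)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Running this over each intermediate commit message before squashing keeps stray directives out of the final commit to master.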





[jira] [Updated] (ARROW-2183) [C++] Add helper CMake function for globbing the right header files

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2183:
--
Component/s: C++

> [C++] Add helper CMake function for globbing the right header files 
> 
>
> Key: ARROW-2183
> URL: https://issues.apache.org/jira/browse/ARROW-2183
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Brought up by discussion in https://github.com/apache/arrow/pull/1631 on 
> ARROW-2179. We should collect header files but not install those matching 
> patterns for non-public headers, such as {{-internal}}.





[jira] [Updated] (ARROW-2203) [C++] StderrStream class

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2203:
--
Component/s: C++

> [C++] StderrStream class
> 
>
> Key: ARROW-2203
> URL: https://issues.apache.org/jira/browse/ARROW-2203
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Rares Vernica
>Assignee: Rares Vernica
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The C++ API has support for reading and writing data from and to STDIN and 
> STDOUT. The classes are arrow::io::StdinStream and arrow::io::StdoutStream. 
> In some scenarios it might be useful to write data to STDERR. Adding a 
> StderrStream class should be a trivial addition given the StdoutStream class.
> If you think a StderrStream class is a good idea, I am more than happy to 
> submit a PR.





[jira] [Updated] (ARROW-2216) [CI] CI descriptions and envars are misleading

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2216:
--
Component/s: Continuous Integration

> [CI] CI descriptions and envars are misleading
> --
>
> Key: ARROW-2216
> URL: https://issues.apache.org/jira/browse/ARROW-2216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Phillip Cloud
>Assignee: Antoine Pitrou
>Priority: Minor
> Fix For: 0.12.0
>
>
> The descriptions of each of the CI builds are hard to decipher without 
> looking at the build scripts, which are themselves quite complex.
> For example in this job: https://travis-ci.org/apache/arrow/jobs/346309532 
> you can see that the envars {{CC}} and {{CXX}} are set to {{"clang-5.0"}} and 
> {{"clang++-5.0"}} respectively and they are then immediately set to {{gcc}} 
> and {{g++}}.
> Without intimate knowledge of the script it's very hard to diagnose CI issues.





[jira] [Updated] (ARROW-2212) [C++/Python] Build Protobuf in base manylinux 1 docker image

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2212:
--
Component/s: Python
 Packaging

> [C++/Python] Build Protobuf in base manylinux 1 docker image
> 
>
> Key: ARROW-2212
> URL: https://issues.apache.org/jira/browse/ARROW-2212
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> This should cut down the build times of the {{manylinux1}} CI matrix entry.





[jira] [Updated] (ARROW-2268) Remove MD5 checksums from release process

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2268:
--
Component/s: Developer Tools

> Remove MD5 checksums from release process
> -
>
> Key: ARROW-2268
> URL: https://issues.apache.org/jira/browse/ARROW-2268
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The ASF has changed its release policy for signatures and checksums to 
> contraindicate the use of MD5 checksums: 
> http://www.apache.org/dev/release-distribution#sigs-and-sums. We should 
> remove this from our various release scripts prior to the 0.9.0 release.
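A release script would replace the MD5 step with a stronger digest along these lines (a hedged sketch; `write_checksum` is a hypothetical helper, and the output format mimics the coreutils `shasum` two-space convention):

```python
import hashlib

def write_checksum(artifact_path, algorithm="sha512"):
    """Write a coreutils-style checksum file (e.g. foo.tar.gz.sha512)
    next to the artifact, similar to `shasum -a 512` output."""
    h = hashlib.new(algorithm)
    with open(artifact_path, "rb") as f:
        # Stream in blocks so large release tarballs are not read into memory.
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    out_path = "%s.%s" % (artifact_path, algorithm)
    with open(out_path, "w") as f:
        f.write("%s  %s\n" % (h.hexdigest(), artifact_path))
    return out_path
```

Calling `write_checksum(path)` for each artifact yields the `.sha512` sidecar files the ASF policy prefers over `.md5`.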





[jira] [Updated] (ARROW-2650) [JS] Finish implementing Unions

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2650:
--
Component/s: JavaScript

> [JS] Finish implementing Unions
> ---
>
> Key: ARROW-2650
> URL: https://issues.apache.org/jira/browse/ARROW-2650
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Finish implementing Unions in JS and add to integration tests





[jira] [Updated] (ARROW-2329) [Website]: 0.9.0 release update

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2329:
--
Component/s: Website

> [Website]: 0.9.0 release update
> ---
>
> Key: ARROW-2329
> URL: https://issues.apache.org/jira/browse/ARROW-2329
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Updated] (ARROW-2476) [Python/Question] Maximum length of an Array created from ndarray

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2476:
--
Component/s: Python

> [Python/Question] Maximum length of an Array created from ndarray
> -
>
> Key: ARROW-2476
> URL: https://issues.apache.org/jira/browse/ARROW-2476
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Minor
> Fix For: 0.12.0
>
>
> The format 
> [describes|https://github.com/apache/arrow/blob/master/format/Layout.md#array-lengths]
>  that an array's maximum length is 2^31 - 1; however, the following Python 
> snippet creates a 2**32-length Arrow array:
> {code:python}
> a = np.ones((2**32,), dtype='int8')
> A = pa.Array.from_pandas(a)
> type(A)
> {code}
> {code}pyarrow.lib.Int8Array{code}
> Based on the layout specification I'd expect a ChunkedArray of three 
> Int8Arrays with lengths [2^31 - 1, 2^31 - 1, 2], or should this raise an 
> exception?
> If this is the expected behavior, is there any documentation for it?
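The chunking the reporter expects can be illustrated with plain arithmetic, without allocating 4 GiB (the `chunk_lengths` helper is hypothetical, used only to show the split the layout spec would imply):

```python
def chunk_lengths(total, max_len):
    """Split `total` elements into per-chunk lengths of at most `max_len`,
    mirroring the layout spec's 2**31 - 1 per-array limit."""
    lengths = []
    remaining = total
    while remaining > 0:
        n = min(remaining, max_len)
        lengths.append(n)
        remaining -= n
    return lengths
```

With `total = 2**32` and `max_len = 2**31 - 1` this produces the three lengths `[2**31 - 1, 2**31 - 1, 2]` named in the question.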





[jira] [Updated] (ARROW-2803) [C++] Put hashing function into src/arrow/util

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2803:
--
Component/s: C++

> [C++] Put hashing function into src/arrow/util
> --
>
> Key: ARROW-2803
> URL: https://issues.apache.org/jira/browse/ARROW-2803
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Philipp Moritz
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: easytask
> Fix For: 0.12.0
>
>
> See [https://github.com/apache/arrow/pull/2220]
> We should decide what our default go-to hash function should be (maybe 
> murmur3?) and put it into src/arrow/util
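To make the shape of such a utility concrete, here is a stand-in sketch using 64-bit FNV-1a, chosen only because it fits in a few lines; the issue itself suggests murmur3 as the go-to candidate, and the final C++ utility would live in src/arrow/util:

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: a small illustrative default hash over raw bytes
    (the issue's actual candidate is murmur3, which needs more code)."""
    h = 0xcbf29ce484222325          # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte                   # mix in one byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2**64
    return h
```

Whatever function is chosen, centralizing it in one util header keeps hash-table code across the codebase agreeing on a single default.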





[jira] [Updated] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3199:
--
Component/s: C++ - Plasma

> [Plasma] Check for EAGAIN in recvmsg and sendmsg
> 
>
> Key: ARROW-3199
> URL: https://issues.apache.org/jira/browse/ARROW-3199
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> It turns out that 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63]
>  and probably also 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49]
>  can block and give an EAGAIN error.
> This was discovered during stress tests by https://github.com/stephanie-wang/
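The fix pattern for those call sites can be sketched in Python terms: retry the I/O call while it fails with EAGAIN/EWOULDBLOCK, and propagate any other error. This is an illustrative sketch of the retry loop, not the actual fling.cc patch; `retry_on_eagain` is a hypothetical helper.

```python
import errno
import time

def retry_on_eagain(io_call, max_tries=100, delay=0.001):
    """Call `io_call()` and retry while it fails with EAGAIN/EWOULDBLOCK,
    the condition the sendmsg/recvmsg wrappers need to handle."""
    for _ in range(max_tries):
        try:
            return io_call()
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise  # a real error, not a full/empty socket buffer
            time.sleep(delay)  # back off briefly before retrying
    raise OSError(errno.EAGAIN, "gave up after repeated EAGAIN")
```

Under stress, a full socket buffer then costs a short delay instead of a spurious failure.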





[jira] [Updated] (ARROW-3070) [Release] Host binary artifacts for RCs and releases on ASF Bintray account instead of dist/mirror system

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3070:
--
Component/s: Developer Tools

> [Release] Host binary artifacts for RCs and releases on ASF Bintray account 
> instead of dist/mirror system
> -
>
> Key: ARROW-3070
> URL: https://issues.apache.org/jira/browse/ARROW-3070
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Since the artifacts are large this is a better place for them. 





[jira] [Updated] (ARROW-2953) [Plasma] Store memory usage

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2953:
--
Component/s: C++ - Plasma

> [Plasma] Store memory usage
> ---
>
> Key: ARROW-2953
> URL: https://issues.apache.org/jira/browse/ARROW-2953
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While doing some memory profiling on the store, it became clear that at the 
> moment the metadata of the objects takes up much more space than it should. 
> In particular, for each object:
>  * The object id (20 bytes) is stored three times
>  * The object checksum (8 bytes) is stored twice
> We can therefore significantly reduce the metadata overhead with some 
> refactoring.
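The refactoring direction can be sketched as a single metadata table keyed by object id, so the id and checksum are each stored once (a hypothetical `ObjectTable`, not the actual Plasma store structures). Per the figures above, deduplicating two extra id copies (2 × 20 bytes) and one extra checksum copy (8 bytes) saves roughly 48 bytes per object.

```python
class ObjectTable:
    """Store per-object metadata exactly once: the 20-byte id is the
    dictionary key, and the checksum lives in a single entry."""
    def __init__(self):
        self._entries = {}  # object_id (bytes) -> {"checksum": int}

    def add(self, object_id, checksum):
        self._entries[object_id] = {"checksum": checksum}

    def checksum(self, object_id):
        return self._entries[object_id]["checksum"]
```

Any subsystem needing the id or checksum then holds a reference into this table instead of its own copy.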





[jira] [Updated] (ARROW-3467) Building against external double conversion is broken

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3467:
--
Component/s: C++

> Building against external double conversion is broken
> -
>
> Key: ARROW-3467
> URL: https://issues.apache.org/jira/browse/ARROW-3467
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.0
>Reporter: Dmitry Kalinkin
>Assignee: Dmitry Kalinkin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> double-conversion 3.1.1 defines the double-conversion::double-conversion 
> target instead of double-conversion [1], so the build fails with:
> {noformat}
> CMake Error at cmake_modules/BuildUtils.cmake:98 (message):
>   No static or shared library provided for double-conversion
> Call Stack (most recent call first):
>   cmake_modules/ThirdpartyToolchain.cmake:476 (ADD_THIRDPARTY_LIB)
>   CMakeLists.txt:386 (include)
> {noformat}
> [1] 
> https://github.com/google/double-conversion/commit/e13e72e17692f5dc0036460d734c637b563f3ac7#diff-af3b638bc2a3e6c650974192a53c7291R57





[jira] [Updated] (ARROW-3551) Change MapD to OmniSci on Powered By page

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3551:
--
Component/s: Website

> Change MapD to OmniSci on Powered By page
> -
>
> Key: ARROW-3551
> URL: https://issues.apache.org/jira/browse/ARROW-3551
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Todd Mostak
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> MapD recently changed its name to OmniSci. We should update the Powered By 
> page to reflect this.





[jira] [Updated] (ARROW-3504) [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization.

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3504:
--
Component/s: C++ - Plasma

> [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow 
> serialization.
> --
>
> Key: ARROW-3504
> URL: https://issues.apache.org/jira/browse/ARROW-3504
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Plasma
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This feature enables the Java client to read data that the Python client puts.





[jira] [Updated] (ARROW-3527) [R] Unused variables in R-package C++ code

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3527:
--
Component/s: R

> [R] Unused variables in R-package C++ code
> --
>
> Key: ARROW-3527
> URL: https://issues.apache.org/jira/browse/ARROW-3527
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: James Lamb
>Assignee: James Lamb
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Tonight I noticed a few "unused variable" compiler warnings while building 
> the arrow R package.
>  
> {code:java}
> DataType.cpp:118:7: warning: unused variable 'n' [-Wunused-variable]
> int n = x.size();
> RecordBatch.cpp:132:7: warning: unused variable 'nc' [-Wunused-variable]
> int nc = tbl.size();
> {code}
> Creating this issue to accompany the PR I'll submit to propose removing these 
> calls.





[jira] [Updated] (ARROW-3489) [Gandiva] Support for in expressions

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3489:
--
Component/s: C++ - Gandiva

> [Gandiva] Support for in expressions
> 
>
> Key: ARROW-3489
> URL: https://issues.apache.org/jira/browse/ARROW-3489
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: gandiva, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Add support for in-expressions to Gandiva.





[jira] [Updated] (ARROW-3528) [R] Typo in R documentation

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3528:
--
Component/s: R

> [R] Typo in R documentation
> ---
>
> Key: ARROW-3528
> URL: https://issues.apache.org/jira/browse/ARROW-3528
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: James Lamb
>Assignee: James Lamb
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a typo in the R-package documentation.
>  
> *"ordred" --> "ordered"*
>  
> Just creating the story here to accompany a pending PR.





[jira] [Updated] (ARROW-3515) Introduce NumericTensor class

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3515:
--
Component/s: C++

> Introduce NumericTensor class
> -
>
> Key: ARROW-3515
> URL: https://issues.apache.org/jira/browse/ARROW-3515
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/arrow/pull/2759]
> This commit defines the new NumericTensor class as a subclass of the Tensor 
> class. NumericTensor extends Tensor by adding a member function to access 
> element values in a tensor.
> I want to use this new feature for writing tests of SparseTensor in 
> [#2546|https://github.com/apache/arrow/pull/2546].





[jira] [Updated] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3746:
--
Component/s: Python
 C++ - Gandiva

> [Gandiva] [Python] Make it possible to list all functions registered with 
> Gandiva
> -
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva, Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very 
> easy to get a list of all the functions that are registered).





[jira] [Assigned] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2835:
-

Assignee: Antoine Pitrou

> [C++] ReadAt/WriteAt are inconsistent with moving the files position
> 
>
> Key: ARROW-2835
> URL: https://issues.apache.org/jira/browse/ARROW-2835
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Dimitri Vorona
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Right now, there is inconsistent behaviour regarding moving the file's 
> position pointer after calling ReadAt or WriteAt. For example, the default 
> implementation of ReadAt seeks to the desired offset and calls Read, which 
> moves the position pointer. MemoryMappedFile::ReadAt, however, doesn't change 
> the position. WriteableFile::WriteAt seems to move the position in the 
> current implementation, but there is no docstring which prescribes this 
> behaviour.
> Antoine suggested that *At methods shouldn't touch the position, which makes 
> more sense, IMHO. The change isn't huge and doesn't seem to break anything 
> internally, but it might break existing user code.
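The suggested semantics, positional I/O that leaves the file position untouched, are exactly what POSIX `pread` provides, which Python exposes as `os.pread`. A minimal sketch (the `read_at` wrapper name is hypothetical):

```python
import os

def read_at(fd, nbytes, offset):
    """ReadAt semantics the issue converged on: read `nbytes` starting at
    `offset` without moving the file position (os.pread, POSIX only)."""
    return os.pread(fd, nbytes, offset)
```

After a `read_at` call, `os.lseek(fd, 0, os.SEEK_CUR)` still reports the old position, unlike a seek-then-read implementation.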





[jira] [Resolved] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2835.
---
Resolution: Fixed

Issue resolved by pull request 4417
[https://github.com/apache/arrow/pull/4417]

> [C++] ReadAt/WriteAt are inconsistent with moving the files position
> 
>
> Key: ARROW-2835
> URL: https://issues.apache.org/jira/browse/ARROW-2835
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Dimitri Vorona
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Right now, there is inconsistent behaviour regarding moving the file's 
> position pointer after calling ReadAt or WriteAt. For example, the default 
> implementation of ReadAt seeks to the desired offset and calls Read, which 
> moves the position pointer. MemoryMappedFile::ReadAt, however, doesn't change 
> the position. WriteableFile::WriteAt seems to move the position in the 
> current implementation, but there is no docstring which prescribes this 
> behaviour.
> Antoine suggested that *At methods shouldn't touch the position, which makes 
> more sense, IMHO. The change isn't huge and doesn't seem to break anything 
> internally, but it might break existing user code.





[jira] [Updated] (ARROW-3662) [C++] Add a const overload to MemoryMappedFile::GetSize

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3662:
--
Component/s: C++

> [C++] Add a const overload to MemoryMappedFile::GetSize
> ---
>
> Key: ARROW-3662
> URL: https://issues.apache.org/jira/browse/ARROW-3662
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Dimitri Vorona
>Assignee: Dimitri Vorona
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
> While GetSize in general is not a const function, it can be on a 
> MemoryMappedFile. I propose to add a const overload directly to 
> MemoryMappedFile.
> Alternatively, we could add a const version at the RandomAccessFile level 
> which would fail if getting the size without mutation (e.g. without a seek) 
> isn't possible, but that seems to me to be a potential source of 
> hard-to-debug bugs and spurious failures. It would at least require a 
> careful analysis of the platform support of the different size-getting 
> options.





[jira] [Updated] (ARROW-3576) [Python] Expose compressed file readers as NativeFile

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3576:
--
Component/s: Python

> [Python] Expose compressed file readers as NativeFile
> -
>
> Key: ARROW-3576
> URL: https://issues.apache.org/jira/browse/ARROW-3576
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3664) [Rust] Add benchmark for PrimitiveArrayBuilder

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3664:
--
Component/s: Rust

> [Rust] Add benchmark for PrimitiveArrayBuilder
> --
>
> Key: ARROW-3664
> URL: https://issues.apache.org/jira/browse/ARROW-3664
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should add a benchmark for the {{PrimitiveArrayBuilder}} to measure and 
> track its performance.





[jira] [Updated] (ARROW-3569) [Packaging] Run pyarrow unittests when building conda package

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3569:
--
Component/s: Packaging

> [Packaging] Run pyarrow unittests when building conda package
> -
>
> Key: ARROW-3569
> URL: https://issues.apache.org/jira/browse/ARROW-3569
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3667:
--
Component/s: JavaScript

> [JS] Incorrectly reads record batches with an all null column
> -
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> The JS library seems to incorrectly read any columns that come after an 
> all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers, one with an all-null 
> column followed by a utf-8 column, and a second with those two reversed
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None],
>                             'not nulls': ['abc', 'def', 'ghi']},
>                       columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not 
> > nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers 
> associated with the all-null column, but the JS IPC reader is still looking 
> for them, causing the buffer count to get out of sync.
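The garbled value above is consistent with the reader decoding the int32 offsets buffer of the utf-8 column as if it were character data. A small sketch, assuming little-endian int32 offsets 0, 3, 6, 9 for the values 'abc', 'def', 'ghi':

```python
import struct

# Offsets buffer of a utf-8 column holding ['abc', 'def', 'ghi']:
# four little-endian int32 values 0, 3, 6, 9.
offsets = struct.pack('<4i', 0, 3, 6, 9)

# If the reader's buffer accounting slips (e.g. because buffers of the
# preceding all-null column were omitted by the writer), it can end up
# decoding these offset bytes as the string data itself.
garbled = offsets.decode('utf-8')
print(repr(garbled))  # '\x00\x00\x00\x00\x03\x00\x00\x00\x06\x00\x00\x00\t\x00\x00\x00'
```

The decoded bytes reproduce exactly the NUL-ridden string shown in the Node session, supporting the buffer-count-out-of-sync explanation.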





[jira] [Updated] (ARROW-3555) [Plasma] Unify plasma client get function using metadata.

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3555:
--
Component/s: C++ - Plasma

> [Plasma] Unify plasma client get function using metadata.
> -
>
> Key: ARROW-3555
> URL: https://issues.apache.org/jira/browse/ARROW-3555
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Plasma
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Sometimes it is very hard for the data consumer to know whether an object is 
> a buffer or some other kind of object. If we use try-catch to catch the 
> pyarrow deserialization exception and then fall back to 
> `plasma_client.get_buffer`, the code is not clean.
> We could leverage the metadata, which is currently not used at all, to mark 
> buffer data. In the clients for other languages, this would be simple to 
> implement.





[jira] [Updated] (ARROW-3586) [Python] Segmentation fault when converting empty table to pandas with categoricals

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3586:
--
Component/s: Python

> [Python] Segmentation fault when converting empty table to pandas with 
> categoricals
> ---
>
> Key: ARROW-3586
> URL: https://issues.apache.org/jira/browse/ARROW-3586
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0, 0.11.0
> Environment: - Ubuntu 16.04, Python 2.7.12, pyarrow 0.11.0, pandas 
> 0.23.4
> - Debian9, Python 2.7.13, pyarrow 0.10.0, pandas 0.23.4
>Reporter: Andreas
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> import pyarrow as pa
> table = pa.Table.from_arrays(arrays=[pa.array([], type=pa.int32())], 
> names=['col'])
> table.to_pandas(categories=['col']){code}
> This produces a segmentation fault for certain types (e.g. int\{32,64}) while 
> it works for others (e.g. string, binary).





[jira] [Updated] (ARROW-3566) Clarify that the type of dictionary encoded field should be the encoded(index) type

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3566:
--
Component/s: Format

> Clarify that the type of dictionary encoded field should be the 
> encoded(index) type
> ---
>
> Key: ARROW-3566
> URL: https://issues.apache.org/jira/browse/ARROW-3566
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Li Jin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3721) [Gandiva] [Python] Support all Gandiva literals

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3721:
--
Component/s: C++ - Gandiva

> [Gandiva] [Python] Support all Gandiva literals
> ---
>
> Key: ARROW-3721
> URL: https://issues.apache.org/jira/browse/ARROW-3721
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Support all the literals from 
> [https://github.com/apache/arrow/blob/5b116ab175292fe70ed3c8727bcc6868b9695f4a/cpp/src/gandiva/tree_expr_builder.h#L35]
>  in the Cython bindings.





[jira] [Updated] (ARROW-3797) [Rust] BinaryArray::value_offset incorrect in offset case

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3797:
--
Component/s: Rust

> [Rust] BinaryArray::value_offset incorrect in offset case
> -
>
> Key: ARROW-3797
> URL: https://issues.apache.org/jira/browse/ARROW-3797
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Brent Kerby
>Assignee: Brent Kerby
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>   Original Estimate: 5m
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The method BinaryArray::value_offset does not take into account the offset in 
> the underlying ArrayData; hence it gives incorrect results when the ArrayData 
> offset is not zero.
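A hypothetical Python model of the bug (the class and method names are illustrative, not the actual Rust API): the lookup into the offsets buffer must be shifted by the array-level offset introduced by slicing.

```python
class BinaryArrayModel:
    """Toy model of an Arrow variable-length binary array."""

    def __init__(self, data: bytes, offsets: list, offset: int = 0):
        self.data = data        # flat value buffer
        self.offsets = offsets  # int32 offsets buffer
        self.offset = offset    # array-level offset, e.g. from slicing

    def value_offset_buggy(self, i: int) -> int:
        # Ignores the array-level offset: wrong for sliced arrays.
        return self.offsets[i]

    def value_offset_fixed(self, i: int) -> int:
        # Shifts the lookup by the array-level offset.
        return self.offsets[self.offset + i]

# Values ['ab', 'cde', 'f']; model a slice starting at element 1.
arr = BinaryArrayModel(b'abcdef', [0, 2, 5, 6], offset=1)
print(arr.value_offset_buggy(0))  # 0 -- start of 'ab', not of the slice
print(arr.value_offset_fixed(0))  # 2 -- start of 'cde', as expected
```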





[jira] [Updated] (ARROW-3859) [Java] Fix ComplexWriter backward incompatible change

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3859:
--
Component/s: Java

> [Java] Fix ComplexWriter backward incompatible change
> -
>
> Key: ARROW-3859
> URL: https://issues.apache.org/jira/browse/ARROW-3859
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This commit 
> [https://github.com/apache/arrow/commit/a56c009257a71979d5ed0b021197c7a9d5ed5021]
>  changed the default behavior of some of the methods in a 
> non-backward-compatible way.
> Will raise a PR to revert to the previous behavior while adhering to 
> checkstyle guidelines.





[jira] [Updated] (ARROW-3860) [Gandiva] [C++] Add option to use -static-libstdc++ when building libgandiva_jni.so

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3860:
--
Component/s: C++ - Gandiva

> [Gandiva] [C++] Add option to use -static-libstdc++ when building 
> libgandiva_jni.so
> ---
>
> Key: ARROW-3860
> URL: https://issues.apache.org/jira/browse/ARROW-3860
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This 
> [commit|https://github.com/apache/arrow/commit/ba2b2ea2301f067cc95306e11546ddb6d402a55c#diff-d5e5df5984ba660e999a7c657039f6af]
>  broke Gandiva packaging by removing static linking of the C++ standard 
> library. Since Dremio consumes a fat jar that includes the packaged Gandiva 
> native libraries, we need to statically link the C++ standard library.
> As suggested in the commit message, this will be re-introduced as a CMake 
> flag.





[jira] [Updated] (ARROW-3891) [Java] Remove Long.bitCount with simple bitmap operations

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3891:
--
Component/s: Java

> [Java] Remove Long.bitCount with simple bitmap operations
> -
>
> Key: ARROW-3891
> URL: https://issues.apache.org/jira/browse/ARROW-3891
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Animesh Trivedi
>Assignee: Animesh Trivedi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The `public int isSet(int index)` routine checks whether a bit is set by 
> calling the Long.bitCount function. This is unnecessary and degrades 
> performance. It can simply be replaced by a bit shift and a bitwise & 
> operation, changing 
> `return Long.bitCount(b & (1L << bitIndex));`
> to 
> `return (b >> bitIndex) & 0x01;` 
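The equivalence of the two expressions is easy to check. A quick Python sketch (Java's Long.bitCount corresponds to a population count, modelled here with `bin(...).count('1')`):

```python
import random

def is_set_bitcount(b: int, bit_index: int) -> int:
    # Original approach: population count of the single masked-out bit.
    return bin(b & (1 << bit_index)).count('1')

def is_set_shift(b: int, bit_index: int) -> int:
    # Proposed approach: shift the bit down and mask it.
    return (b >> bit_index) & 0x01

random.seed(0)
for _ in range(1000):
    b = random.getrandbits(64)
    i = random.randrange(64)
    assert is_set_bitcount(b, i) == is_set_shift(b, i)
print("equivalent")
```

Both return 0 or 1 for every input, so the shift-and-mask form is a drop-in replacement that skips the popcount.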





[jira] [Updated] (ARROW-3878) [Rust] Improve primitive types

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3878:
--
Component/s: Rust

> [Rust] Improve primitive types 
> ---
>
> Key: ARROW-3878
> URL: https://issues.apache.org/jira/browse/ARROW-3878
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently we simply use Rust's native types as primitive types, and rely 
> on macros such as 
> [this|https://github.com/apache/arrow/blob/master/rust/src/array.rs#L298] to 
> link the Arrow data type with the native type. A better approach may be to 
> define richer primitive types which contain both the Arrow type and the Rust 
> native type, as well as other information such as the type's bit width, 
> precision, etc.





[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3936:
--
Component/s: C++

> Add _O_NOINHERIT to the file open flags on Windows
> --
>
> Key: ARROW-3936
> URL: https://issues.apache.org/jira/browse/ARROW-3936
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Philip Felton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Unlike Linux, Windows doesn't let you delete files that are currently opened 
> by another process. So if you create a child process while a Parquet file is 
> open, with the current code the file handle is inherited to the child 
> process, and the parent process can't then delete the file after closing it 
> without the child process terminating first.
> By default, Win32 file handles are not inheritable (likely because of the 
> aforementioned problems), except for _wsopen_s, which tries to maintain POSIX 
> compatibility.
> This is a serious problem for us.
> We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path 
> is a sensible approach and would likely be the correct behaviour as it 
> matches the main Win32 API.
> However, it could be that some developers rely on the current inheritable 
> behaviour. In which case, the Arrow public API should take a boolean argument 
> on whether the created file descriptor should be inheritable. But this would 
> break API backward compatibility (unless a new overloaded method is 
> introduced).
> Is forking and inheriting Arrow internal file descriptor something that Arrow 
> actually means to support?
> See [https://github.com/apache/arrow/pull/3085]. What do we think of the 
> proposed fix?
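For comparison, Python settled the same question with PEP 446: file descriptors are non-inheritable by default, with an explicit opt-in for code that genuinely wants to pass one to a child process. A POSIX sketch (`_O_NOINHERIT` is the Windows CRT analogue of this close-on-exec default):

```python
import os
import tempfile

tmp_fd, tmp_path = tempfile.mkstemp()
os.close(tmp_fd)

# Descriptors created by os.open are non-inheritable by default (PEP 446),
# mirroring the behaviour the report argues for on Windows.
fd = os.open(tmp_path, os.O_RDONLY)
print(os.get_inheritable(fd))  # False

# Code that genuinely wants the child to receive the descriptor opts in.
os.set_inheritable(fd, True)
print(os.get_inheritable(fd))  # True

os.close(fd)
os.unlink(tmp_path)
```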





[jira] [Updated] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3934:
--
Component/s: C++ - Gandiva

> [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off
> --
>
> Key: ARROW-3934
> URL: https://issues.apache.org/jira/browse/ARROW-3934
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the precompiled tests are compiled in any case, even if 
> ARROW_GANDIVA_BUILD_TESTS=off.





[jira] [Updated] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3950:
--
Component/s: Python

> [Plasma] Don't force loading the TensorFlow op on import
> 
>
> Key: ARROW-3950
> URL: https://issues.apache.org/jira/browse/ARROW-3950
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In certain situations, users want more control over when the TensorFlow op is 
> loaded, so we should make it optional (even if it exists). This happens in 
> Ray for example, where we need to make sure that if multiple python workers 
> try to compile and import the TensorFlow op in parallel, there is no race 
> condition (e.g. one worker could try to import a half-built version of the 
> op).





[jira] [Updated] (ARROW-3970) [Gandiva][C++] Remove unnecessary boost dependencies

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3970:
--
Component/s: C++ - Gandiva

> [Gandiva][C++] Remove unnecessary boost dependencies
> 
>
> Key: ARROW-3970
> URL: https://issues.apache.org/jira/browse/ARROW-3970
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Affects Versions: 0.12.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Remove unnecessary dynamic dependencies on Boost, since we are using the 
> static versions anyway.





[jira] [Updated] (ARROW-3983) [Gandiva][Crossbow] Use static boost while packaging

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3983:
--
Component/s: C++ - Gandiva

> [Gandiva][Crossbow] Use static boost while packaging
> 
>
> Key: ARROW-3983
> URL: https://issues.apache.org/jira/browse/ARROW-3983
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Gandiva is picking up some transitive dependencies on Boost from Arrow. Since 
> we are using the static version of Arrow in the packaged Gandiva library, it 
> was thought that we would also be using the static versions of Boost.
> This holds true on Linux, where there is no dependency on the shared Arrow 
> library, but on macOS there seems to be a dependency on the shared Boost 
> libraries even for the static Arrow library.
> So we use "ARROW_BOOST_USE_SHARED" to force the use of the static Boost 
> libraries while packaging Gandiva in Crossbow.





[jira] [Updated] (ARROW-4006) Add CODE_OF_CONDUCT.md

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4006:
--
Component/s: Documentation

> Add CODE_OF_CONDUCT.md
> --
>
> Key: ARROW-4006
> URL: https://issues.apache.org/jira/browse/ARROW-4006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Apache Software Foundation has a code of conduct that applies to its 
> projects
> https://www.apache.org/foundation/policies/conduct.html
> We should add a document to the root of the git repository to direct 
> interested individuals to the CoC.





[jira] [Updated] (ARROW-4114) [C++][DOCUMENTATION] Add "python" to Linux build instructions

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4114:
--
Component/s: Documentation
 C++

> [C++][DOCUMENTATION] Add "python" to Linux build instructions
> -
>
> Key: ARROW-4114
> URL: https://issues.apache.org/jira/browse/ARROW-4114
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Documentation
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{make unittest}} step in the C++ README.md does not work on a fresh 
> Ubuntu image without Python installed.
> The error message from {{ctest --output-on-failure}} indicates it is trying 
> to find python:
> {{Running arrow-allocator-test, redirecting output into 
> /home/micahk/arrow/cpp/debug/build/test-logs/arrow-allocator-test.txt 
> (attempt 1/1)}}
> {{/usr/bin/env: ‘python’: No such file or directory}}





[jira] [Updated] (ARROW-4102) [C++] FixedSizeBinary identity cast not implemented

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4102:
--
Component/s: C++

> [C++] FixedSizeBinary identity cast not implemented
> ---
>
> Key: ARROW-4102
> URL: https://issues.apache.org/jira/browse/ARROW-4102
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4100) [Gandiva][C++] Fix regex to ignore "." character

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4100:
--
Component/s: C++ - Gandiva

> [Gandiva][C++] Fix regex to ignore "." character
> 
>
> Key: ARROW-4100
> URL: https://issues.apache.org/jira/browse/ARROW-4100
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4043) [Packaging/Docker] Python tests on alpine miss pytest dependency

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4043:
--
Component/s: Packaging

> [Packaging/Docker] Python tests on alpine miss pytest dependency
> 
>
> Key: ARROW-4043
> URL: https://issues.apache.org/jira/browse/ARROW-4043
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> Using /usr/lib/python2.7/site-packages
> Searching for numpy==1.15.4
> Best match: numpy 1.15.4
> Adding numpy 1.15.4 to easy-install.pth file
> Using /usr/lib/python2.7/site-packages
> Finished processing dependencies for pyarrow==0.11.1.dev385+g9c8ddae1
> /
> /bin/sh: pytest: not found
> The command "docker-compose run python-alpine" exited with 127.{code}





[jira] [Updated] (ARROW-4130) [Go] offset not used when accessing binary array

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4130:
--
Component/s: Go

> [Go] offset not used when accessing binary array
> 
>
> Key: ARROW-4130
> URL: https://issues.apache.org/jira/browse/ARROW-4130
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Joshua Lapacik
>Assignee: Joshua Lapacik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When accessing a binary array, the offset of the underlying data buffer is 
> not used. This affects the behavior of slicing. See 
> [https://github.com/apache/arrow/issues/3270] .





[jira] [Updated] (ARROW-4266) [Python][CI] Disable ORC tests in dask integration test

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4266:
--
Component/s: Python
 Continuous Integration

> [Python][CI] Disable ORC tests in dask integration test
> ---
>
> Key: ARROW-4266
> URL: https://issues.apache.org/jira/browse/ARROW-4266
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-3910 changed the default for 
> to_pandas to to_pandas(date_as_object=True), which breaks dask's ORC tests: 
> [https://github.com/dask/dask/blob/e48aca49af9005c938ff4773aa05ca8b20e2e1b1/dask/dataframe/io/orc.py#L19]
>  
> cc [~mrocklin]
>  





[jira] [Updated] (ARROW-4269) [Python] AttributeError: module 'pandas.core' has no attribute 'arrays'

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4269:
--
Component/s: Python

> [Python] AttributeError: module 'pandas.core' has no attribute 'arrays'
> ---
>
> Key: ARROW-4269
> URL: https://issues.apache.org/jira/browse/ARROW-4269
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This happens with pandas 0.22:
> ```
> In [1]: import pyarrow
> ---
> AttributeError Traceback (most recent call last)
>  in ()
> > 1 import pyarrow
> ~/arrow/python/pyarrow/__init__.py in ()
>  174 localfs = LocalFileSystem.get_instance()
>  175 
> --> 176 from pyarrow.serialization import (default_serialization_context,
>  177 register_default_serialization_handlers,
>  178 register_torch_serialization_handlers)
> ~/arrow/python/pyarrow/serialization.py in ()
>  303 
>  304 
> --> 305 
> register_default_serialization_handlers(_default_serialization_context)
> ~/arrow/python/pyarrow/serialization.py in 
> register_default_serialization_handlers(serialization_context)
>  294 custom_deserializer=_deserialize_pyarrow_table)
>  295 
> --> 296 _register_custom_pandas_handlers(serialization_context)
>  297 
>  298
> ~/arrow/python/pyarrow/serialization.py in 
> _register_custom_pandas_handlers(context)
>  175 custom_deserializer=_load_pickle_from_buffer)
>  176 
> --> 177 if hasattr(pd.core.arrays, 'interval'):
>  178 context.register_type(
>  179 pd.core.arrays.interval.IntervalArray,
> AttributeError: module 'pandas.core' has no attribute 'arrays'
> ```
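A defensive sketch of the fix: guard the whole attribute chain instead of only its last link. The `SimpleNamespace` objects below are stand-ins for the two pandas module layouts; the helper name is illustrative, not the actual pyarrow code.

```python
import types

# Stand-ins: older pandas has no `pd.core.arrays` submodule at all, so
# `hasattr(pd.core.arrays, 'interval')` raises AttributeError before
# hasattr() even gets to run.
old_pd = types.SimpleNamespace(core=types.SimpleNamespace())
new_pd = types.SimpleNamespace(
    core=types.SimpleNamespace(
        arrays=types.SimpleNamespace(interval=object())))

def has_interval_array(pd) -> bool:
    # Guard the intermediate attribute, then test the leaf.
    arrays = getattr(pd.core, 'arrays', None)
    return arrays is not None and hasattr(arrays, 'interval')

print(has_interval_array(old_pd))  # False, and no exception
print(has_interval_array(new_pd))  # True
```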





[jira] [Updated] (ARROW-4197) [C++] Emscripten compiler fails building Arrow

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4197:
--
Component/s: C++

> [C++] Emscripten compiler fails building Arrow
> --
>
> Key: ARROW-4197
> URL: https://issues.apache.org/jira/browse/ARROW-4197
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: OS X
>Reporter: Timothy Paine
>Assignee: Timothy Paine
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The Emscripten compiler ([https://kripken.github.io/emscripten-site/]) fails 
> when compiling Arrow, with a few relatively minor issues:
>  
>  * there is no -ggdb flag for debug support, only -g
>  * there is no execinfo.h, so even if Backtrace is found it cannot be used
>  * when using the emscripten compiler, even on mac, you cannot pass the 
> -undefined dynamic_lookup argument





[jira] [Updated] (ARROW-4209) [Gandiva] returning IR structs causes issues with windows

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4209:
--
Component/s: C++ - Gandiva

> [Gandiva] returning IR structs causes issues with windows
> -
>
> Key: ARROW-4209
> URL: https://issues.apache.org/jira/browse/ARROW-4209
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The decimal add function returns a struct (of high/low values). This is known 
> to be fragile due to ABI compatibility issues, so this fixes it by switching 
> to primitive types.





[jira] [Updated] (ARROW-4237) [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4237:
--
Component/s: Packaging

> [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script
> ---
>
> Key: ARROW-4237
> URL: https://issues.apache.org/jira/browse/ARROW-4237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Set to
> {{-DCMAKE_INSTALL_LIBDIR=lib}}
> instead of
> {{-DCMAKE_INSTALL_LIBDIR=$ARROW_HOME/lib}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4156:
--
Component/s: C++

> [C++] xcodebuild failure for cmake generated project
> 
>
> Key: ARROW-4156
> URL: https://issues.apache.org/jira/browse/ARROW-4156
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Hatem Helal
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: cmakeoutput.txt, xcodebuildOutput.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Using the cmake xcode project generator fails to build using xcodebuild as 
> follows:
> {code:java}
> $ cmake .. -G Xcode -DARROW_PARQUET=ON  -DPARQUET_BUILD_EXECUTABLES=ON 
> -DPARQUET_BUILD_EXAMPLES=ON  
> -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 
> -DCMAKE_BUILD_TYPE=Debug  -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 
> -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON 
> -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1
> 
> Libtool 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
>  normal x86_64
> cd /Users/hhelal/Documents/code/arrow/cpp
> export MACOSX_DEPLOYMENT_TARGET=10.14
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool
>  -static -arch_only x86_64 -syslibroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
>  
> -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal
>  -filelist 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList
>  -o 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh
> echo "Depend check for xcode"
> Depend check for xcode
> cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make
>  PostBuild.arrow_objlib.Debug
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> /bin/rm -f 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a
> === BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION 
> (Debug) ===
> Check dependencies
> Write auxiliary files
> write-file 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> chmod 0755 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> cd /Users/hhelal/Documents/code/arrow/cpp
> /bin/sh -c 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> echo "Creating symlinks"
> Creating symlinks
> /usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library 
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib
>  
> /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib
>  /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> CMake Error: cmake_symlink_library: System Error: No such file or directory
> make: *** [arrow_shared_buildpart_0] Error 1
> ** BUILD FAILED **
> The following build commands failed:
> PhaseScriptExecution CMake\ PostBuild\ Rules 
> xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh
> (1 failure)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4134) [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's abort

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4134:
--
Component/s: Packaging

> [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's 
> abort
> --
>
> Key: ARROW-4134
> URL: https://issues.apache.org/jira/browse/ARROW-4134
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4569) [Gandiva] validate that the precision/scale are within bounds

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4569:
--
Component/s: C++ - Gandiva

> [Gandiva] validate that the precision/scale are within bounds
> -
>
> Key: ARROW-4569
> URL: https://issues.apache.org/jira/browse/ARROW-4569
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4348) encountered error when building parquet

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4348:
--
Component/s: C++

> encountered error when building parquet
> ---
>
> Key: ARROW-4348
> URL: https://issues.apache.org/jira/browse/ARROW-4348
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: lei yu
>Priority: Major
>
> I am trying to build the C++ libraries (Parquet only) on CentOS 7.5. I 
> followed the instructions on GitHub and did as below:
> {code:java}
> git clone https://github.com/apache/arrow.git
> cd arrow/cpp
> mkdir debug
> cd debug
> cmake .. -DARROW_PARQUET=ON -DARROW_OPTIONAL_INSTALL=ON
> make parquet
> {code}
>  
> I don't have the third-party libraries installed on my box, so the build 
> tries to download them during the build process, but I got an error after it 
> says that Thrift has been downloaded and installed:
> {code:java}
> No rule to make target `thrift_ep/src/thrift_ep-install/lib/libthriftd.a', 
> needed by `src/parquet/parquet_types.cpp'. Stop.{code}
> Before the error, it says:
> {code:java}
> [ 7%] Performing configure step for 'thrift_ep'
> -- thrift_ep configure command succeeded. See also 
> /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-.log
> [ 8%] Performing build step for 'thrift_ep'
> -- thrift_ep build command succeeded. See also 
> /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-.log
> [ 9%] Performing install step for 'thrift_ep'
> -- thrift_ep install command succeeded. See also 
> /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-*.log
> [ 10%] Completed 'thrift_ep'
> [ 10%] Built target thrift_ep
> {code}
> I had to build Thrift separately, and then I could build Parquet successfully.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4663) [Packaging] Conda-forge build misses gflags on linux

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4663:
--
Component/s: Packaging

> [Packaging] Conda-forge build misses gflags on linux
> 
>
> Key: ARROW-4663
> URL: https://issues.apache.org/jira/browse/ARROW-4663
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: ci-failure, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See build: https://travis-ci.org/kszucs/crossbow/builds/496958426



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854525#comment-16854525
 ] 

Antoine Pitrou commented on ARROW-263:
--

Should this be kept open? It looks essentially like a brain dump, and no 
discussion has taken place in the last 3 years.

> Design an initial IPC mechanism for Arrow Vectors
> -
>
> Key: ARROW-263
> URL: https://issues.apache.org/jira/browse/ARROW-263
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Micah Kornfield
>Priority: Major
>
> Prior discussion on this topic [1].
> Use-cases:
> 1.  User defined function (UDF) execution:  One process wants to execute a 
> user defined function written in another language (e.g. Java executing a 
> function defined in python, this involves creating Arrow Arrays in java, 
> sending them to python and receiving a new set of Arrow Arrays produced in 
> python back in the java process).
> 2.  If a storage system and a query engine are running on the same host we 
> might want use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu)
> Assumptions:
> 1.  The IPC mechanism should be usable from the core set of supported 
> languages (Java, Python, C) on POSIX and ideally Windows systems. Ideally, we 
> would not need to add dependencies on additional libraries beyond those each 
> language already requires.
> We want to leverage shared memory for Arrays to avoid doubling RAM 
> requirements by duplicating the same Array in different memory locations.
> 2. Under some circumstances shared memory might be more efficient than FIFOs 
> or sockets (in other scenarios it won't be; see the thread below).
> 3. Security is not a concern for V1, we assume all processes running are 
> “trusted”.
> Requirements:
> 1.  Resource management: 
> a.  Both processes need a way of allocating memory for Arrow Arrays so 
> that data can be passed from one process to another.
> b.  There must be a mechanism to clean up unused Arrow Arrays to limit 
> resource usage while avoiding race conditions when processing arrays.
> 2.  Schema negotiation - before sending data, both processes need to agree on 
> the schema each one will produce.
> Out of scope requirements:
> 1.  IPC channel metadata discovery is out of scope of this document.  
> Discovery can be provided by passing appropriate command line arguments, 
> configuration files or other mechanisms like RPC (in which case RPC channel 
> discovery is still an issue).
> [1] 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E
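The shared-memory assumption above (one Array payload visible to two processes without copying) can be sketched in pure Python. `SharedMemory` and the 4-byte payload here are illustrative stand-ins, not Arrow's actual IPC format:

```python
from multiprocessing import shared_memory

# Producer side: allocate a named region and write the array bytes into it.
producer = shared_memory.SharedMemory(create=True, size=64)
producer.buf[:4] = bytes([1, 2, 3, 4])

# Consumer side: attach to the same region by name. The payload is not
# duplicated in RAM, matching use-case (1) and assumption (1) above.
consumer = shared_memory.SharedMemory(name=producer.name)
values = bytes(consumer.buf[:4])
print(values)  # b'\x01\x02\x03\x04'

# Cleanup (requirement 1b): detach both handles, then free the region.
consumer.close()
producer.close()
producer.unlink()
```

In a real IPC design the region name would be exchanged via the discovery mechanism the document leaves out of scope.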



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5316) [Rust] Interfaces for gandiva bindings.

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5316:
--
Component/s: Rust
 C++ - Gandiva

> [Rust] Interfaces for gandiva bindings.
> ---
>
> Key: ARROW-5316
> URL: https://issues.apache.org/jira/browse/ARROW-5316
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++ - Gandiva, Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>
> Create interfaces to demonstrate high level design and ideas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5315) [Rust] Gandiva binding.

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5315:
--
Component/s: Rust
 C++ - Gandiva

> [Rust] Gandiva binding.
> ---
>
> Key: ARROW-5315
> URL: https://issues.apache.org/jira/browse/ARROW-5315
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva, Rust
>Reporter: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add gandiva binding for rust.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5224) [Java] Add APIs for supporting directly serialize/deserialize ValueVector

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5224:
--
Component/s: Java

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -
>
> Key: ARROW-5224
> URL: https://issues.apache.org/jira/browse/ARROW-5224
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only 
> way to implement this is to put a single FieldVector in a VectorSchemaRoot 
> and convert it to an ArrowRecordBatch, and likewise for deserialization. 
> Providing a utility class to implement this may be better. I know all 
> serialization should follow the IPC format so that data can be shared between 
> different Arrow implementations, but for users who only use the Java API and 
> want to do some further optimization, this seems to be no problem, and we 
> could provide them with one more option.
> This may bring some benefits for Java users who only use ValueVector rather 
> than the IPC series of classes such as ArrowRecordBatch:
>  * We could do shuffle optimizations such as compression and encoding 
> algorithms for numerical types, which could greatly improve performance.
>  * We could serialize/deserialize with the actual buffer size within the 
> vector, since the allocated buffer size is a power of 2, which is bigger 
> than actually needed.
>  * We could reduce data conversions (VectorSchemaRoot, ArrowRecordBatch, 
> etc.) to make it user-friendly.
>  
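To make the compact-buffer idea concrete, here is a pure-Python sketch of serializing an int vector with only its actual sizes (count, validity bitmap, data) rather than power-of-two capacities. The layout and helper names are invented for illustration; they are not Arrow's IPC format:

```python
import struct

def serialize_int_vector(values, valid):
    # Compact layout: 4-byte count, then a validity bitmap sized to the
    # element count, then exactly count * 4 bytes of int32 data.
    count = len(values)
    bitmap = bytearray((count + 7) // 8)
    for i, ok in enumerate(valid):
        if ok:
            bitmap[i // 8] |= 1 << (i % 8)
    data = struct.pack(f"<{count}i", *values)
    return struct.pack("<i", count) + bytes(bitmap) + data

def deserialize_int_vector(buf):
    # Mirror of serialize_int_vector: read count, bitmap, then the data.
    (count,) = struct.unpack_from("<i", buf, 0)
    nbytes = (count + 7) // 8
    bitmap = buf[4:4 + nbytes]
    values = list(struct.unpack_from(f"<{count}i", buf, 4 + nbytes))
    valid = [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(count)]
    return values, valid

buf = serialize_int_vector([1, 2, 3], [True, False, True])
print(deserialize_int_vector(buf))  # ([1, 2, 3], [True, False, True])
```

The point of the sketch is the sizing: the wire size is driven by the actual value count, not by the vector's allocated capacity.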



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5259) [Java] Add option for ValueVector to allocate buffers with actual size

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5259:
--
Summary: [Java] Add option for ValueVector to allocate buffers with actual 
size  (was: Add option for ValueVector to allocate buffers with actual size)

> [Java] Add option for ValueVector to allocate buffers with actual size
> --
>
> Key: ARROW-5259
> URL: https://issues.apache.org/jira/browse/ARROW-5259
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> Currently in _BaseValueVector#computeCombinedBufferSize_, it calculates the 
> buffer size with _valueCount_ and _typeWidth_ as inputs and then allocates 
> memory for the dataBuffer and validityBuffer. However, it always allocates 
> more memory than the actual size because of the call to 
> _BaseAllocator.nextPowerOfTwo(bufferSize)_.
> For example, an IntVector will allocate buffers of size 8192 with valueCount 
> = 1025, so memory usage is almost double what it actually needs. So in some 
> cases there is enough memory for actual use, but an OOM is thrown when the 
> allocated memory is rounded up to the next power of 2, and I think this 
> problem is absolutely avoidable.
> Is it feasible to add an option for ValueVector to allocate the actual buffer 
> size, rather than rounding it up to the next power of 2, to reduce memory 
> allocation?
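The rounding overhead described above can be reproduced with a small sketch. `next_power_of_two` mirrors what `BaseAllocator.nextPowerOfTwo` is described as doing; the 4-byte int width is an assumption for the IntVector example:

```python
def next_power_of_two(n: int) -> int:
    # Smallest power of two >= n.
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()

# An IntVector with 1025 values needs 1025 * 4 = 4100 bytes of data,
# but rounding up allocates 8192 bytes: nearly double the requirement.
value_count = 1025
type_width = 4
actual = value_count * type_width
allocated = next_power_of_two(actual)
print(actual, allocated)  # 4100 8192
```

This matches the 8192-byte figure quoted in the issue for valueCount = 1025.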



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5278) [C#] ArrowBuffer should either implement IEquatable correctly or not at all

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5278:
--
Component/s: C#

> [C#] ArrowBuffer should either implement IEquatable correctly or not at all
> ---
>
> Key: ARROW-5278
> URL: https://issues.apache.org/jira/browse/ARROW-5278
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Reporter: Eric Erhardt
>Priority: Major
>
> See the discussion 
> [here|https://github.com/apache/arrow/pull/3925/#discussion_r281378027].
> ArrowBuffer currently implements IEquatable but doesn't override 
> `GetHashCode`.
> We should either implement IEquatable correctly by overriding Equals and 
> GetHashCode, or remove IEquatable altogether.
> Looking at ArrowBuffer's [Equals 
> implementation|https://github.com/apache/arrow/blob/08829248fd540b7e3bd96b980e357f8a4db7970e/csharp/src/Apache.Arrow/ArrowBuffer.cs#L66-L69],
>  it compares each value in the buffer, which is not very efficient. Also, 
> this implementation is not consistent with how `Memory` implements 
> IEquatable - 
> [https://source.dot.net/#System.Private.CoreLib/shared/System/Memory.cs,500].
> If we continue implementing IEquatable on ArrowBuffer, we should consider 
> implementing it in the same fashion as Memory does.
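The equality/hash contract at issue can be illustrated with a small Python analogue: the `__eq__`/`__hash__` pair plays the role of C#'s `Equals`/`GetHashCode`, and the class here is hypothetical:

```python
class ArrowBufferLike:
    """Toy buffer wrapper showing the contract: if equality is overridden,
    the hash must be overridden too, so equal values hash alike."""

    def __init__(self, data: bytes):
        self._data = data

    def __eq__(self, other):
        # Value equality over the underlying bytes.
        return isinstance(other, ArrowBufferLike) and self._data == other._data

    def __hash__(self):
        # Hash the same bytes that drive equality, so a == b implies
        # hash(a) == hash(b); omitting this breaks dict/set membership.
        return hash(self._data)

a, b = ArrowBufferLike(b"\x01\x02"), ArrowBufferLike(b"\x01\x02")
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```

Whether the comparison should be per-byte or reference-based (as `Memory` does) is the design question the issue raises; this sketch only shows why the two overrides must travel together.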



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5259) Add option for ValueVector to allocate buffers with actual size

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5259:
--
Component/s: Java

> Add option for ValueVector to allocate buffers with actual size
> ---
>
> Key: ARROW-5259
> URL: https://issues.apache.org/jira/browse/ARROW-5259
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>
> Currently in _BaseValueVector#computeCombinedBufferSize_, it calculates the 
> buffer size with _valueCount_ and _typeWidth_ as inputs and then allocates 
> memory for the dataBuffer and validityBuffer. However, it always allocates 
> more memory than the actual size because of the call to 
> _BaseAllocator.nextPowerOfTwo(bufferSize)_.
> For example, an IntVector will allocate buffers of size 8192 with valueCount 
> = 1025, so memory usage is almost double what it actually needs. So in some 
> cases there is enough memory for actual use, but an OOM is thrown when the 
> allocated memory is rounded up to the next power of 2, and I think this 
> problem is absolutely avoidable.
> Is it feasible to add an option for ValueVector to allocate the actual buffer 
> size, rather than rounding it up to the next power of 2, to reduce memory 
> allocation?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5324) [Plasma] API requests

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5324:
--
Summary: [Plasma] API requests  (was: plasma API requests)

> [Plasma] API requests
> -
>
> Key: ARROW-5324
> URL: https://issues.apache.org/jira/browse/ARROW-5324
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Darren Weber
>Priority: Minor
>
> Copied from [https://github.com/apache/arrow/issues/4318] (it's easier to 
> read there; sorry, I dislike Jira formatting)
> Related to https://issues.apache.org/jira/browse/ARROW-3444 
> While working with the plasma API to create/seal an object for a table, using 
> a custom object-ID, it would help to have a convenience API to get the size 
> of the table.
> The following code might help to illustrate the request and notes below:
> {code:java}
> if not parquet_path:
>     parquet_path = f"./data/dataset_{size}.parquet"
> if not plasma_path:
>     plasma_path = f"./data/dataset_{size}.plasma"
> try:
>     plasma_client = plasma.connect(plasma_path)
> except Exception:
>     plasma_client = None
> if plasma_client:
>     table_id = plasma.ObjectID(bytes(parquet_path[:20], encoding='utf8'))
>     try:
>         table = plasma_client.get(table_id, timeout_ms=4000)
>         if table.__name__ == 'ObjectNotAvailable':
>             raise ValueError('Failed to get plasma object')
>     except ValueError:
>         table = pq.read_table(parquet_path, use_threads=True)
>         plasma_client.create_and_seal(table_id, table)
> {code}
>  
> The use case is a workflow something like this:
>  - process-A
>  ** generate a pandas DataFrame `df`
>  ** save the `df` to parquet, using pyarrow.parquet, with a unique parquet 
> path
>  ** (this process will not save directly to plasma)
>  - process-B
>  ** get the data from plasma or load it into plasma from the parquet file
>  ** use the unique parquet path to generate a unique object-ID
> Notes:
>  - `plasma_client.put` for the same data-table is not idempotent, it 
> generates unique object-ID values that are not based on any hash of the data 
> payload, so every put saves a new object-ID; could it use a data hash for 
> idempotent puts? e.g.
>  - 
> {code:java}
> In : plasma_client.put(table)
> ObjectID(25fcb60959d23b6bfc739f88816da29e04d6)
> In : plasma_client.put(table)
> ObjectID(d2a4662999db30177b090f9fc2bf6b28687d2f8d)
> In : plasma_client.put(table)
> ObjectID(b2928ad786de2fdb74d374055597f6e7bd97fd61)
> In : hash(table)
> TypeError: unhashable type: 'pyarrow.lib.Table'{code}
>  - In process-B, when the data is not already in plasma, it reads data from a 
> parquet file into a pyarrow.Table and then needs an object-ID and the table 
> size to use plasma `client.create_and_seal` but it's not easy to get the 
> table size - this might be related to github issue #2707 (#3444) - it might 
> be ideal if the `client.create_and_seal` accepts responsibility for the size 
> of the object to be created when given a pyarrow data object like a table.
>  - when the plasma store does not have the object, it could have a default 
> timeout rather than hang indefinitely. Also, it's a bit clumsy to return an 
> object that is not easily checked with `isinstance`; it could be better to 
> have an exception-handling pattern (or something like the requests 404 
> patterns and options?)
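One way to get the idempotent puts the first note asks for is to derive the 20-byte object ID from a stable key such as the parquet path (or a hash of the payload). `object_id_for` is a hypothetical helper for illustration, not part of the plasma API:

```python
import hashlib

def object_id_for(key: str) -> bytes:
    # SHA-1 digests are exactly 20 bytes, the same length plasma expects
    # for an ObjectID, so the same key always maps to the same ID.
    return hashlib.sha1(key.encode("utf8")).digest()

oid = object_id_for("./data/dataset_small.parquet")
print(len(oid), oid == object_id_for("./data/dataset_small.parquet"))  # 20 True
```

The resulting bytes could be wrapped as `plasma.ObjectID(oid)`, avoiding the truncated-path scheme in the snippet above and making repeated loads of the same file map to one object.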



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5381:
--
Summary: [C++] Crash at arrow::internal::CountSetBits  (was: Crash at 
arrow::internal::CountSetBits)

> [C++] Crash at arrow::internal::CountSetBits
> 
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Priority: Major
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stack trace shows that it crashed in arrow::internal::CountSetBits.
>  
> {code:java}
> STACK_TEXT:  
> 00c9`5354a4c0 7ff7`2f2830fd : 00c9`544841c0 ` 
> `1e00 ` : 
> CortexService!arrow::internal::CountSetBits+0x16d
> 00c9`5354a550 7ff7`2f2834b7 : 00c9`5337c930 ` 
> ` ` : 
> CortexService!arrow::ArrayData::GetNullCount+0x8d
> 00c9`5354a580 7ff7`2f13df55 : 00c9`54476080 00c9`5354a5d8 
> ` ` : 
> CortexService!arrow::Array::null_count+0x37
> 00c9`5354a5b0 7ff7`2f13fb68 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::Visit >+0xa5
> 00c9`5354a640 7ff7`2f12fa34 : 00c9`5354a6f8 00c9`54476080 
> 00c9`5354ab40 ` : 
> CortexService!arrow::VisitArrayInline namespace'::LevelBuilder>+0x298
> 00c9`5354a680 7ff7`2f14bf03 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::VisitInline+0x44
> 00c9`5354a6c0 7ff7`2f12fe2a : 00c9`5354ab40 00c9`5354ae18 
> 00c9`54476080 00c9`5354b208 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::GenerateLevels+0x93
> 00c9`5354aa00 7ff7`2f14de56 : 00c9`5354b1f8 00c9`5354afc8 
> 00c9`54476080 `1e00 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x25a
> 00c9`5354af20 7ff7`2f14e66b : 00c9`5354b1f8 00c9`5354b238 
> 00c9`54445c20 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x2a6
> 00c9`5354b040 7ff7`2f12f137 : 00c9`544041f0 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x70b
> 00c9`5354b400 7ff7`2f14b4d5 : 00c9`54431180 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::WriteColumnChunk+0x67
> 00c9`5354b450 7ff7`2f12eef1 : 00c9`5354b5d8 00c9`5354b648 
> ` `1e00 : 
> CortexService!::operator()+0x195
> 00c9`5354b530 7ff7`2eb8e31e : 00c9`54431180 00c9`5354b760 
> 00c9`54442fb0 `1e00 : 
> CortexService!parquet::arrow::FileWriter::WriteTable+0x521
> 00c9`5354b730 7ff7`2eb58ac5 : 00c9`5307bd88 00c9`54442fb0 
> ` ` : 
> CortexService!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0xfe
> 00c9`5354b860 7ff7`2eafdce6 : 00c9`5307bd80 00c9`5354ba08 
> 00c9`5354b9e0 00c9`5354b9d8 : 
> CortexService!Cortex::Storage::ParquetFileWriter::writeRowGroup+0x545
> 00c9`5354b9a0 7ff7`2eaf8bae : 00c9`53275600 00c9`53077220 
> `fffe ` : 
> CortexService!Cortex::Storage::DataStreamWriteWorker::onNewData+0x1a6
> {code}
> {code:java}
> FAILED_INSTRUCTION_ADDRESS: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
>  @ 99]
> 7ff7`2f3a4e4d f3480fb800  popcnt  rax,qword ptr [rax]
> FOLLOWUP_IP: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
>  @ 99]
> 7ff7`2f3a4e4d f3480fb800  popcnt  rax,qword ptr [rax]
> FAULTING_SOURCE_LINE:  
> 

[jira] [Updated] (ARROW-5324) plasma API requests

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5324:
--
Component/s: C++ - Plasma

> plasma API requests
> ---
>
> Key: ARROW-5324
> URL: https://issues.apache.org/jira/browse/ARROW-5324
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Darren Weber
>Priority: Minor
>
> Copied from [https://github.com/apache/arrow/issues/4318] (it's easier to 
> read there; sorry, I dislike Jira formatting)
> Related to https://issues.apache.org/jira/browse/ARROW-3444 
> While working with the plasma API to create/seal an object for a table, using 
> a custom object-ID, it would help to have a convenience API to get the size 
> of the table.
> The following code might help to illustrate the request and notes below:
> {code:java}
> if not parquet_path:
>     parquet_path = f"./data/dataset_{size}.parquet"
> if not plasma_path:
>     plasma_path = f"./data/dataset_{size}.plasma"
> try:
>     plasma_client = plasma.connect(plasma_path)
> except Exception:
>     plasma_client = None
> if plasma_client:
>     table_id = plasma.ObjectID(bytes(parquet_path[:20], encoding='utf8'))
>     try:
>         table = plasma_client.get(table_id, timeout_ms=4000)
>         if table.__name__ == 'ObjectNotAvailable':
>             raise ValueError('Failed to get plasma object')
>     except ValueError:
>         table = pq.read_table(parquet_path, use_threads=True)
>         plasma_client.create_and_seal(table_id, table)
> {code}
>  
> The use case is a workflow something like this:
>  - process-A
>  ** generate a pandas DataFrame `df`
>  ** save the `df` to parquet, using pyarrow.parquet, with a unique parquet 
> path
>  ** (this process will not save directly to plasma)
>  - process-B
>  ** get the data from plasma or load it into plasma from the parquet file
>  ** use the unique parquet path to generate a unique object-ID
> Notes:
>  - `plasma_client.put` for the same data-table is not idempotent, it 
> generates unique object-ID values that are not based on any hash of the data 
> payload, so every put saves a new object-ID; could it use a data hash for 
> idempotent puts? e.g.
>  - 
> {code:java}
> In : plasma_client.put(table)
> ObjectID(25fcb60959d23b6bfc739f88816da29e04d6)
> In : plasma_client.put(table)
> ObjectID(d2a4662999db30177b090f9fc2bf6b28687d2f8d)
> In : plasma_client.put(table)
> ObjectID(b2928ad786de2fdb74d374055597f6e7bd97fd61)
> In : hash(table)
> TypeError: unhashable type: 'pyarrow.lib.Table'{code}
>  - In process-B, when the data is not already in plasma, it reads data from a 
> parquet file into a pyarrow.Table and then needs an object-ID and the table 
> size to use plasma `client.create_and_seal` but it's not easy to get the 
> table size - this might be related to github issue #2707 (#3444) - it might 
> be ideal if the `client.create_and_seal` accepts responsibility for the size 
> of the object to be created when given a pyarrow data object like a table.
>  - when the plasma store does not have the object, it could have a default 
> timeout rather than hang indefinitely. Also, it's a bit clumsy to return an 
> object that is not easily checked with `isinstance`; it could be better to 
> have an exception-handling pattern (or something like the requests 404 
> patterns and options?)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5381) Crash at arrow::internal::CountSetBits

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5381:
--
Component/s: C++

> Crash at arrow::internal::CountSetBits
> --
>
> Key: ARROW-5381
> URL: https://issues.apache.org/jira/browse/ARROW-5381
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Operating System: Windows 7 Professional 64-bit (6.1, 
> Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429)
> Language: English (Regional Setting: English)
> System Manufacturer: SAMSUNG ELECTRONICS CO., LTD.
> System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520
> BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ
> Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz
> Memory: 2048MB RAM
> Available OS Memory: 1962MB RAM
>   Page File: 1517MB used, 2405MB available
> Windows Dir: C:\Windows
> DirectX Version: DirectX 11
>Reporter: Tham
>Priority: Major
>
> I've got a lot of crash dumps from a customer's Windows machine. The 
> stacktrace shows that it crashed at arrow::internal::CountSetBits.
>  
> {code:java}
> STACK_TEXT:  
> 00c9`5354a4c0 7ff7`2f2830fd : 00c9`544841c0 ` 
> `1e00 ` : 
> CortexService!arrow::internal::CountSetBits+0x16d
> 00c9`5354a550 7ff7`2f2834b7 : 00c9`5337c930 ` 
> ` ` : 
> CortexService!arrow::ArrayData::GetNullCount+0x8d
> 00c9`5354a580 7ff7`2f13df55 : 00c9`54476080 00c9`5354a5d8 
> ` ` : 
> CortexService!arrow::Array::null_count+0x37
> 00c9`5354a5b0 7ff7`2f13fb68 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::Visit >+0xa5
> 00c9`5354a640 7ff7`2f12fa34 : 00c9`5354a6f8 00c9`54476080 
> 00c9`5354ab40 ` : 
> CortexService!arrow::VisitArrayInline namespace'::LevelBuilder>+0x298
> 00c9`5354a680 7ff7`2f14bf03 : 00c9`5354ab40 00c9`5354a6f8 
> 00c9`54476080 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::VisitInline+0x44
> 00c9`5354a6c0 7ff7`2f12fe2a : 00c9`5354ab40 00c9`5354ae18 
> 00c9`54476080 00c9`5354b208 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::LevelBuilder::GenerateLevels+0x93
> 00c9`5354aa00 7ff7`2f14de56 : 00c9`5354b1f8 00c9`5354afc8 
> 00c9`54476080 `1e00 : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x25a
> 00c9`5354af20 7ff7`2f14e66b : 00c9`5354b1f8 00c9`5354b238 
> 00c9`54445c20 ` : 
> CortexService!parquet::arrow::`anonymous 
> namespace'::ArrowColumnWriter::Write+0x2a6
> 00c9`5354b040 7ff7`2f12f137 : 00c9`544041f0 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x70b
> 00c9`5354b400 7ff7`2f14b4d5 : 00c9`54431180 00c9`5354b4d8 
> 00c9`5354b4a8 ` : 
> CortexService!parquet::arrow::FileWriter::WriteColumnChunk+0x67
> 00c9`5354b450 7ff7`2f12eef1 : 00c9`5354b5d8 00c9`5354b648 
> ` `1e00 : 
> CortexService!::operator()+0x195
> 00c9`5354b530 7ff7`2eb8e31e : 00c9`54431180 00c9`5354b760 
> 00c9`54442fb0 `1e00 : 
> CortexService!parquet::arrow::FileWriter::WriteTable+0x521
> 00c9`5354b730 7ff7`2eb58ac5 : 00c9`5307bd88 00c9`54442fb0 
> ` ` : 
> CortexService!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0xfe
> 00c9`5354b860 7ff7`2eafdce6 : 00c9`5307bd80 00c9`5354ba08 
> 00c9`5354b9e0 00c9`5354b9d8 : 
> CortexService!Cortex::Storage::ParquetFileWriter::writeRowGroup+0x545
> 00c9`5354b9a0 7ff7`2eaf8bae : 00c9`53275600 00c9`53077220 
> `fffe ` : 
> CortexService!Cortex::Storage::DataStreamWriteWorker::onNewData+0x1a6
> {code}
> {code:java}
> FAILED_INSTRUCTION_ADDRESS: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
>  @ 99]
> 7ff7`2f3a4e4d f3480fb800  popcnt  rax,qword ptr [rax]
> FOLLOWUP_IP: 
> CortexService!arrow::internal::CountSetBits+16d 
> [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
>  @ 99]
> 7ff7`2f3a4e4d f3480fb800  popcnt  rax,qword ptr [rax]
> FAULTING_SOURCE_LINE:  
> c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc

[jira] [Updated] (ARROW-5402) [Plasma] Pin objects in plasma store

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5402:
--
Component/s: C++ - Plasma

> [Plasma] Pin objects in plasma store
> 
>
> Key: ARROW-5402
> URL: https://issues.apache.org/jira/browse/ARROW-5402
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/arrow/issues/4368]
> Sometimes we want to "pin" an object in plasma store - we don't want this 
> object to be deleted even though there's nobody that's currently referencing 
> it. In this case, we can specify a flag when creating the object so that it 
> won't be deleted by LRU cache when its refcnt drops to 0, and can only be 
> deleted by an explicit {{Delete()}} call.
> Currently, we have found an actor FO (failover) problem. The actor creation 
> task depends on a plasma object put by the user. After the actor has been 
> running for a long time, the object is deleted by the plasma LRU cache. 
> Then, when an actor FO happens, the creation task cannot find the object put 
> by the user, so the FO hangs forever.
> Would this make sense to you?
>  
>  
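The pinning semantics requested above can be sketched with a toy eviction model: pinned entries are skipped by LRU eviction even at refcount 0, and only an explicit delete removes them. This is illustrative pure Python, not the actual plasma store implementation:

```python
from collections import OrderedDict

class PinnedLRU:
    """Toy model of the proposed behaviour: pinned objects survive LRU
    eviction; only an explicit Delete() removes them."""
    def __init__(self):
        self.objects = OrderedDict()   # object_id -> (value, pinned)

    def create(self, oid, value, pin=False):
        # `pin` models the flag proposed at object-creation time.
        self.objects[oid] = (value, pin)

    def evict_lru(self):
        # Evict the least recently used *unpinned* object, if any.
        for oid, (_, pinned) in self.objects.items():
            if not pinned:
                del self.objects[oid]
                return oid
        return None

    def delete(self, oid):
        # Explicit Delete() works regardless of the pin flag.
        self.objects.pop(oid, None)

store = PinnedLRU()
store.create("actor_arg", b"payload", pin=True)
store.create("scratch", b"tmp")
store.evict_lru()                      # evicts "scratch", not the pinned object
assert "actor_arg" in store.objects and "scratch" not in store.objects
store.delete("actor_arg")              # only an explicit delete removes it
assert not store.objects
```

Under this model the actor creation argument stays available across failovers until it is explicitly deleted.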





[jira] [Updated] (ARROW-5336) [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5336:
--
Component/s: C++

> [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal 
> dictionaries
> --
>
> Key: ARROW-5336
> URL: https://issues.apache.org/jira/browse/ARROW-5336
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Currently (as of ARROW-3144) if any dictionary is different, an error is 
> returned
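A plausible approach, sketched here in pure Python (not the eventual C++ implementation), is to build a unified dictionary and remap each input's indices through a per-chunk transposition table:

```python
def unify_and_remap(chunks):
    """chunks: list of (dictionary, indices) pairs. Returns a unified
    dictionary plus all indices remapped into it."""
    unified, position = [], {}
    remapped = []
    for dictionary, indices in chunks:
        # Transposition table: old dictionary index -> unified index.
        transpose = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            transpose.append(position[value])
        remapped.extend(transpose[i] for i in indices)
    return unified, remapped

dict_a, idx_a = ["foo", "bar"], [0, 1, 0]
dict_b, idx_b = ["bar", "baz"], [1, 0]
unified, indices = unify_and_remap([(dict_a, idx_a), (dict_b, idx_b)])
assert unified == ["foo", "bar", "baz"]
assert [unified[i] for i in indices] == ["foo", "bar", "foo", "baz", "bar"]
```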





[jira] [Updated] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5438:
--
Component/s: JavaScript

> [JS] Utilize stream EOS in File format
> --
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: John Muehlhausen
>Priority: Minor
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().
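For reference, the Arrow IPC streaming format signals end-of-stream with the 4-byte 0xFFFFFFFF continuation indicator followed by a 4-byte zero metadata length (earlier stream versions used just a 4-byte zero length). A sketch of appending that marker after the last message, assuming the modern encapsulated-message encoding:

```python
import struct

# EOS marker: 0xFFFFFFFF continuation indicator + zero metadata length.
EOS = struct.pack("<iI", -1, 0)   # b'\xff\xff\xff\xff\x00\x00\x00\x00'

def finish_message_stream(buf: bytearray) -> bytearray:
    """Append EOS so a sequential reader knows the message stream ended,
    without needing to seek to the file footer."""
    buf.extend(EOS)
    return buf

stream = finish_message_stream(bytearray(b"...record batch messages..."))
assert stream.endswith(b"\xff\xff\xff\xff\x00\x00\x00\x00")
```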





[jira] [Updated] (ARROW-5417) [Website] http://arrow.apache.org doesn't redirect to https

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5417:
--
Component/s: Website

> [Website] http://arrow.apache.org doesn't redirect to https
> ---
>
> Key: ARROW-5417
> URL: https://issues.apache.org/jira/browse/ARROW-5417
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Neal Richardson
>Priority: Minor
>
> This should be a simple (for someone sufficiently authorized) config change 
> somewhere.
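For illustration, a typical Apache httpd mod_rewrite rule for such a redirect looks like the following; the actual mechanism used on ASF infrastructure may differ, so treat this as a sketch rather than the applied change:

```apache
# Redirect all plain-HTTP requests to HTTPS with a permanent redirect.
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```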





[jira] [Updated] (ARROW-5476) [Java][Memory] Fix Netty ArrowBuf Slice

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5476:
--
Component/s: Java

> [Java][Memory] Fix Netty ArrowBuf Slice
> ---
>
> Key: ARROW-5476
> URL: https://issues.apache.org/jira/browse/ARROW-5476
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Affects Versions: 0.14.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The slice of a Netty ArrowBuf depends on the ArrowBuf reader and writer 
> indexes, but ArrowBuf is supposed to track only the memory address and 
> length, and there are places where the ArrowBuf indexes are not in sync with 
> Netty's.
> Slice should therefore use the indexes in the Netty ArrowBuf instead.





[jira] [Updated] (ARROW-5439) [Java] Utilize stream EOS in File format

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5439:
--
Component/s: Java

> [Java] Utilize stream EOS in File format
> 
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: John Muehlhausen
>Priority: Minor
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Updated] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5435:
--
Component/s: Java

> [Java] IntervalYearVector#getObject should return Period with both year and 
> month
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> IntervalYearVector#getObject today returns a Period containing only months. 
> However, this vector stores an interval of years and months (e.g. 2 years 
> and 3 months is stored as 27 total months), so it should return a Period 
> with both years and months (currently only the month field is assigned).
> For the example above, it now returns Period(27 months); it should return 
> Period(2 years, 3 months).
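The proposed decomposition is simple arithmetic on the stored total-month count:

```python
def interval_year_to_period(total_months: int):
    """Split the stored total-month count into (years, months),
    mirroring what getObject should return."""
    years, months = divmod(total_months, 12)
    return years, months

# 2 years and 3 months is stored as 27 total months:
assert interval_year_to_period(27) == (2, 3)
```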





[jira] [Updated] (ARROW-5471) [C++][Gandiva]Array offset is ignored in Gandiva projector

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5471:
--
Component/s: C++ - Gandiva

> [C++][Gandiva]Array offset is ignored in Gandiva projector
> --
>
> Key: ARROW-5471
> URL: https://issues.apache.org/jira/browse/ARROW-5471
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Zeyuan Shang
>Priority: Major
>
> I used the test case in 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25],
>  and found an issue when I was using the slice operator {{input_batch[1:]}}. 
> It seems that the offset is ignored in the Gandiva projector.
> {code:java}
> import pyarrow as pa
> import pyarrow.gandiva as gandiva
> builder = gandiva.TreeExprBuilder()
> field_a = pa.field('a', pa.int32())
> field_b = pa.field('b', pa.int32())
> schema = pa.schema([field_a, field_b])
> field_result = pa.field('res', pa.int32())
> node_a = builder.make_field(field_a)
> node_b = builder.make_field(field_b)
> condition = builder.make_function("greater_than", [node_a, node_b],
> pa.bool_())
> if_node = builder.make_if(condition, node_a, node_b, pa.int32())
> expr = builder.make_expression(if_node, field_result)
> projector = gandiva.make_projector(
> schema, [expr], pa.default_memory_pool())
> a = pa.array([10, 12, -20, 5], type=pa.int32())
> b = pa.array([5, 15, 15, 17], type=pa.int32())
> e = pa.array([10, 15, 15, 17], type=pa.int32())
> input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b'])
> r, = projector.evaluate(input_batch[1:])
> print(r)
> {code}
> If we use the full record batch {{input_batch}}, the expected output is 
> {{[10, 15, 15, 17]}}. So if we use {{input_batch[1:]}}, the expected output 
> should be {{[15, 15, 17]}}, however this script returned {{[10, 15, 15]}}. It 
> seems that the projector ignores the offset and always reads from 0.
>  
> A corresponding issue is created in GitHub as well 
> [https://github.com/apache/arrow/issues/4420]
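The correct, offset-aware behaviour can be modelled in a few lines of pure Python (illustrative only; the real fix belongs in the Gandiva C++ code). A sliced array shares its parent's values buffer and records an (offset, length) pair; ignoring the offset reproduces the bug:

```python
def read_column(values, offset, length):
    """Offset-aware read of a sliced array's values."""
    return values[offset:offset + length]

buffer_a = [10, 12, -20, 5]           # the shared values buffer for column 'a'
# input_batch[1:] corresponds to offset=1, length=3:
assert read_column(buffer_a, 1, 3) == [12, -20, 5]
# the buggy behaviour corresponds to always reading from offset 0:
assert read_column(buffer_a, 0, 3) == [10, 12, -20]
```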





[jira] [Updated] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5440:
--
Component/s: Rust

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz
>
>
> Hello,
> In the Rust Parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on CentOS, the binary produced has a `libstd-hash.so` shared-library 
> dependency that causes problems, since that shared library lives in the 
> rustup directory. No other Rust binary I have built before has this 
> `libstd-hash.so` dependency. It means the binary cannot run anywhere rustup 
> is not installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the rust files and here is the command line output below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}





[jira] [Updated] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5450:
--
Component/s: Python

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too 
> large to convert to C long
> ---
>
> Key: ARROW-5450
> URL: https://issues.apache.org/jira/browse/ARROW-5450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Tim Swast
>Priority: Major
>
> When I attempt to roundtrip from a list of moderately large (beyond what can 
> be represented in nanosecond precision, but within microsecond precision) 
> datetime objects to pyarrow and back, I get an OverflowError: Python int too 
> large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0{noformat}
>  
> Reproduction:
> {code:java}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> None,
> datetime.datetime(, 12, 31, 23, 59, 59, 99, tzinfo=pytz.utc),
> datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", 
> tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same 
> error:
> {code:java}
> naive_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0),
> None,
> datetime.datetime(, 12, 31, 23, 59, 59, 99),
> datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 naive_roundtrip = naive_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
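The traceback suggests the overflow happens in a conversion path that goes through a nanosecond-resolution 64-bit value (e.g. pandas.Timestamp): the microsecond count for datetime.min fits in int64, but its nanosecond equivalent does not. A quick, self-contained check of that arithmetic:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

# Exact integer microseconds since the epoch for datetime.min (year 1).
delta = datetime.datetime.min - EPOCH
us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
ns = us * 1000

# The microsecond value fits comfortably in a 64-bit integer...
assert INT64_MIN <= us <= INT64_MAX
# ...but the nanosecond equivalent does not, consistent with an
# overflow inside a nanosecond-based conversion path.
assert ns < INT64_MIN
```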





[jira] [Updated] (ARROW-5410) [C++] Crash at arrow::internal::FileWrite

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5410:
--
Component/s: C++

> [C++] Crash at arrow::internal::FileWrite
> -
>
> Key: ARROW-5410
> URL: https://issues.apache.org/jira/browse/ARROW-5410
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
> Environment: Windows version 10.0.14393.0 (rs1_release.160715-1616)
>Reporter: Tham
>Priority: Major
>  Labels: parquet
>
> My application writes a bunch of parquet files and often crashes. Most of 
> the time it crashes while writing the first file; sometimes it writes the 
> first file and crashes on the second. The file can always be opened. It 
> crashes only in writeTable.
> In my tests, the application crashes when built in release mode but not in 
> debug mode, and it crashed on only one Windows machine, not others.
> Here is stack trace from dump file:
> {code:java}
> STACK_TEXT:  
> 001e`10efd840 7ffc`0333d53f : ` 001e`10efe230 
> `0033 7ffc`032dbe21 : 
> CortexSync!google_breakpad::ExceptionHandler::HandleInvalidParameter+0x1a0
> 001e`10efe170 7ffc`0333d559 : `ff02 7ffc`032da63d 
> `0033 `0033 : ucrtbase!invalid_parameter+0x13f
> 001e`10efe1b0 7ffc`03318664 : 7ff7`7f7c8489 `ff02 
> 001e`10efe230 `0033 : ucrtbase!invalid_parameter_noinfo+0x9
> 001e`10efe1f0 7ffc`032d926d : ` `0140 
> `0005 0122`bbe61e30 : 
> ucrtbase!_acrt_uninitialize_command_line+0x6fd4
> 001e`10efe250 7ff7`7f66585e : 0010`0005 ` 
> 001e`10efe560 0122`b2337b88 : ucrtbase!write+0x8d
> 001e`10efe2a0 7ff7`7f632785 : 7ff7` 7ff7`7f7bb153 
> 0122`bbe890e0 001e`10efe634 : 
> CortexSync!arrow::internal::FileWrite+0x5e
> 001e`10efe360 7ff7`7f632442 : `348a `0004 
> 733f`5e86f38c 0122`bbe14c40 : 
> CortexSync!arrow::io::OSFile::Write+0x1d5
> 001e`10efe510 7ff7`7f71c1b9 : 001e`10efe738 7ff7`7f665522 
> 0122`bbffe6e0 ` : 
> CortexSync!arrow::io::FileOutputStream::Write+0x12
> 001e`10efe540 7ff7`7f79cb2f : 0122`bbe61e30 0122`bbffe6e0 
> `0013 001e`10efe730 : 
> CortexSync!parquet::ArrowOutputStream::Write+0x39
> 001e`10efe6e0 7ff7`7f7abbaf : 7ff7`7fd75b78 7ff7`7fd75b78 
> 001e`10efe9c0 ` : 
> CortexSync!parquet::ThriftSerializer::Serialize+0x11f
> 001e`10efe8c0 7ff7`7f7aaf93 : ` 0122`bbe3f450 
> `0002 0122`bc0218d0 : 
> CortexSync!parquet::SerializedPageWriter::WriteDictionaryPage+0x44f
> 001e`10efee20 7ff7`7f7a3707 : 0122`bbe3f450 001e`10eff250 
> ` 0122`b168 : 
> CortexSync!parquet::TypedColumnWriterImpl 
> >::WriteDictionaryPage+0x143
> 001e`10efeed0 7ff7`7f710480 : 001e`10eff1c0 ` 
> 0122`bbe3f540 0122`b2439998 : 
> CortexSync!parquet::ColumnWriterImpl::Close+0x47
> 001e`10efef60 7ff7`7f7154da : 0122`bbec3cd0 001e`10eff1c0 
> 0122`bbec4bb0 0122`b2439998 : 
> CortexSync!parquet::arrow::FileWriter::Impl::`vector deleting 
> destructor'+0x100
> 001e`10efefa0 7ff7`7f71619c : ` 001e`10eff1c0 
> 0122`bbe89390 ` : 
> CortexSync!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x6fa
> 001e`10eff150 7ff7`7f202de9 : `0001 001e`10eff430 
> `000f ` : 
> CortexSync!parquet::arrow::FileWriter::WriteTable+0x6cc
> 001e`10eff410 7ff7`7f18baf3 : 0122`bbec39b0 0122`b24c53f8 
> `3f80 ` : 
> CortexSync!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0x49{code}
> I tried many ways to find the root cause, but failed. Can anyone here give 
> me some information or advice, so that I can investigate further? Thanks!





[jira] [Updated] (ARROW-4906) [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is sorted for each row

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4906:
--
Component/s: Format

> [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is 
> sorted for each row
> -
>
> Key: ARROW-4906
> URL: https://issues.apache.org/jira/browse/ARROW-4906
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4831) [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4831:
--
Component/s: C++

> [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency 
> ---
>
> Key: ARROW-4831
> URL: https://issues.apache.org/jira/browse/ARROW-4831
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ZSTD_CMAKE_ARGS should utilize 
> https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L359
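For illustration, forwarding the archiver variables through the external project's CMake arguments might look like the following sketch (variable names such as ZSTD_SOURCE_URL are placeholders; the exact argument list in ThirdpartyToolchain.cmake may differ):

```cmake
# Forward the toolchain archiver to the ZSTD external project so its
# static libraries are created with the same `ar`/`ranlib` as Arrow.
set(ZSTD_CMAKE_ARGS
    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
    "-DCMAKE_AR=${CMAKE_AR}"
    "-DCMAKE_RANLIB=${CMAKE_RANLIB}")

externalproject_add(zstd_ep
  URL ${ZSTD_SOURCE_URL}
  CMAKE_ARGS ${ZSTD_CMAKE_ARGS})
```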





[jira] [Updated] (ARROW-4994) [website] Update Details for ptgoetz

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4994:
--
Component/s: Website

> [website] Update Details for ptgoetz
> 
>
> Key: ARROW-4994
> URL: https://issues.apache.org/jira/browse/ARROW-4994
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm no longer with Hortonworks.





[jira] [Updated] (ARROW-4950) [C++] Thirdparty CMake error get_target_property() called with non-existent target LZ4::lz4

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4950:
--
Component/s: C++

> [C++] Thirdparty CMake error get_target_property() called with non-existent 
> target LZ4::lz4
> ---
>
> Key: ARROW-4950
> URL: https://issues.apache.org/jira/browse/ARROW-4950
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With CMake 3.2 https://travis-ci.org/kszucs/crossbow/builds/507811485
> {code}
> docker-compose build cpp-cmake32
> docker-compose run --rm cpp-cmake32
> {code}





[jira] [Updated] (ARROW-4988) [JS] Bump required node version to 11.12

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4988:
--
Component/s: JavaScript

> [JS] Bump required node version to 11.12
> 
>
> Key: ARROW-4988
> URL: https://issues.apache.org/jira/browse/ARROW-4988
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cause of ARROW-4948 and 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C5ce620e0-0063-4bee-8ad6-a41301ac08c4%40www.fastmail.com%3E
> was actually a regression in node v11.11, resolved in v11.12; see 
> https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md#2019-03-15-version-11120-current-bridgear
> and https://github.com/nodejs/node/pull/26488
> Bump the requirement up to 11.12.





[jira] [Updated] (ARROW-5010) [Release] Fix release script with llvm-7

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5010:
--
Component/s: Developer Tools

> [Release] Fix release script with llvm-7
> 
>
> Key: ARROW-5010
> URL: https://issues.apache.org/jira/browse/ARROW-5010
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The source release script fails to compile Gandiva because Gandiva requires 
> llvm-7, and only llvm-6 is available in the ubuntu18 docker image.





[jira] [Updated] (ARROW-5011) [Release] Add support in the source release script for custom hash

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5011:
--
Component/s: Developer Tools

> [Release] Add support in the source release script for custom hash
> --
>
> Key: ARROW-5011
> URL: https://issues.apache.org/jira/browse/ARROW-5011
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a minor feature to help debug said script by overriding the 
> git-archive hash instead of using the hash inferred from the release tag.





[jira] [Commented] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854510#comment-16854510
 ] 

Antoine Pitrou commented on ARROW-5447:
---

It seems the error is non-deterministic. Another instance: 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24998213

> [CI] [Ruby] CI is failed on AppVeyor
> 
>
> Key: ARROW-5447
> URL: https://issues.apache.org/jira/browse/ARROW-5447
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Ruby
>Reporter: Yosuke Shiro
>Priority: Major
>
> This happens sometimes.
> {code:java}
> Error: test: csv.gz(TableTest::#save and .load::path:::format::load: auto 
> detect): Arrow::Error::Io: [csv-reader][read]: IOError: zlib inflate failed: 
> invalid distance too far back
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in
>  `invoke'
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in
>  `invoke'
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:533:in
>  `block in define_method'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:158:in `block (2 
> levels) in load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:147:in `block (2 
> levels) in wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:140:in 
> `open_encoding_convert_stream'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:146:in `block in 
> wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:125:in `block in 
> open_decompress_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:124:in 
> `open_decompress_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:145:in `wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:157:in `block in 
> load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:156:in 
> `load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:39:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:26:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:158:in 
> `load_as_csv'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:50:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:22:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table.rb:27:in `load'
> C:/projects/arrow/ruby/red-arrow/test/test-table.rb:503:in `block (5 levels) 
> in '
> ===
> {code}
>  
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328





[jira] [Commented] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854511#comment-16854511
 ] 

Antoine Pitrou commented on ARROW-5447:
---

[~kou]

> [CI] [Ruby] CI is failed on AppVeyor
> 
>
> Key: ARROW-5447
> URL: https://issues.apache.org/jira/browse/ARROW-5447
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Ruby
>Reporter: Yosuke Shiro
>Priority: Major
>
> This happens sometimes.
> {code:java}
> Error: test: csv.gz(TableTest::#save and .load::path:::format::load: auto detect): Arrow::Error::Io: [csv-reader][read]: IOError: zlib inflate failed: invalid distance too far back
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in `invoke'
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in `invoke'
> c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:533:in `block in define_method'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:158:in `block (2 levels) in load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:147:in `block (2 levels) in wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:140:in `open_encoding_convert_stream'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:146:in `block in wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:125:in `block in open_decompress_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:124:in `open_decompress_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:145:in `wrap_input'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:157:in `block in load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:156:in `load_from_path'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:39:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:26:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:158:in `load_as_csv'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:50:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:22:in `load'
> C:/projects/arrow/ruby/red-arrow/lib/arrow/table.rb:27:in `load'
> C:/projects/arrow/ruby/red-arrow/test/test-table.rb:503:in `block (5 levels) in '
> ===
> {code}
>  
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5447:
--
Summary: [CI] [Ruby] CI is failed on AppVeyor  (was: [CI] [Ruby] CI is 
failued on AppVeyor)

> [CI] [Ruby] CI is failed on AppVeyor
> 
>
> Key: ARROW-5447
> URL: https://issues.apache.org/jira/browse/ARROW-5447
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Ruby
>Reporter: Yosuke Shiro
>Priority: Major
>
> This happens sometimes.
>  
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3676) [Go] implement Decimal128 array

2019-06-03 Thread Sebastien Binet (JIRA)


[ https://issues.apache.org/jira/browse/ARROW-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854415#comment-16854415 ]

Sebastien Binet commented on ARROW-3676:


one possible package to leverage and implement this (w/o reaching for, say, 
`math/big.Int`) could be: [https://github.com/lukechampine/uint128]

or just piggyback on what was implemented in C++:

- [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/basic_decimal.h]
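The C++ approach mentioned above stores a 128-bit value as two 64-bit limbs. A minimal Go sketch of that layout follows; this is purely illustrative (assumed type and method names, not the actual Arrow Go API), mirroring `BasicDecimal128`'s high/low split:

```go
package main

import "fmt"

// Decimal128 is a hypothetical sketch mirroring the two-limb layout of
// Arrow C++'s BasicDecimal128: a signed high limb plus an unsigned low limb,
// together forming a two's-complement 128-bit integer.
type Decimal128 struct {
	hi int64
	lo uint64
}

// Add returns d + other, propagating the carry from the low limb
// into the high limb when the unsigned low addition wraps around.
func (d Decimal128) Add(other Decimal128) Decimal128 {
	lo := d.lo + other.lo
	hi := d.hi + other.hi
	if lo < d.lo { // unsigned overflow of the low limb
		hi++
	}
	return Decimal128{hi: hi, lo: lo}
}

func main() {
	a := Decimal128{hi: 0, lo: ^uint64(0)} // 2^64 - 1
	b := Decimal128{hi: 0, lo: 1}
	fmt.Println(a.Add(b)) // carry flows into the high limb
}
```

With two's-complement limbs, negation and subtraction fall out of the same carry logic, which is why both the C++ implementation and `lukechampine/uint128` use this representation rather than an arbitrary-precision type like `math/big.Int`.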

> [Go] implement Decimal128 array
> ---
>
> Key: ARROW-3676
> URL: https://issues.apache.org/jira/browse/ARROW-3676
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-03 Thread JIRA
Romain François created ARROW-5488:
--

 Summary: [R] Workaround when C++ lib not available
 Key: ARROW-5488
 URL: https://issues.apache.org/jira/browse/ARROW-5488
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain François


As a way to get onto CRAN, we need some way for the package to still compile, install, and pass its tests (even if it does nothing useful) when the C++ lib is not available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

