[jira] [Updated] (ARROW-2087) [Python] Binaries of 3rdparty are not stripped in manylinux1 base image
[ https://issues.apache.org/jira/browse/ARROW-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2087: -- Component/s: Python Packaging > [Python] Binaries of 3rdparty are not stripped in manylinux1 base image > --- > > Key: ARROW-2087 > URL: https://issues.apache.org/jira/browse/ARROW-2087 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > CMake pip package: > [https://github.com/scikit-build/cmake-python-distributions/issues/32] > Pandas pip package: [https://github.com/pandas-dev/pandas/issues/19531] > NumPy pip package: https://github.com/numpy/numpy/issues/10519 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2114) [Python] Pull latest docker manylinux1 image
[ https://issues.apache.org/jira/browse/ARROW-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2114: -- Component/s: Python Packaging > [Python] Pull latest docker manylinux1 image > > > Key: ARROW-2114 > URL: https://issues.apache.org/jira/browse/ARROW-2114 > Project: Apache Arrow > Issue Type: Task > Components: Packaging, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2146) [GLib] Implement Slice for ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2146: -- Component/s: GLib > [GLib] Implement Slice for ChunkedArray > --- > > Key: ARROW-2146 > URL: https://issues.apache.org/jira/browse/ARROW-2146 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Yosuke Shiro >Assignee: Yosuke Shiro >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Add {{Slice}} api to ChunkedArray. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2143) [Python] Provide a manylinux1 wheel for cp27m
[ https://issues.apache.org/jira/browse/ARROW-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2143: -- Component/s: Python > [Python] Provide a manylinux1 wheel for cp27m > - > > Key: ARROW-2143 > URL: https://issues.apache.org/jira/browse/ARROW-2143 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently we only provide it for cp27mu, we should also build them for cp27m -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2163) Install apt dependencies separate from built-in Travis commands, retry on flakiness
[ https://issues.apache.org/jira/browse/ARROW-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2163: -- Component/s: Continuous Integration > Install apt dependencies separate from built-in Travis commands, retry on > flakiness > --- > > Key: ARROW-2163 > URL: https://issues.apache.org/jira/browse/ARROW-2163 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > This would also allow us to run the detect-changes script before > installing apt dependencies, so unnecessary builds will terminate faster -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2168) [C++] Build toolchain builds with jemalloc
[ https://issues.apache.org/jira/browse/ARROW-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2168: -- Component/s: C++ > [C++] Build toolchain builds with jemalloc > -- > > Key: ARROW-2168 > URL: https://issues.apache.org/jira/browse/ARROW-2168 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > We have fixed all known problems in the jemalloc 4.x branch and should be > able to gradually reactivate it in our builds to get its performance boost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2190) [GLib] Add add/remove field functions for RecordBatch.
[ https://issues.apache.org/jira/browse/ARROW-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2190: -- Component/s: GLib > [GLib] Add add/remove field functions for RecordBatch. > -- > > Key: ARROW-2190 > URL: https://issues.apache.org/jira/browse/ARROW-2190 > Project: Apache Arrow > Issue Type: New Feature > Components: GLib >Reporter: Yosuke Shiro >Assignee: Yosuke Shiro >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Add AddColumn and RemoveColumn api to RecordBatch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2179) [C++] arrow/util/io-util.h missing from libarrow-dev
[ https://issues.apache.org/jira/browse/ARROW-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2179: -- Component/s: C++ > [C++] arrow/util/io-util.h missing from libarrow-dev > > > Key: ARROW-2179 > URL: https://issues.apache.org/jira/browse/ARROW-2179 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.8.0 >Reporter: Rares Vernica >Assignee: Wes McKinney >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > {{arrow/util/io-util.h}} is missing from the {{libarrow-dev}} package > (ubuntu/trusty): > {code:java} > > ls -1 /usr/include/arrow/util/ > bit-stream-utils.h > bit-util.h > bpacking.h > compiler-util.h > compression.h > compression_brotli.h > compression_lz4.h > compression_snappy.h > compression_zlib.h > compression_zstd.h > cpu-info.h > decimal.h > hash-util.h > hash.h > key_value_metadata.h > logging.h > macros.h > parallel.h > rle-encoding.h > sse-util.h > stl.h > type_traits.h > variant > variant.h > visibility.h > {code} > {code:java} > > apt-cache show libarrow-dev > Package: libarrow-dev > Architecture: amd64 > Version: 0.8.0-2 > Multi-Arch: same > Priority: optional > Section: libdevel > Source: apache-arrow > Maintainer: Kouhei Sutou > Installed-Size: 5696 > Depends: libarrow0 (= 0.8.0-2) > Filename: pool/trusty/universe/a/apache-arrow/libarrow-dev_0.8.0-2_amd64.deb > Size: 602716 > MD5sum: de5f2bfafd90ff29e4b192f4e5d26605 > SHA1: e3d9146b30f07c07b62f8bdf9f779d0ee5d05a75 > SHA256: 30a89b2ac6845998f22434e660b1a7c9d91dc8b2ba947e1f4333b3cf74c69982 > SHA512: > 99f511bee6645a68708848a58b4eba669a2ec8c45fb411c56ed2c920d3ff34552c77821eff7e428c886d16e450bdd25cc4e67597972f77a4255f302a56d1eac8 > Homepage: https://arrow.apache.org/ > Description: Apache Arrow is a data processing library for analysis > . > This package provides header files. > Description-md5: e4855d5dbadacb872bf8c4ca67f624e3 > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2185) Remove CI directives from squashed commit messages
[ https://issues.apache.org/jira/browse/ARROW-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2185: -- Component/s: Continuous Integration > Remove CI directives from squashed commit messages > -- > > Key: ARROW-2185 > URL: https://issues.apache.org/jira/browse/ARROW-2185 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > In our PR squash tool, we are potentially picking up CI directives like > {{[skip appveyor]}} from intermediate commits. We should regex these away and > instead use directives in the PR title if we wish the commit to master to > behave in a certain way -- This message was sent by Atlassian JIRA (v7.6.3#76005)
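The directive-stripping proposed in ARROW-2185 can be sketched with a small regex. This is illustrative only, not the actual merge-tool code; the directive patterns and the helper name below are assumptions.

```python
import re

# Hypothetical directive patterns; the real merge tool may recognize others.
CI_DIRECTIVE = re.compile(
    r"\[\s*(?:skip\s+(?:ci|appveyor)|ci\s+skip|appveyor\s+skip)\s*\]",
    re.IGNORECASE,
)

def strip_ci_directives(message: str) -> str:
    """Drop CI directives such as "[skip appveyor]" picked up from
    intermediate commits, then tidy the whitespace left behind."""
    cleaned = CI_DIRECTIVE.sub("", message)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Directives intended for the squashed commit would then have to be placed in the PR title, which the tool keeps verbatim.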
[jira] [Updated] (ARROW-2183) [C++] Add helper CMake function for globbing the right header files
[ https://issues.apache.org/jira/browse/ARROW-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2183: -- Component/s: C++ > [C++] Add helper CMake function for globbing the right header files > > > Key: ARROW-2183 > URL: https://issues.apache.org/jira/browse/ARROW-2183 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > Brought up by discussion in https://github.com/apache/arrow/pull/1631 on > ARROW-2179. We should collect header files but do not install ones containing > particular patterns for non-public headers, like {{-internal}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2203) [C++] StderrStream class
[ https://issues.apache.org/jira/browse/ARROW-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2203: -- Component/s: C++ > [C++] StderrStream class > > > Key: ARROW-2203 > URL: https://issues.apache.org/jira/browse/ARROW-2203 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.8.0 >Reporter: Rares Vernica >Assignee: Rares Vernica >Priority: Trivial > Labels: pull-request-available > Fix For: 0.9.0 > > > The C++ API has support for reading and writing data from and to STDIN and > STDOUT. The classes are arrow::io::StdinStream and arrow::io::StdoutStream. > In some scenarios it might be useful to write data to STDERR. Adding a > StderrStream class should be a trivial addition given the StdoutStream class. > If you think a StderrStream class is a good idea, I am more than happy to > submit a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
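The symmetry the reporter describes (a StderrStream as a sibling of StdoutStream) can be sketched in Python. This is not the Arrow C++ API; the class below is a standalone illustration, and its `sink` parameter exists only to make the example testable.

```python
import sys

class StderrStream:
    """Sketch of an output stream bound to stderr, mirroring how an
    arrow::io::StderrStream would parallel StdoutStream."""

    def __init__(self, sink=None):
        # Default to the raw binary stderr; injectable for testing.
        self._sink = sink if sink is not None else sys.stderr.buffer
        self._pos = 0

    def write(self, data: bytes) -> None:
        self._sink.write(data)
        self._pos += len(data)

    def tell(self) -> int:
        return self._pos
```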
[jira] [Updated] (ARROW-2216) [CI] CI descriptions and envars are misleading
[ https://issues.apache.org/jira/browse/ARROW-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2216: -- Component/s: Continuous Integration > [CI] CI descriptions and envars are misleading > -- > > Key: ARROW-2216 > URL: https://issues.apache.org/jira/browse/ARROW-2216 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Phillip Cloud >Assignee: Antoine Pitrou >Priority: Minor > Fix For: 0.12.0 > > > The descriptions of each of the CI builds are hard to decipher without > looking at the build scripts, which are themselves quite complex. > For example in this job: https://travis-ci.org/apache/arrow/jobs/346309532 > you can see that the envars {{CC}} and {{CXX}} are set to {{"clang-5.0"}} and > {{"clang++-5.0"}} respectively and they are then immediately set to {{gcc}} > and {{g++}}. > Without intimate knowledge of the script it's very hard to diagnose CI issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2212) [C++/Python] Build Protobuf in base manylinux 1 docker image
[ https://issues.apache.org/jira/browse/ARROW-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2212: -- Component/s: Python Packaging > [C++/Python] Build Protobuf in base manylinux 1 docker image > > > Key: ARROW-2212 > URL: https://issues.apache.org/jira/browse/ARROW-2212 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > This should cut down the build times of the {{manylinux1}} CI matrix entry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2268) Remove MD5 checksums from release process
[ https://issues.apache.org/jira/browse/ARROW-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2268: -- Component/s: Developer Tools > Remove MD5 checksums from release process > - > > Key: ARROW-2268 > URL: https://issues.apache.org/jira/browse/ARROW-2268 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > The ASF has changed its release policy for signatures and checksums to > contraindicate the use of MD5 checksums: > http://www.apache.org/dev/release-distribution#sigs-and-sums. We should > remove this from our various release scripts prior to the 0.9.0 release -- This message was sent by Atlassian JIRA (v7.6.3#76005)
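Per the linked ASF policy, release checksums should use SHA rather than MD5. A minimal sketch of producing a `sha512sum`-compatible line for a release artifact (the helper name is hypothetical, not part of the Arrow release scripts):

```python
import hashlib

def sha512_checksum_line(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a release artifact in chunks (so large tarballs are not read
    into memory at once) and return a "sha512sum"-style line."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"
```

A line in this `<hexdigest>  <path>` format can be verified later with `sha512sum -c`.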
[jira] [Updated] (ARROW-2650) [JS] Finish implementing Unions
[ https://issues.apache.org/jira/browse/ARROW-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2650: -- Component/s: JavaScript > [JS] Finish implementing Unions > --- > > Key: ARROW-2650 > URL: https://issues.apache.org/jira/browse/ARROW-2650 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Finish implementing Unions in JS and add to integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2329) [Website]: 0.9.0 release update
[ https://issues.apache.org/jira/browse/ARROW-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2329: -- Component/s: Website > [Website]: 0.9.0 release update > --- > > Key: ARROW-2329 > URL: https://issues.apache.org/jira/browse/ARROW-2329 > Project: Apache Arrow > Issue Type: Task > Components: Website >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2476) [Python/Question] Maximum length of an Array created from ndarray
[ https://issues.apache.org/jira/browse/ARROW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2476: -- Component/s: Python > [Python/Question] Maximum length of an Array created from ndarray > - > > Key: ARROW-2476 > URL: https://issues.apache.org/jira/browse/ARROW-2476 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Priority: Minor > Fix For: 0.12.0 > > > So the format > [describes|https://github.com/apache/arrow/blob/master/format/Layout.md#array-lengths] > that an array's max length is 2^31 - 1, however the following python snippet > creates a 2**32 length arrow array: > {code:python} > a = np.ones((2**32,), dtype='int8') > A = pa.Array.from_pandas(a) > type(A) > {code} > {code}pyarrow.lib.Int8Array{code} > Based on the layout specification I'd expect a ChunkedArray of three Int8Arrays > with lengths: > [2^31 - 1, 2^31 - 1, 2], or an exception to be raised. > If this is the expected behavior, is there any documentation for it? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
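The chunk lengths the reporter expects follow from simple arithmetic on the 2^31 - 1 cap. A pure-Python sketch of that expectation (this is not pyarrow's actual chunking logic):

```python
MAX_CHUNK = 2**31 - 1  # per-array length cap from the Arrow layout document

def expected_chunk_lengths(n, max_chunk=MAX_CHUNK):
    """Lengths of the chunks a ChunkedArray covering n elements would
    need if no single chunk may exceed max_chunk."""
    full, rest = divmod(n, max_chunk)
    return [max_chunk] * full + ([rest] if rest else [])
```

For n = 2**32 this yields the [2^31 - 1, 2^31 - 1, 2] split mentioned in the question.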
[jira] [Updated] (ARROW-2803) [C++] Put hashing function into src/arrow/util
[ https://issues.apache.org/jira/browse/ARROW-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2803: -- Component/s: C++ > [C++] Put hashing function into src/arrow/util > -- > > Key: ARROW-2803 > URL: https://issues.apache.org/jira/browse/ARROW-2803 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Philipp Moritz >Assignee: Antoine Pitrou >Priority: Major > Labels: easytask > Fix For: 0.12.0 > > > See [https://github.com/apache/arrow/pull/2220] > We should decide what our default go-to hash function should be (maybe > murmur3?) and put it into src/arrow/util -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg
[ https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3199: -- Component/s: C++ - Plasma > [Plasma] Check for EAGAIN in recvmsg and sendmsg > > > Key: ARROW-3199 > URL: https://issues.apache.org/jira/browse/ARROW-3199 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > It turns out that > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63] > and probably also > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49] > can block and give an EAGAIN error. > This was discovered during stress tests by https://github.com/stephanie-wang/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3070) [Release] Host binary artifacts for RCs and releases on ASF Bintray account instead of dist/mirror system
[ https://issues.apache.org/jira/browse/ARROW-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3070: -- Component/s: Developer Tools > [Release] Host binary artifacts for RCs and releases on ASF Bintray account > instead of dist/mirror system > - > > Key: ARROW-3070 > URL: https://issues.apache.org/jira/browse/ARROW-3070 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Sutou Kouhei >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Since the artifacts are large this is a better place for them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2953) [Plasma] Store memory usage
[ https://issues.apache.org/jira/browse/ARROW-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2953: -- Component/s: C++ - Plasma > [Plasma] Store memory usage > --- > > Key: ARROW-2953 > URL: https://issues.apache.org/jira/browse/ARROW-2953 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > While doing some memory profiling on the store, it became clear that at the > moment the metadata of the objects takes up much more space than it should. > In particular, for each object: > * The object id (20 bytes) is stored three times > * The object checksum (8 bytes) is stored twice > We can therefore significantly reduce the metadata overhead with some > refactoring. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
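The savings described above are easy to quantify: with the object id (20 bytes) held three times and the checksum (8 bytes) held twice, deduplicating to one copy of each frees 2·20 + 1·8 = 48 bytes per object. A small sketch of that arithmetic (the sizes are taken from the issue description; the function is illustrative, not Plasma code):

```python
OBJECT_ID_BYTES = 20   # object id size per the issue description
CHECKSUM_BYTES = 8     # checksum size per the issue description

def redundant_metadata_bytes(id_copies=3, checksum_copies=2):
    """Bytes per object freed by keeping a single copy of each field."""
    return ((id_copies - 1) * OBJECT_ID_BYTES
            + (checksum_copies - 1) * CHECKSUM_BYTES)
```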
[jira] [Updated] (ARROW-3467) Building against external double conversion is broken
[ https://issues.apache.org/jira/browse/ARROW-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3467: -- Component/s: C++ > Building against external double conversion is broken > - > > Key: ARROW-3467 > URL: https://issues.apache.org/jira/browse/ARROW-3467 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.11.0 >Reporter: Dmitry Kalinkin >Assignee: Dmitry Kalinkin >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > > double-conversion 3.1.1 defines double-conversion::double-conversion target > instead of double-conversion [1]. So the build fails with: > {noformat} > CMake Error at cmake_modules/BuildUtils.cmake:98 (message): > No static or shared library provided for double-conversion > Call Stack (most recent call first): > cmake_modules/ThirdpartyToolchain.cmake:476 (ADD_THIRDPARTY_LIB) > CMakeLists.txt:386 (include) > {noformat} > [1] > https://github.com/google/double-conversion/commit/e13e72e17692f5dc0036460d734c637b563f3ac7#diff-af3b638bc2a3e6c650974192a53c7291R57 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3551) Change MapD to OmniSci on Powered By page
[ https://issues.apache.org/jira/browse/ARROW-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3551: -- Component/s: Website > Change MapD to OmniSci on Powered By page > - > > Key: ARROW-3551 > URL: https://issues.apache.org/jira/browse/ARROW-3551 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Todd Mostak >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > MapD recently changed its name to OmniSci. We should update the Powered By > page to reflect this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3504) [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization.
[ https://issues.apache.org/jira/browse/ARROW-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3504: -- Component/s: C++ - Plasma > [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow > serialization. > -- > > Key: ARROW-3504 > URL: https://issues.apache.org/jira/browse/ARROW-3504 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Plasma >Reporter: Yuhong Guo >Assignee: Yuhong Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This feature enables the Java client to read data that the Python client puts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3527) [R] Unused variables in R-package C++ code
[ https://issues.apache.org/jira/browse/ARROW-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3527: -- Component/s: R > [R] Unused variables in R-package C++ code > -- > > Key: ARROW-3527 > URL: https://issues.apache.org/jira/browse/ARROW-3527 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: James Lamb >Assignee: James Lamb >Priority: Trivial > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Tonight I noticed a few "unused variable" compiler warnings while > building the arrow R package. > > {code:java} > DataType.cpp:118:7: warning: unused variable 'n' [-Wunused-variable] > int n = x.size(); > RecordBatch.cpp:132:7: warning: unused variable 'nc' [-Wunused-variable] > int nc = tbl.size(); > {code} > Creating this issue to accompany the PR I'll submit to propose removing these > calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3489) [Gandiva] Support for in expressions
[ https://issues.apache.org/jira/browse/ARROW-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3489: -- Component/s: C++ - Gandiva > [Gandiva] Support for in expressions > > > Key: ARROW-3489 > URL: https://issues.apache.org/jira/browse/ARROW-3489 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: gandiva, pull-request-available > Fix For: 0.12.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Add support for in-expressions to gandiva. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3528) [R] Typo in R documentation
[ https://issues.apache.org/jira/browse/ARROW-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3528: -- Component/s: R > [R] Typo in R documentation > --- > > Key: ARROW-3528 > URL: https://issues.apache.org/jira/browse/ARROW-3528 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: James Lamb >Assignee: James Lamb >Priority: Trivial > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > There is a typo in the R-package documentation. > > *"ordred" --> "ordered"* > > Just creating the story here to accompany a pending PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3515) Introduce NumericTensor class
[ https://issues.apache.org/jira/browse/ARROW-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3515: -- Component/s: C++ > Introduce NumericTensor class > - > > Key: ARROW-3515 > URL: https://issues.apache.org/jira/browse/ARROW-3515 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/pull/2759] > This commit defines the new NumericTensor class as a subclass of Tensor > class. NumericTensor extends Tensor class by adding a member function to > access element values in a tensor. > I want to use this new feature for writing tests of SparseTensor in > [#2546|https://github.com/apache/arrow/pull/2546]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3746: -- Component/s: Python C++ - Gandiva > [Gandiva] [Python] Make it possible to list all functions registered with > Gandiva > - > > Key: ARROW-3746 > URL: https://issues.apache.org/jira/browse/ARROW-3746 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva, Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > This will also be useful for documentation purposes (right now it is not very > easy to get a list of all the functions that are registered). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position
[ https://issues.apache.org/jira/browse/ARROW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-2835: - Assignee: Antoine Pitrou > [C++] ReadAt/WriteAt are inconsistent with moving the files position > > > Key: ARROW-2835 > URL: https://issues.apache.org/jira/browse/ARROW-2835 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Dimitri Vorona >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Right now, there is inconsistent behaviour regarding moving the file's > position pointer after calling ReadAt or WriteAt. For example, the default > implementation of ReadAt seeks to the desired offset and calls Read, which > moves the position pointer. MemoryMappedFile::ReadAt, however, doesn't change > the position. WriteableFile::WriteAt seems to move the position in the current > implementation, but there is no docstring which prescribes this behaviour. > Antoine suggested that *At methods shouldn't touch the position, which makes > more sense, IMHO. The change isn't huge and doesn't seem to break anything > internally, but it might break existing user code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position
[ https://issues.apache.org/jira/browse/ARROW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-2835. --- Resolution: Fixed Issue resolved by pull request 4417 [https://github.com/apache/arrow/pull/4417] > [C++] ReadAt/WriteAt are inconsistent with moving the files position > > > Key: ARROW-2835 > URL: https://issues.apache.org/jira/browse/ARROW-2835 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Dimitri Vorona >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Right now, there is inconsistent behaviour regarding moving the file's > position pointer after calling ReadAt or WriteAt. For example, the default > implementation of ReadAt seeks to the desired offset and calls Read, which > moves the position pointer. MemoryMappedFile::ReadAt, however, doesn't change > the position. WriteableFile::WriteAt seems to move the position in the current > implementation, but there is no docstring which prescribes this behaviour. > Antoine suggested that *At methods shouldn't touch the position, which makes > more sense, IMHO. The change isn't huge and doesn't seem to break anything > internally, but it might break existing user code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
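The suggested semantics (ReadAt leaves the file position untouched) map directly onto POSIX positional I/O. A Python sketch using os.pread, which reads at an offset without moving the descriptor's position (POSIX-only; the helper name is illustrative, not Arrow's API):

```python
import os

def read_at(fd, nbytes, offset):
    """Positional read that leaves the descriptor's current position
    untouched, matching the behaviour proposed for the *At methods."""
    return os.pread(fd, nbytes, offset)
```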
[jira] [Updated] (ARROW-3662) [C++] Add a const overload to MemoryMappedFile::GetSize
[ https://issues.apache.org/jira/browse/ARROW-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3662: -- Component/s: C++ > [C++] Add a const overload to MemoryMappedFile::GetSize > --- > > Key: ARROW-3662 > URL: https://issues.apache.org/jira/browse/ARROW-3662 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Affects Versions: 0.11.1 >Reporter: Dimitri Vorona >Assignee: Dimitri Vorona >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > > While GetSize in general is not a const function, it can be on a > MemoryMappedFile. I propose to add a const overload directly to the > MemoryMappedFile. > Alternatively we could add a const version on the RandomAccessFile level > which would fail if getting the size without mutating state (e.g. without a seek) isn't > possible, but that seems to me to be a potential source of hard-to-debug bugs > and spurious failures. It would at least require a careful analysis of the > platform support for the different ways of getting the size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3576) [Python] Expose compressed file readers as NativeFile
[ https://issues.apache.org/jira/browse/ARROW-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3576: -- Component/s: Python > [Python] Expose compressed file readers as NativeFile > - > > Key: ARROW-3576 > URL: https://issues.apache.org/jira/browse/ARROW-3576 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3664) [Rust] Add benchmark for PrimitiveArrayBuilder
[ https://issues.apache.org/jira/browse/ARROW-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3664: -- Component/s: Rust > [Rust] Add benchmark for PrimitiveArrayBuilder > -- > > Key: ARROW-3664 > URL: https://issues.apache.org/jira/browse/ARROW-3664 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We should add a benchmark for the {{PrimitiveArrayBuilder}} to measure and > track its performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3569) [Packaging] Run pyarrow unittests when building conda package
[ https://issues.apache.org/jira/browse/ARROW-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3569: -- Component/s: Packaging > [Packaging] Run pyarrow unittests when building conda package > - > > Key: ARROW-3569 > URL: https://issues.apache.org/jira/browse/ARROW-3569 > Project: Apache Arrow > Issue Type: Sub-task > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column
[ https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3667: -- Component/s: JavaScript > [JS] Incorrectly reads record batches with an all null column > - > > Key: ARROW-3667 > URL: https://issues.apache.org/jira/browse/ARROW-3667 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.3.1 >Reporter: Brian Hulette >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > The JS library seems to incorrectly read any columns that come after an > all-null column in IPC buffers produced by pyarrow. > Here's a python script that generates two arrow buffers, one with an all-null > column followed by a utf-8 column, and a second with those two reversed > {code:python} > import pyarrow as pa > import pandas as pd > def serialize_to_arrow(df, fd, compress=True): > batch = pa.RecordBatch.from_pandas(df) > writer = pa.RecordBatchFileWriter(fd, batch.schema) > writer.write_batch(batch) > writer.close() > if __name__ == "__main__": > df = pd.DataFrame(data={'nulls': [None, None, None], 'not nulls': ['abc', > 'def', 'ghi']}, columns=['nulls', 'not nulls']) > with open('bad.arrow', 'wb') as fd: > serialize_to_arrow(df, fd) > df = pd.DataFrame(df, columns=['not nulls', 'nulls']) > with open('good.arrow', 'wb') as fd: > serialize_to_arrow(df, fd) > {code} > JS incorrectly interprets the [null, not null] case: > {code:javascript} > > var arrow = require('apache-arrow') > undefined > > var fs = require('fs') > undefined > > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not > > nulls').get(0) > 'abc' > > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0) > '\u\u\u\u\u0003\u\u\u\u0006\u\u\u\t\u\u\u' > {code} > Presumably this is because pyarrow is omitting some (or all) of the buffers > associated with the all-null column, but the JS IPC reader is still looking > for them, causing the buffer count to get out of 
sync. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3555) [Plasma] Unify plasma client get function using metadata.
[ https://issues.apache.org/jira/browse/ARROW-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3555: -- Component/s: C++ - Plasma > [Plasma] Unify plasma client get function using metadata. > - > > Key: ARROW-3555 > URL: https://issues.apache.org/jira/browse/ARROW-3555 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Plasma >Reporter: Yuhong Guo >Assignee: Yuhong Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Sometimes it is very hard for the data consumer to know whether an object is > a buffer or some other object. If we use try-catch to catch the pyarrow > deserialization exception and then call `plasma_client.get_buffer`, the code > is not clean. > We may leverage the metadata, which is currently unused, to mark buffer > data. In clients for other languages, this would be simple to implement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
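The proposal above can be sketched as a metadata tag stored next to each object, so the consumer dispatches on the tag instead of try/except around deserialization. All names below are invented for illustration; this is not the Plasma API.

```python
import json

# Hypothetical tags marking how a stored payload should be interpreted.
RAW_BUFFER, SERIALIZED = b"raw", b"ser"

def put(store, object_id, payload, metadata):
    # Store the payload together with a small metadata tag.
    store[object_id] = (metadata, payload)

def get(store, object_id):
    metadata, payload = store[object_id]
    if metadata == RAW_BUFFER:
        return payload           # hand the bytes back as-is
    return json.loads(payload)   # otherwise deserialize

store = {}
put(store, "a", b"\x00\x01", RAW_BUFFER)
put(store, "b", json.dumps([1, 2]).encode(), SERIALIZED)
print(get(store, "a"), get(store, "b"))  # b'\x00\x01' [1, 2]
```

The dispatch is decided by the metadata, never by catching a deserialization failure, which is what makes the calling code clean.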
[jira] [Updated] (ARROW-3586) [Python] Segmentation fault when converting empty table to pandas with categoricals
[ https://issues.apache.org/jira/browse/ARROW-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3586: -- Component/s: Python > [Python] Segmentation fault when converting empty table to pandas with > categoricals > --- > > Key: ARROW-3586 > URL: https://issues.apache.org/jira/browse/ARROW-3586 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0, 0.11.0 > Environment: - Ubuntu 16.04, Python 2.7.12, pyarrow 0.11.0, pandas > 0.23.4 > - Debian9, Python 2.7.13, pyarrow 0.10.0, pandas 0.23.4 >Reporter: Andreas >Assignee: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {code:java} > import pyarrow as pa > table = pa.Table.from_arrays(arrays=[pa.array([], type=pa.int32())], > names=['col']) > table.to_pandas(categories=['col']){code} > This produces a segmentation fault for certain types (e.g, int\{32,64}) while > it works for others (e.g. string, binary). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3566) Clarify that the type of dictionary encoded field should be the encoded(index) type
[ https://issues.apache.org/jira/browse/ARROW-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3566: -- Component/s: Format > Clarify that the type of dictionary encoded field should be the > encoded(index) type > --- > > Key: ARROW-3566 > URL: https://issues.apache.org/jira/browse/ARROW-3566 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Li Jin >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3721) [Gandiva] [Python] Support all Gandiva literals
[ https://issues.apache.org/jira/browse/ARROW-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3721: -- Component/s: C++ - Gandiva > [Gandiva] [Python] Support all Gandiva literals > --- > > Key: ARROW-3721 > URL: https://issues.apache.org/jira/browse/ARROW-3721 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Support all the literals from > [https://github.com/apache/arrow/blob/5b116ab175292fe70ed3c8727bcc6868b9695f4a/cpp/src/gandiva/tree_expr_builder.h#L35] > in the Cython bindings. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3797) [Rust] BinaryArray::value_offset incorrect in offset case
[ https://issues.apache.org/jira/browse/ARROW-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3797: -- Component/s: Rust > [Rust] BinaryArray::value_offset incorrect in offset case > - > > Key: ARROW-3797 > URL: https://issues.apache.org/jira/browse/ARROW-3797 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Brent Kerby >Assignee: Brent Kerby >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Original Estimate: 5m > Time Spent: 1h > Remaining Estimate: 0h > > The method BinaryArray::value_offset does not take into account the offset in > the underlying ArrayData; hence it gives incorrect results when the ArrayData > offset is not zero. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
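The bug above is easy to state concretely: a sliced array shares its parent's offsets buffer, so looking up value offset {{i}} must index at {{array_offset + i}}, not {{i}}. A minimal sketch (plain Python standing in for the Rust code):

```python
# Offsets buffer shared with the parent array: 3 binary values at 0..3, 3..6, 6..10.
offsets = [0, 3, 6, 10]
array_offset = 1  # this array is a slice starting at element 1 of its parent

def value_offset_buggy(i):
    # Ignores the slice offset in the underlying ArrayData (the reported bug).
    return offsets[i]

def value_offset_fixed(i):
    # Accounts for the slice offset, as the fix does.
    return offsets[array_offset + i]

print(value_offset_buggy(0))  # 0 (wrong: the slice's first value starts at 3)
print(value_offset_fixed(0))  # 3
```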
[jira] [Updated] (ARROW-3859) [Java] Fix ComplexWriter backward incompatible change
[ https://issues.apache.org/jira/browse/ARROW-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3859: -- Component/s: Java > [Java] Fix ComplexWriter backward incompatible change > - > > Key: ARROW-3859 > URL: https://issues.apache.org/jira/browse/ARROW-3859 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This commit > [https://github.com/apache/arrow/commit/a56c009257a71979d5ed0b021197c7a9d5ed5021] > changed the default behavior for some of the methods to be non-backward > compatible. > Will raise the PR to revert it to previous behavior while adhering to check > style guidelines. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3860) [Gandiva] [C++] Add option to use -static-libstdc++ when building libgandiva_jni.so
[ https://issues.apache.org/jira/browse/ARROW-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3860: -- Component/s: C++ - Gandiva > [Gandiva] [C++] Add option to use -static-libstdc++ when building > libgandiva_jni.so > --- > > Key: ARROW-3860 > URL: https://issues.apache.org/jira/browse/ARROW-3860 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Praveen Kumar Desabandu >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > This > [commit|https://github.com/apache/arrow/commit/ba2b2ea2301f067cc95306e11546ddb6d402a55c#diff-d5e5df5984ba660e999a7c657039f6af] > broke Gandiva packaging by removing static linking of the C++ standard library. > Since Dremio consumes a fat jar that includes the packaged Gandiva native > libraries, we need to statically link the C++ standard library. > As suggested in the commit message, we will re-introduce it as a CMake flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3891) [Java] Remove Long.bitCount with simple bitmap operations
[ https://issues.apache.org/jira/browse/ARROW-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3891: -- Component/s: Java > [Java] Remove Long.bitCount with simple bitmap operations > - > > Key: ARROW-3891 > URL: https://issues.apache.org/jira/browse/ARROW-3891 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Animesh Trivedi >Assignee: Animesh Trivedi >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The `public int isSet(int index)` routine checks if the bit is set by calling > the Long.bitCount function. This is unnecessary and degrades performance. > The call can simply be replaced by a bit shift and a bitwise & operation, changing > `return Long.bitCount(b & (1L << bitIndex));` > to > `return (b >> bitIndex) & 0x01;` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
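The two expressions above are equivalent for testing a single bit: popcount of a one-bit mask is exactly that bit. A quick check of the claim over every bit position of a 64-bit word (Python standing in for the Java):

```python
def is_set_bitcount(b, bit_index):
    # Long.bitCount(b & (1L << bitIndex)): population count of a single-bit mask.
    return bin(b & (1 << bit_index)).count("1")

def is_set_shift(b, bit_index):
    # (b >> bitIndex) & 0x01: the cheaper replacement.
    return (b >> bit_index) & 0x01

word = 0xDEADBEEFCAFEBABE
assert all(is_set_bitcount(word, i) == is_set_shift(word, i) for i in range(64))
print("equivalent for all 64 bit positions")
```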
[jira] [Updated] (ARROW-3878) [Rust] Improve primitive types
[ https://issues.apache.org/jira/browse/ARROW-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3878: -- Component/s: Rust > [Rust] Improve primitive types > --- > > Key: ARROW-3878 > URL: https://issues.apache.org/jira/browse/ARROW-3878 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently we simply use Rust's native types as primitive types, and rely > on macros such as > [this|https://github.com/apache/arrow/blob/master/rust/src/array.rs#L298] to > link the Arrow data type with the native type. A better approach may be to > define richer primitive types which contain both the Arrow type and the Rust > native type, as well as other information such as the type's bit width, > precision, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows
[ https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3936: -- Component/s: C++ > Add _O_NOINHERIT to the file open flags on Windows > -- > > Key: ARROW-3936 > URL: https://issues.apache.org/jira/browse/ARROW-3936 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Philip Felton >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Unlike Linux, Windows doesn't let you delete files that are currently opened > by another process. So if you create a child process while a Parquet file is > open, with the current code the file handle is inherited to the child > process, and the parent process can't then delete the file after closing it > without the child process terminating first. > By default, Win32 file handles are not inheritable (likely because of the > aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX > compatibility. > This is a serious problem for us. > We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path > is a sensible approach and would likely be the correct behaviour as it > matches the main Win32 API. > However, it could be that some developers rely on the current inheritable > behaviour. In which case, the Arrow public API should take a boolean argument > on whether the created file descriptor should be inheritable. But this would > break API backward compatibility (unless a new overloaded method is > introduced). > Is forking and inheriting Arrow internal file descriptor something that Arrow > actually means to support? > See [https://github.com/apache/arrow/pull/3085.] What do we think of the > proposed fix? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
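The default this issue argues for is the one Python itself adopted in PEP 446: file descriptors are non-inheritable unless the caller opts in, so a forked child never keeps the file open behind the parent's back. A small demonstration of that default and the explicit opt-in:

```python
import os
import tempfile

# Descriptors created by Python are non-inheritable by default (PEP 446),
# which is the same behaviour _O_NOINHERIT would give Arrow's Windows opens.
fd, path = tempfile.mkstemp()
try:
    print(os.get_inheritable(fd))   # False: child processes won't receive it
    os.set_inheritable(fd, True)    # opt in explicitly if inheritance is wanted
    print(os.get_inheritable(fd))   # True
finally:
    os.close(fd)
    os.remove(path)
```

With the non-inheritable default, the parent can close and delete the file on Windows without waiting for any child process to exit.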
[jira] [Updated] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off
[ https://issues.apache.org/jira/browse/ARROW-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3934: -- Component/s: C++ - Gandiva > [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off > -- > > Key: ARROW-3934 > URL: https://issues.apache.org/jira/browse/ARROW-3934 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the precompiled tests are compiled in any case, even if > ARROW_GANDIVA_BUILD_TESTS=off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import
[ https://issues.apache.org/jira/browse/ARROW-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3950: -- Component/s: Python > [Plasma] Don't force loading the TensorFlow op on import > > > Key: ARROW-3950 > URL: https://issues.apache.org/jira/browse/ARROW-3950 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > In certain situations, users want more control over when the TensorFlow op is > loaded, so we should make it optional (even if it exists). This happens in > Ray, for example, where we need to make sure that if multiple Python workers > try to compile and import the TensorFlow op in parallel, there is no race > condition (e.g. one worker could try to import a half-built version of the > op). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3970) [Gandiva][C++] Remove unnecessary boost dependencies
[ https://issues.apache.org/jira/browse/ARROW-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3970: -- Component/s: C++ - Gandiva > [Gandiva][C++] Remove unnecessary boost dependencies > > > Key: ARROW-3970 > URL: https://issues.apache.org/jira/browse/ARROW-3970 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Affects Versions: 0.12.0 >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Remove unnecessary dynamic dependencies on Boost since we are anyway using > the static versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3983) [Gandiva][Crossbow] Use static boost while packaging
[ https://issues.apache.org/jira/browse/ARROW-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3983: -- Component/s: C++ - Gandiva > [Gandiva][Crossbow] Use static boost while packaging > > > Key: ARROW-3983 > URL: https://issues.apache.org/jira/browse/ARROW-3983 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Gandiva is getting some transitive dependencies on Boost from Arrow. Since we > are using the static version of Arrow in the packaged Gandiva library, it was > thought that we would be using the static versions of Boost. > This holds true on Linux, where there is no dependency on the shared Arrow > library, but on Mac there seems to be a dependency on shared Boost libraries > even for the static Arrow library. > So we use "ARROW_BOOST_USE_SHARED" to force use of the static Boost libraries > while packaging Gandiva in Crossbow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4006) Add CODE_OF_CONDUCT.md
[ https://issues.apache.org/jira/browse/ARROW-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4006: -- Component/s: Documentation > Add CODE_OF_CONDUCT.md > -- > > Key: ARROW-4006 > URL: https://issues.apache.org/jira/browse/ARROW-4006 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The Apache Software Foundation has a code of conduct that applies to its > projects > https://www.apache.org/foundation/policies/conduct.html > We should add a document to the root of the git repository to direct > interested individuals to the CoC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4114) [C++][DOCUMENTATION] Add "python" to Linux build instructions
[ https://issues.apache.org/jira/browse/ARROW-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4114: -- Component/s: Documentation C++ > [C++][DOCUMENTATION] Add "python" to Linux build instructions > - > > Key: ARROW-4114 > URL: https://issues.apache.org/jira/browse/ARROW-4114 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Trivial > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The make unittest step in the C++ README.md does not work on a fresh Ubuntu image > without python installed. > {{The error message from ctest --output-on-failure indicates it is trying to > find python:}} > {{Running arrow-allocator-test, redirecting output into > /home/micahk/arrow/cpp/debug/build/test-logs/arrow-allocator-test.txt > (attempt 1/1)}}{{/usr/bin/env: ‘python’: No such file or directory}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4102) [C++] FixedSizeBinary identity cast not implemented
[ https://issues.apache.org/jira/browse/ARROW-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4102: -- Component/s: C++ > [C++] FixedSizeBinary identity cast not implemented > --- > > Key: ARROW-4102 > URL: https://issues.apache.org/jira/browse/ARROW-4102 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4100) [Gandiva][C++] Fix regex to ignore "." character
[ https://issues.apache.org/jira/browse/ARROW-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4100: -- Component/s: C++ - Gandiva > [Gandiva][C++] Fix regex to ignore "." character > > > Key: ARROW-4100 > URL: https://issues.apache.org/jira/browse/ARROW-4100 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4043) [Packaging/Docker] Python tests on alpine miss pytest dependency
[ https://issues.apache.org/jira/browse/ARROW-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4043: -- Component/s: Packaging > [Packaging/Docker] Python tests on alpine miss pytest dependency > > > Key: ARROW-4043 > URL: https://issues.apache.org/jira/browse/ARROW-4043 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > {code:java} > Using /usr/lib/python2.7/site-packages > Searching for numpy==1.15.4 > Best match: numpy 1.15.4 > Adding numpy 1.15.4 to easy-install.pth file > Using /usr/lib/python2.7/site-packages > Finished processing dependencies for pyarrow==0.11.1.dev385+g9c8ddae1 > / > /bin/sh: pytest: not found > The command "docker-compose run python-alpine" exited with 127.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4130) [Go] offset not used when accessing binary array
[ https://issues.apache.org/jira/browse/ARROW-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4130: -- Component/s: Go > [Go] offset not used when accessing binary array > > > Key: ARROW-4130 > URL: https://issues.apache.org/jira/browse/ARROW-4130 > Project: Apache Arrow > Issue Type: Bug > Components: Go >Reporter: Joshua Lapacik >Assignee: Joshua Lapacik >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When accessing a binary array, the offset of the underlying data buffer is > not used. This affects the behavior of slicing. See > [https://github.com/apache/arrow/issues/3270] . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4266) [Python][CI] Disable ORC tests in dask integration test
[ https://issues.apache.org/jira/browse/ARROW-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4266: -- Component/s: Python Continuous Integration > [Python][CI] Disable ORC tests in dask integration test > --- > > Key: ARROW-4266 > URL: https://issues.apache.org/jira/browse/ARROW-4266 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/ARROW-3910 changed the default value of > to_pandas: to_pandas(date_as_object=True) which breaks dask's ORC tests > [https://github.com/dask/dask/blob/e48aca49af9005c938ff4773aa05ca8b20e2e1b1/dask/dataframe/io/orc.py#L19] > > cc [~mrocklin] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4269) [Python] AttributeError: module 'pandas.core' has no attribute 'arrays'
[ https://issues.apache.org/jira/browse/ARROW-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4269: -- Component/s: Python > [Python] AttributeError: module 'pandas.core' has no attribute 'arrays' > --- > > Key: ARROW-4269 > URL: https://issues.apache.org/jira/browse/ARROW-4269 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This happens with pandas 0.22: > ``` > In [1]: import pyarrow > --- > AttributeError Traceback (most recent call last) > in () > > 1 import pyarrow > ~/arrow/python/pyarrow/__init__.py in () > 174 localfs = LocalFileSystem.get_instance() > 175 > --> 176 from pyarrow.serialization import (default_serialization_context, > 177 register_default_serialization_handlers, > 178 register_torch_serialization_handlers) > ~/arrow/python/pyarrow/serialization.py in () > 303 > 304 > --> 305 > register_default_serialization_handlers(_default_serialization_context) > ~/arrow/python/pyarrow/serialization.py in > register_default_serialization_handlers(serialization_context) > 294 custom_deserializer=_deserialize_pyarrow_table) > 295 > --> 296 _register_custom_pandas_handlers(serialization_context) > 297 > 298 > ~/arrow/python/pyarrow/serialization.py in > _register_custom_pandas_handlers(context) > 175 custom_deserializer=_load_pickle_from_buffer) > 176 > --> 177 if hasattr(pd.core.arrays, 'interval'): > 178 context.register_type( > 179 pd.core.arrays.interval.IntervalArray, > AttributeError: module 'pandas.core' has no attribute 'arrays' > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4197) [C++] Emscripten compiler fails building Arrow
[ https://issues.apache.org/jira/browse/ARROW-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4197: -- Component/s: C++ > [C++] Emscripten compiler fails building Arrow > -- > > Key: ARROW-4197 > URL: https://issues.apache.org/jira/browse/ARROW-4197 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Environment: OS X >Reporter: Timothy Paine >Assignee: Timothy Paine >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h > Remaining Estimate: 0h > > The emscripten compiler ([https://kripken.github.io/emscripten-site/)] fails > when compiling arrow with a few relatively minor issues: > > * there is no -ggdb flag for debug support, only -g > * there is no execinfo.h, so even if Backtrace is found it cannot be used > * when using the emscripten compiler, even on mac, you cannot pass the > -undefined dynamic_lookup argument -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4209) [Gandiva] returning IR structs causes issues with windows
[ https://issues.apache.org/jira/browse/ARROW-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4209: -- Component/s: C++ - Gandiva > [Gandiva] returning IR structs causes issues with windows > - > > Key: ARROW-4209 > URL: https://issues.apache.org/jira/browse/ARROW-4209 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The decimal add function returns a struct (of high/low values). This is known to be > fragile due to ABI compatibility issues, so this change switches to > primitive types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4237) [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script
[ https://issues.apache.org/jira/browse/ARROW-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4237: -- Component/s: Packaging > [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script > --- > > Key: ARROW-4237 > URL: https://issues.apache.org/jira/browse/ARROW-4237 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Set to > {{-DCMAKE_INSTALL_LIBDIR=lib}} > instead of > {{-DCMAKE_INSTALL_LIBDIR=$ARROW_HOME/lib}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4156) [C++] xcodebuild failure for cmake generated project
[ https://issues.apache.org/jira/browse/ARROW-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4156: -- Component/s: C++ > [C++] xcodebuild failure for cmake generated project > > > Key: ARROW-4156 > URL: https://issues.apache.org/jira/browse/ARROW-4156 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Hatem Helal >Assignee: Uwe L. Korn >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Attachments: cmakeoutput.txt, xcodebuildOutput.txt > > Time Spent: 40m > Remaining Estimate: 0h > > Using the cmake xcode project generator fails to build using xcodebuild as > follows: > {code:java} > $ cmake .. -G Xcode -DARROW_PARQUET=ON -DPARQUET_BUILD_EXECUTABLES=ON > -DPARQUET_BUILD_EXAMPLES=ON > -DFLATBUFFERS_HOME=/usr/local/Cellar/flatbuffers/1.10.0 > -DCMAKE_BUILD_TYPE=Debug -DTHRIFT_HOME=/usr/local/Cellar/thrift/0.11.0 > -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_BUILD_TESTS=ON > -DClangTools_PATH=/usr/local/Cellar/llvm@6/6.0.1_1 > > Libtool > xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a > normal x86_64 > cd /Users/hhelal/Documents/code/arrow/cpp > export MACOSX_DEPLOYMENT_TARGET=10.14 > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool > -static -arch_only x86_64 -syslibroot > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk > > -L/Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal > -filelist > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/x86_64/arrow_objlib.LinkFileList > -o > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Objects-normal/libarrow_objlib.a > PhaseScriptExecution CMake\ PostBuild\ Rules > 
xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh > cd /Users/hhelal/Documents/code/arrow/cpp > /bin/sh -c > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_objlib.build/Script-2604120B03B14AB58C2E586A.sh > echo "Depend check for xcode" > Depend check for xcode > cd /Users/hhelal/Documents/code/arrow/cpp/xcode-build && make -C > /Users/hhelal/Documents/code/arrow/cpp/xcode-build -f > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/CMakeScripts/XCODE_DEPEND_HELPER.make > PostBuild.arrow_objlib.Debug > /bin/rm -f > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib > /bin/rm -f > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.a > === BUILD TARGET arrow_shared OF PROJECT arrow WITH THE DEFAULT CONFIGURATION > (Debug) === > Check dependencies > Write auxiliary files > write-file > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh > chmod 0755 > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh > PhaseScriptExecution CMake\ PostBuild\ Rules > xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh > cd /Users/hhelal/Documents/code/arrow/cpp > /bin/sh -c > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh > echo "Creating symlinks" > Creating symlinks > /usr/local/Cellar/cmake/3.12.4/bin/cmake -E cmake_symlink_library > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.0.0.dylib > > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.12.dylib > /Users/hhelal/Documents/code/arrow/cpp/xcode-build/debug/Debug/libarrow.dylib > CMake Error: cmake_symlink_library: System Error: No such file or directory > CMake Error: 
cmake_symlink_library: System Error: No such file or directory > make: *** [arrow_shared_buildpart_0] Error 1 > ** BUILD FAILED ** > The following build commands failed: > PhaseScriptExecution CMake\ PostBuild\ Rules > xcode-build/src/arrow/arrow.build/Debug/arrow_shared.build/Script-9AFD4DDD88034C5F965570DF.sh > (1 failure) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
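The step that fails above is CMake's `cmake -E cmake_symlink_library` helper, which creates the versioned symlinks (`libarrow.12.dylib`, `libarrow.dylib`) pointing at the real `libarrow.12.0.0.dylib`. As a rough stdlib-Python stand-in (not CMake's implementation), the step amounts to the following, and produces the same "No such file or directory" error when the real dylib was never built:

```python
# Illustrative stand-in for `cmake -E cmake_symlink_library`; the
# function name and behaviour here are an assumption for illustration.
import os
import tempfile

def symlink_library(real_name: str, *link_names: str, directory: str) -> None:
    """Create versioned symlinks to a library; fail like CMake does if
    the real library file does not exist."""
    real_path = os.path.join(directory, real_name)
    if not os.path.exists(real_path):
        raise FileNotFoundError(f"{real_path}: No such file or directory")
    for link in link_names:
        link_path = os.path.join(directory, link)
        if os.path.lexists(link_path):
            os.remove(link_path)
        os.symlink(real_name, link_path)

with tempfile.TemporaryDirectory() as d:
    # When the "real" library exists, the symlink step succeeds.
    open(os.path.join(d, "libarrow.12.0.0.dylib"), "w").close()
    symlink_library("libarrow.12.0.0.dylib",
                    "libarrow.12.dylib", "libarrow.dylib", directory=d)
    created = sorted(os.listdir(d))
```

In the Xcode build above, the PostBuild script runs before the dylib has landed in `debug/Debug/`, so the equivalent of the existence check fails.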
[jira] [Updated] (ARROW-4134) [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's abort
[ https://issues.apache.org/jira/browse/ARROW-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4134: -- Component/s: Packaging > [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's > abort > -- > > Key: ARROW-4134 > URL: https://issues.apache.org/jira/browse/ARROW-4134 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4569) [Gandiva] validate that the precision/scale are within bounds
[ https://issues.apache.org/jira/browse/ARROW-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4569: -- Component/s: C++ - Gandiva > [Gandiva] validate that the precision/scale are within bounds > - > > Key: ARROW-4569 > URL: https://issues.apache.org/jira/browse/ARROW-4569 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
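For context, the bounds in question are the usual 128-bit decimal limits: at most 38 significant digits of precision, with scale between 0 and the precision. A minimal sketch of such a validation, with illustrative names rather than Gandiva's actual API:

```python
# Hypothetical bounds check; MAX_PRECISION = 38 is the standard
# decimal128 limit, but the function name is illustrative only.
MAX_PRECISION = 38  # digits representable in a 128-bit decimal

def validate_decimal(precision: int, scale: int) -> None:
    if not 1 <= precision <= MAX_PRECISION:
        raise ValueError(f"precision {precision} out of bounds [1, {MAX_PRECISION}]")
    if not 0 <= scale <= precision:
        raise ValueError(f"scale {scale} out of bounds [0, {precision}]")

validate_decimal(38, 6)      # within bounds: no error
try:
    validate_decimal(39, 6)  # too many digits for a 128-bit decimal
except ValueError as e:
    error_message = str(e)
```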
[jira] [Updated] (ARROW-4348) encountered error when building parquet
[ https://issues.apache.org/jira/browse/ARROW-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4348: -- Component/s: C++ > encountered error when building parquet > --- > > Key: ARROW-4348 > URL: https://issues.apache.org/jira/browse/ARROW-4348 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: lei yu >Priority: Major > > I am trying to build the C++ libraries (Parquet only) on CentOS 7.5. I > followed the instructions on GitHub and did as below: > > {code:java} > git clone https://github.com/apache/arrow.git > cd arrow/cpp > mkdir debug > cd debug > cmake .. -DARROW_PARQUET=ON -DARROW_OPTIONAL_INSTALL=ON > make parquet > {code} > > I don't have the third-party libraries installed on my box, so it tries to > download them during the build, but I got an error after it says that Thrift > has been downloaded and installed: > {code:java} > No rule to make target `thrift_ep/src/thrift_ep-install/lib/libthriftd.a', > needed by `src/parquet/parquet_types.cpp'. Stop.{code} > Before the error, it says > {code:java} > [ 7%] Performing configure step for 'thrift_ep' > -- thrift_ep configure command succeeded. See also > /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-.log > [ 8%] Performing build step for 'thrift_ep' > -- thrift_ep build command succeeded. See also > /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-.log > [ 9%] Performing install step for 'thrift_ep' > -- thrift_ep install command succeeded. See also > /home/ylei/development/third_party/parquet/arrow/arrow/cpp/debug/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-install-*.log > [ 10%] Completed 'thrift_ep' > [ 10%] Built target thrift_ep > {code} > I had to build Thrift separately, and then I could build Parquet successfully. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4663) [Packaging] Conda-forge build misses gflags on linux
[ https://issues.apache.org/jira/browse/ARROW-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4663: -- Component/s: Packaging > [Packaging] Conda-forge build misses gflags on linux > > > Key: ARROW-4663 > URL: https://issues.apache.org/jira/browse/ARROW-4663 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Uwe L. Korn >Priority: Major > Labels: ci-failure, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > See build: https://travis-ci.org/kszucs/crossbow/builds/496958426 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors
[ https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854525#comment-16854525 ] Antoine Pitrou commented on ARROW-263: -- Should this be kept open? It looks essentially like a brain dump, and no discussion has taken place in the last 3 years. > Design an initial IPC mechanism for Arrow Vectors > - > > Key: ARROW-263 > URL: https://issues.apache.org/jira/browse/ARROW-263 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Micah Kornfield >Priority: Major > > Prior discussion on this topic [1]. > Use-cases: > 1. User-defined function (UDF) execution: one process wants to execute a > user-defined function written in another language (e.g. Java executing a > function defined in Python; this involves creating Arrow Arrays in Java, > sending them to Python, and receiving a new set of Arrow Arrays produced in > Python back in the Java process). > 2. If a storage system and a query engine are running on the same host, we > might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu). > Assumptions: > 1. The IPC mechanism should be usable from the core set of supported languages > (Java, Python, C) on POSIX and ideally Windows systems. Ideally, we would not > need to add dependencies on additional libraries beyond what each language > already requires. We want to leverage shared memory for Arrays to avoid > doubling RAM requirements by duplicating the same Array in different memory > locations. > 2. Under some circumstances shared memory might be more efficient than FIFOs > or sockets (in other scenarios it won't be; see the thread below). > 3. Security is not a concern for V1; we assume all processes running are > "trusted". > Requirements: > 1. Resource management: > a. Both processes need a way of allocating memory for Arrow Arrays so > that data can be passed from one process to another. > b. There must be a mechanism to clean up unused Arrow Arrays to limit > resource usage while avoiding race conditions when processing arrays. > 2. Schema negotiation: before sending data, both processes need to agree on > the schema each one will produce. > Out-of-scope requirements: > 1. IPC channel metadata discovery is out of scope of this document. > Discovery can be provided by passing appropriate command line arguments, > configuration files, or other mechanisms like RPC (in which case RPC channel > discovery is still an issue). > [1] > http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
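The shared-memory assumption in the proposal can be illustrated with a small stdlib stand-in: a second handle attaches to the same segment by name, the way a second process would, and reads the payload without copying. This is only an in-process sketch of the idea, not Arrow's actual IPC design:

```python
# Stdlib sketch of the shared-memory idea: one handle creates and fills
# a named segment, another attaches by name and reads it zero-copy.
from multiprocessing import shared_memory
import struct

n = 4
producer = shared_memory.SharedMemory(create=True, size=n * 8)
producer.buf[:n * 8] = struct.pack(f"{n}q", *range(n))  # the "Array" payload

# In the real design this attach would happen in a different process,
# after the segment name is exchanged over the IPC channel.
consumer = shared_memory.SharedMemory(name=producer.name)
values = struct.unpack(f"{n}q", bytes(consumer.buf[:n * 8]))

consumer.close()
producer.close()
producer.unlink()  # explicit cleanup, per requirement 1b above
```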
[jira] [Updated] (ARROW-5316) [Rust] Interfaces for gandiva bindings.
[ https://issues.apache.org/jira/browse/ARROW-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5316: -- Component/s: Rust C++ - Gandiva > [Rust] Interfaces for gandiva bindings. > --- > > Key: ARROW-5316 > URL: https://issues.apache.org/jira/browse/ARROW-5316 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ - Gandiva, Rust >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > > Create interfaces to demonstrate high level design and ideas. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5315) [Rust] Gandiva binding.
[ https://issues.apache.org/jira/browse/ARROW-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5315: -- Component/s: Rust C++ - Gandiva > [Rust] Gandiva binding. > --- > > Key: ARROW-5315 > URL: https://issues.apache.org/jira/browse/ARROW-5315 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Gandiva, Rust >Reporter: Renjie Liu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Add gandiva binding for rust. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5224) [Java] Add APIs for supporting directly serialize/deserialize ValueVector
[ https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5224: -- Component/s: Java > [Java] Add APIs for supporting directly serialize/deserialize ValueVector > - > > Key: ARROW-5224 > URL: https://issues.apache.org/jira/browse/ARROW-5224 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > There is no API to directly serialize/deserialize a ValueVector. The only way > to implement this is to put a single FieldVector in a VectorSchemaRoot and > convert it to an ArrowRecordBatch, and deserialization works the same way in > reverse. Providing a utility class to implement this may be better. I know all > serialization should follow the IPC format so that data can be shared between > different Arrow implementations, but for users who only use the Java API and > want to do some further optimization, this seems to be no problem, and we > could provide them one more option. > This may bring some benefits for Java users who only use ValueVector rather > than IPC classes such as ArrowRecordBatch: > * We could do some shuffle optimizations such as compression and encoding > algorithms for numerical types, which could greatly improve performance. > * Serialize/deserialize with the actual buffer size within the vector, since > the allocated buffer size is a power of 2, which is bigger than it really > needs. > * Reduce data conversions (VectorSchemaRoot, ArrowRecordBatch, etc.) to make > it user-friendly. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
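The direct vector round-trip the issue asks for can be sketched in a few lines. The layout below is a pure-Python illustration (validity bitmap plus fixed-width data slots, both sized to the actual element count, one of the benefits listed above); it is not Arrow's IPC format or Java API:

```python
# Illustrative direct serialize/deserialize of a single nullable int64
# "vector": 4-byte length prefix, validity bitmap, then 8-byte slots.
import struct

def serialize_vector(values):
    """values: list of int-or-None. Buffers are sized to the element
    count, not rounded up to a power-of-two capacity."""
    n = len(values)
    validity = bytearray((n + 7) // 8)
    data = bytearray(n * 8)
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            struct.pack_into("<q", data, i * 8, v)
    return struct.pack("<i", n) + bytes(validity) + bytes(data)

def deserialize_vector(buf):
    n, = struct.unpack_from("<i", buf, 0)
    validity = buf[4:4 + (n + 7) // 8]
    data_off = 4 + (n + 7) // 8
    out = []
    for i in range(n):
        if validity[i // 8] >> (i % 8) & 1:
            out.append(struct.unpack_from("<q", buf, data_off + i * 8)[0])
        else:
            out.append(None)
    return out

roundtrip = deserialize_vector(serialize_vector([1, None, 3]))
```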
[jira] [Updated] (ARROW-5259) [Java] Add option for ValueVector to allocate buffers with actual size
[ https://issues.apache.org/jira/browse/ARROW-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5259: -- Summary: [Java] Add option for ValueVector to allocate buffers with actual size (was: Add option for ValueVector to allocate buffers with actual size) > [Java] Add option for ValueVector to allocate buffers with actual size > -- > > Key: ARROW-5259 > URL: https://issues.apache.org/jira/browse/ARROW-5259 > Project: Apache Arrow > Issue Type: Wish > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > Currently, _BaseValueVector#computeCombinedBufferSize_ calculates the buffer > size with _valueCount_ and _typeWidth_ as inputs and then allocates memory > for the dataBuffer and validityBuffer. However, it always allocates more > memory than the actual size because of the call to > _BaseAllocator.nextPowerOfTwo(bufferSize)_. > For example, an IntVector will allocate buffers of size 8192 for valueCount = > 1025, so memory usage is almost double what it actually needs. In some cases > there is enough memory for actual use, but an OOM is thrown when the > allocation is rounded up to the next power of 2, and I think this problem is > absolutely avoidable. > Is it feasible to add an option for ValueVector to allocate the actual buffer > size rather than the next power of 2, to reduce memory allocation? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
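The arithmetic in the example is easy to check: 1025 four-byte values need 4100 bytes of data, and rounding up to the next power of two nearly doubles that. A small sketch (the helper mirrors the `BaseAllocator.nextPowerOfTwo` named above, but this is not Arrow's code):

```python
# Sketch of the power-of-two rounding the issue describes.
def next_power_of_two(n: int) -> int:
    """Round n up to the next power of two (stand-in for the Java
    BaseAllocator.nextPowerOfTwo mentioned in the issue)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

# IntVector with valueCount = 1025: the data buffer needs 1025 * 4 =
# 4100 bytes, but the allocator rounds that up to 8192.
actual = 1025 * 4
allocated = next_power_of_two(actual)
overhead = allocated / actual  # ~2x the memory actually required
```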
[jira] [Updated] (ARROW-5278) [C#] ArrowBuffer should either implement IEquatable correctly or not at all
[ https://issues.apache.org/jira/browse/ARROW-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5278: -- Component/s: C# > [C#] ArrowBuffer should either implement IEquatable correctly or not at all > --- > > Key: ARROW-5278 > URL: https://issues.apache.org/jira/browse/ARROW-5278 > Project: Apache Arrow > Issue Type: Bug > Components: C# >Reporter: Eric Erhardt >Priority: Major > > See the discussion > [here|https://github.com/apache/arrow/pull/3925/#discussion_r281378027]. > ArrowBuffer currently implements IEquatable, but doesn't override > `GetHashCode`. > We should either implement IEquatable correctly by overriding both Equals and > GetHashCode, or remove IEquatable altogether. > Looking at ArrowBuffer's [Equals > implementation|https://github.com/apache/arrow/blob/08829248fd540b7e3bd96b980e357f8a4db7970e/csharp/src/Apache.Arrow/ArrowBuffer.cs#L66-L69], > it compares each value in the buffer, which is not very efficient. Also, this > implementation is not consistent with how `Memory` implements IEquatable - > [https://source.dot.net/#System.Private.CoreLib/shared/System/Memory.cs,500]. > If we continue implementing IEquatable on ArrowBuffer, we should consider > implementing it in the same fashion as Memory does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
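The equality/hash contract the issue describes is not specific to C#. As a cross-language illustration (Python rather than C#): defining equality without a matching hash makes instances unusable as dictionary or set keys, which is the same failure mode the Equals/GetHashCode pairing guards against:

```python
# Python analogue of the C# rule "override Equals and GetHashCode
# together": defining __eq__ without __hash__ sets __hash__ to None.
class BufferEqOnly:
    def __init__(self, data: bytes):
        self.data = data
    def __eq__(self, other):
        return isinstance(other, BufferEqOnly) and self.data == other.data

class BufferEqAndHash(BufferEqOnly):
    def __hash__(self) -> int:
        return hash(self.data)  # consistent with __eq__: equal => same hash

try:
    hash(BufferEqOnly(b"x"))        # TypeError: unhashable type
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False

# With both defined, equal buffers hash identically, as the contract requires.
hash_consistent = (BufferEqAndHash(b"x") == BufferEqAndHash(b"x")
                   and hash(BufferEqAndHash(b"x")) == hash(BufferEqAndHash(b"x")))
```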
[jira] [Updated] (ARROW-5259) Add option for ValueVector to allocate buffers with actual size
[ https://issues.apache.org/jira/browse/ARROW-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5259: -- Component/s: Java > Add option for ValueVector to allocate buffers with actual size > --- > > Key: ARROW-5259 > URL: https://issues.apache.org/jira/browse/ARROW-5259 > Project: Apache Arrow > Issue Type: Wish > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > Currently, _BaseValueVector#computeCombinedBufferSize_ calculates the buffer > size with _valueCount_ and _typeWidth_ as inputs and then allocates memory > for the dataBuffer and validityBuffer. However, it always allocates more > memory than the actual size because of the call to > _BaseAllocator.nextPowerOfTwo(bufferSize)_. > For example, an IntVector will allocate buffers of size 8192 for valueCount = > 1025, so memory usage is almost double what it actually needs. In some cases > there is enough memory for actual use, but an OOM is thrown when the > allocation is rounded up to the next power of 2, and I think this problem is > absolutely avoidable. > Is it feasible to add an option for ValueVector to allocate the actual buffer > size rather than the next power of 2, to reduce memory allocation? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5324) [Plasma] API requests
[ https://issues.apache.org/jira/browse/ARROW-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5324: -- Summary: [Plasma] API requests (was: plasma API requests) > [Plasma] API requests > - > > Key: ARROW-5324 > URL: https://issues.apache.org/jira/browse/ARROW-5324 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Darren Weber >Priority: Minor > > Copied from [https://github.com/apache/arrow/issues/4318] (it's easier to > read there, sorry hate Jira formatting) > Related to https://issues.apache.org/jira/browse/ARROW-3444 > While working with the plasma API to create/seal an object for a table, using > a custom object-ID, it would help to have a convenience API to get the size > of the table. > The following code might help to illustrate the request and notes below: > {code:java} > if not parquet_path: > parquet_path = f"./data/dataset_{size}.parquet" > if not plasma_path: > plasma_path = f"./data/dataset_{size}.plasma" > try: > plasma_client = plasma.connect(plasma_path) > except: > plasma_client = None > if plasma_client: > table_id = plasma.ObjectID(bytes(parquet_path[:20], encoding='utf8')) > try: > table = plasma_client.get(table_id, timeout_ms=4000) > if table.__name__ == 'ObjectNotAvailable': > raise ValueError('Failed to get plasma object') > except ValueError: > table = pq.read_table(parquet_path, use_threads=True) > plasma_client.create_and_seal(table_id, table) > {code} > > The use case is a workflow something like this: > - process-A > ** generate a pandas DataFrame `df` > ** save the `df` to parquet, using pyarrow.parquet, with a unique parquet > path > ** (this process will not save directly to plasma) > - process-B > ** get the data from plasma or load it into plasma from the parquet file > ** use the unique parquet path to generate a unique object-ID > Notes: > - `plasma_client.put` for the same data-table is not idempotent, it > generates unique object-ID 
values that are not based on any hash of the data > payload, so every put saves a new object-ID; could it use a data hash for > idempotent puts? e.g. > - > {code:java} > In : plasma_client.put(table) > ObjectID(25fcb60959d23b6bfc739f88816da29e04d6) > In : plasma_client.put(table) > ObjectID(d2a4662999db30177b090f9fc2bf6b28687d2f8d) > In : plasma_client.put(table) > ObjectID(b2928ad786de2fdb74d374055597f6e7bd97fd61) > In : hash(table) > TypeError: unhashable type: 'pyarrow.lib.Table'{code} > - In process-B, when the data is not already in plasma, it reads data from a > parquet file into a pyarrow.Table and then needs an object-ID and the table > size to use plasma `client.create_and_seal` but it's not easy to get the > table size - this might be related to github issue #2707 (#3444) - it might > be ideal if the `client.create_and_seal` accepts responsibility for the size > of the object to be created when given a pyarrow data object like a table. > - when the plasma store does not have the object, it could have a default > timeout rather than hang indefinitely, and it's a bit clumsy to return an > object that is not easily checked with `isinstance` and it could be better to > have an exception handling pattern (or something like the requests 404 > patterns and options?) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
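The idempotent-put idea in the notes — derive the object ID from a hash of the payload instead of a random value — can be sketched in a few lines. The helper below is hypothetical (it is not part of the plasma API); it relies only on the fact that `plasma.ObjectID` accepts exactly 20 bytes:

```python
# Hypothetical content-addressed object ID: hashing the payload makes
# repeated puts of the same bytes map to the same ID.
import hashlib

def object_id_bytes(payload: bytes) -> bytes:
    """Return a 20-byte ID suitable for plasma.ObjectID, derived from
    the payload so the mapping is deterministic (sha1 digests are
    exactly 20 bytes)."""
    return hashlib.sha1(payload).digest()

payload = b"serialized pyarrow table bytes"  # stand-in for real table data
id_a = object_id_bytes(payload)
id_b = object_id_bytes(payload)  # same payload -> same ID, unlike put()
```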
[jira] [Updated] (ARROW-5381) [C++] Crash at arrow::internal::CountSetBits
[ https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5381: -- Summary: [C++] Crash at arrow::internal::CountSetBits (was: Crash at arrow::internal::CountSetBits) > [C++] Crash at arrow::internal::CountSetBits > > > Key: ARROW-5381 > URL: https://issues.apache.org/jira/browse/ARROW-5381 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Environment: Operating System: Windows 7 Professional 64-bit (6.1, > Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429) > Language: English (Regional Setting: English) > System Manufacturer: SAMSUNG ELECTRONICS CO., LTD. > System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520 > BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ > Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz > Memory: 2048MB RAM > Available OS Memory: 1962MB RAM > Page File: 1517MB used, 2405MB available > Windows Dir: C:\Windows > DirectX Version: DirectX 11 >Reporter: Tham >Priority: Major > > I've got a lot of crash dump from a customer's windows machine. The > stacktrace shows that it crashed at arrow::internal::CountSetBits. 
> > {code:java} > STACK_TEXT: > 00c9`5354a4c0 7ff7`2f2830fd : 00c9`544841c0 ` > `1e00 ` : > CortexService!arrow::internal::CountSetBits+0x16d > 00c9`5354a550 7ff7`2f2834b7 : 00c9`5337c930 ` > ` ` : > CortexService!arrow::ArrayData::GetNullCount+0x8d > 00c9`5354a580 7ff7`2f13df55 : 00c9`54476080 00c9`5354a5d8 > ` ` : > CortexService!arrow::Array::null_count+0x37 > 00c9`5354a5b0 7ff7`2f13fb68 : 00c9`5354ab40 00c9`5354a6f8 > 00c9`54476080 ` : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::Visit >+0xa5 > 00c9`5354a640 7ff7`2f12fa34 : 00c9`5354a6f8 00c9`54476080 > 00c9`5354ab40 ` : > CortexService!arrow::VisitArrayInline namespace'::LevelBuilder>+0x298 > 00c9`5354a680 7ff7`2f14bf03 : 00c9`5354ab40 00c9`5354a6f8 > 00c9`54476080 ` : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::VisitInline+0x44 > 00c9`5354a6c0 7ff7`2f12fe2a : 00c9`5354ab40 00c9`5354ae18 > 00c9`54476080 00c9`5354b208 : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::GenerateLevels+0x93 > 00c9`5354aa00 7ff7`2f14de56 : 00c9`5354b1f8 00c9`5354afc8 > 00c9`54476080 `1e00 : > CortexService!parquet::arrow::`anonymous > namespace'::ArrowColumnWriter::Write+0x25a > 00c9`5354af20 7ff7`2f14e66b : 00c9`5354b1f8 00c9`5354b238 > 00c9`54445c20 ` : > CortexService!parquet::arrow::`anonymous > namespace'::ArrowColumnWriter::Write+0x2a6 > 00c9`5354b040 7ff7`2f12f137 : 00c9`544041f0 00c9`5354b4d8 > 00c9`5354b4a8 ` : > CortexService!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x70b > 00c9`5354b400 7ff7`2f14b4d5 : 00c9`54431180 00c9`5354b4d8 > 00c9`5354b4a8 ` : > CortexService!parquet::arrow::FileWriter::WriteColumnChunk+0x67 > 00c9`5354b450 7ff7`2f12eef1 : 00c9`5354b5d8 00c9`5354b648 > ` `1e00 : > CortexService!::operator()+0x195 > 00c9`5354b530 7ff7`2eb8e31e : 00c9`54431180 00c9`5354b760 > 00c9`54442fb0 `1e00 : > CortexService!parquet::arrow::FileWriter::WriteTable+0x521 > 00c9`5354b730 7ff7`2eb58ac5 : 00c9`5307bd88 00c9`54442fb0 > ` ` : > 
CortexService!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0xfe > 00c9`5354b860 7ff7`2eafdce6 : 00c9`5307bd80 00c9`5354ba08 > 00c9`5354b9e0 00c9`5354b9d8 : > CortexService!Cortex::Storage::ParquetFileWriter::writeRowGroup+0x545 > 00c9`5354b9a0 7ff7`2eaf8bae : 00c9`53275600 00c9`53077220 > `fffe ` : > CortexService!Cortex::Storage::DataStreamWriteWorker::onNewData+0x1a6 > {code} > {code:java} > FAILED_INSTRUCTION_ADDRESS: > CortexService!arrow::internal::CountSetBits+16d > [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc > @ 99] > 7ff7`2f3a4e4d f3480fb800 popcnt rax,qword ptr [rax] > FOLLOWUP_IP: > CortexService!arrow::internal::CountSetBits+16d > [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc > @ 99] > 7ff7`2f3a4e4d f3480fb800 popcnt rax,qword ptr [rax] > FAULTING_SOURCE_LINE: >
[jira] [Updated] (ARROW-5324) plasma API requests
[ https://issues.apache.org/jira/browse/ARROW-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5324: -- Component/s: C++ - Plasma > plasma API requests > --- > > Key: ARROW-5324 > URL: https://issues.apache.org/jira/browse/ARROW-5324 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Darren Weber >Priority: Minor > > Copied from [https://github.com/apache/arrow/issues/4318] (it's easier to > read there, sorry hate Jira formatting) > Related to https://issues.apache.org/jira/browse/ARROW-3444 > While working with the plasma API to create/seal an object for a table, using > a custom object-ID, it would help to have a convenience API to get the size > of the table. > The following code might help to illustrate the request and notes below: > {code:java} > if not parquet_path: > parquet_path = f"./data/dataset_{size}.parquet" > if not plasma_path: > plasma_path = f"./data/dataset_{size}.plasma" > try: > plasma_client = plasma.connect(plasma_path) > except: > plasma_client = None > if plasma_client: > table_id = plasma.ObjectID(bytes(parquet_path[:20], encoding='utf8')) > try: > table = plasma_client.get(table_id, timeout_ms=4000) > if table.__name__ == 'ObjectNotAvailable': > raise ValueError('Failed to get plasma object') > except ValueError: > table = pq.read_table(parquet_path, use_threads=True) > plasma_client.create_and_seal(table_id, table) > {code} > > The use case is a workflow something like this: > - process-A > ** generate a pandas DataFrame `df` > ** save the `df` to parquet, using pyarrow.parquet, with a unique parquet > path > ** (this process will not save directly to plasma) > - process-B > ** get the data from plasma or load it into plasma from the parquet file > ** use the unique parquet path to generate a unique object-ID > Notes: > - `plasma_client.put` for the same data-table is not idempotent, it > generates unique object-ID values that are not based on any 
hash of the data > payload, so every put saves a new object-ID; could it use a data hash for > idempotent puts? e.g. > - > {code:java} > In : plasma_client.put(table) > ObjectID(25fcb60959d23b6bfc739f88816da29e04d6) > In : plasma_client.put(table) > ObjectID(d2a4662999db30177b090f9fc2bf6b28687d2f8d) > In : plasma_client.put(table) > ObjectID(b2928ad786de2fdb74d374055597f6e7bd97fd61) > In : hash(table) > TypeError: unhashable type: 'pyarrow.lib.Table'{code} > - In process-B, when the data is not already in plasma, it reads data from a > parquet file into a pyarrow.Table and then needs an object-ID and the table > size to use plasma `client.create_and_seal` but it's not easy to get the > table size - this might be related to github issue #2707 (#3444) - it might > be ideal if the `client.create_and_seal` accepts responsibility for the size > of the object to be created when given a pyarrow data object like a table. > - when the plasma store does not have the object, it could have a default > timeout rather than hang indefinitely, and it's a bit clumsy to return an > object that is not easily checked with `isinstance` and it could be better to > have an exception handling pattern (or something like the requests 404 > patterns and options?) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5381) Crash at arrow::internal::CountSetBits
[ https://issues.apache.org/jira/browse/ARROW-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5381: -- Component/s: C++ > Crash at arrow::internal::CountSetBits > -- > > Key: ARROW-5381 > URL: https://issues.apache.org/jira/browse/ARROW-5381 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Environment: Operating System: Windows 7 Professional 64-bit (6.1, > Build 7601) Service Pack 1(7601.win7sp1_ldr_escrow.181110-1429) > Language: English (Regional Setting: English) > System Manufacturer: SAMSUNG ELECTRONICS CO., LTD. > System Model: RV420/RV520/RV720/E3530/S3530/E3420/E3520 > BIOS: Phoenix SecureCore-Tiano(tm) NB Version 2.1 05PQ > Processor: Intel(R) Pentium(R) CPU B950 @ 2.10GHz (2 CPUs), ~2.1GHz > Memory: 2048MB RAM > Available OS Memory: 1962MB RAM > Page File: 1517MB used, 2405MB available > Windows Dir: C:\Windows > DirectX Version: DirectX 11 >Reporter: Tham >Priority: Major > > I've got a lot of crash dump from a customer's windows machine. The > stacktrace shows that it crashed at arrow::internal::CountSetBits. 
> > {code:java} > STACK_TEXT: > 00c9`5354a4c0 7ff7`2f2830fd : 00c9`544841c0 ` > `1e00 ` : > CortexService!arrow::internal::CountSetBits+0x16d > 00c9`5354a550 7ff7`2f2834b7 : 00c9`5337c930 ` > ` ` : > CortexService!arrow::ArrayData::GetNullCount+0x8d > 00c9`5354a580 7ff7`2f13df55 : 00c9`54476080 00c9`5354a5d8 > ` ` : > CortexService!arrow::Array::null_count+0x37 > 00c9`5354a5b0 7ff7`2f13fb68 : 00c9`5354ab40 00c9`5354a6f8 > 00c9`54476080 ` : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::Visit >+0xa5 > 00c9`5354a640 7ff7`2f12fa34 : 00c9`5354a6f8 00c9`54476080 > 00c9`5354ab40 ` : > CortexService!arrow::VisitArrayInline namespace'::LevelBuilder>+0x298 > 00c9`5354a680 7ff7`2f14bf03 : 00c9`5354ab40 00c9`5354a6f8 > 00c9`54476080 ` : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::VisitInline+0x44 > 00c9`5354a6c0 7ff7`2f12fe2a : 00c9`5354ab40 00c9`5354ae18 > 00c9`54476080 00c9`5354b208 : > CortexService!parquet::arrow::`anonymous > namespace'::LevelBuilder::GenerateLevels+0x93 > 00c9`5354aa00 7ff7`2f14de56 : 00c9`5354b1f8 00c9`5354afc8 > 00c9`54476080 `1e00 : > CortexService!parquet::arrow::`anonymous > namespace'::ArrowColumnWriter::Write+0x25a > 00c9`5354af20 7ff7`2f14e66b : 00c9`5354b1f8 00c9`5354b238 > 00c9`54445c20 ` : > CortexService!parquet::arrow::`anonymous > namespace'::ArrowColumnWriter::Write+0x2a6 > 00c9`5354b040 7ff7`2f12f137 : 00c9`544041f0 00c9`5354b4d8 > 00c9`5354b4a8 ` : > CortexService!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x70b > 00c9`5354b400 7ff7`2f14b4d5 : 00c9`54431180 00c9`5354b4d8 > 00c9`5354b4a8 ` : > CortexService!parquet::arrow::FileWriter::WriteColumnChunk+0x67 > 00c9`5354b450 7ff7`2f12eef1 : 00c9`5354b5d8 00c9`5354b648 > ` `1e00 : > CortexService!::operator()+0x195 > 00c9`5354b530 7ff7`2eb8e31e : 00c9`54431180 00c9`5354b760 > 00c9`54442fb0 `1e00 : > CortexService!parquet::arrow::FileWriter::WriteTable+0x521 > 00c9`5354b730 7ff7`2eb58ac5 : 00c9`5307bd88 00c9`54442fb0 > ` ` : > 
CortexService!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0xfe > 00c9`5354b860 7ff7`2eafdce6 : 00c9`5307bd80 00c9`5354ba08 > 00c9`5354b9e0 00c9`5354b9d8 : > CortexService!Cortex::Storage::ParquetFileWriter::writeRowGroup+0x545 > 00c9`5354b9a0 7ff7`2eaf8bae : 00c9`53275600 00c9`53077220 > `fffe ` : > CortexService!Cortex::Storage::DataStreamWriteWorker::onNewData+0x1a6 > {code} > {code:java} > FAILED_INSTRUCTION_ADDRESS: > CortexService!arrow::internal::CountSetBits+16d > [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc > @ 99] > 7ff7`2f3a4e4d f3480fb800 popcnt rax,qword ptr [rax] > FOLLOWUP_IP: > CortexService!arrow::internal::CountSetBits+16d > [c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc > @ 99] > 7ff7`2f3a4e4d f3480fb800 popcnt rax,qword ptr [rax] > FAULTING_SOURCE_LINE: > c:\jenkins\workspace\cortexv2-dev-win64-service\src\thirdparty\arrow\cpp\src\arrow\util\bit-util.cc
[jira] [Updated] (ARROW-5402) [Plasma] Pin objects in plasma store
[ https://issues.apache.org/jira/browse/ARROW-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5402: -- Component/s: C++ - Plasma > [Plasma] Pin objects in plasma store > > > Key: ARROW-5402 > URL: https://issues.apache.org/jira/browse/ARROW-5402 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Zhijun Fu >Assignee: Zhijun Fu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > [https://github.com/apache/arrow/issues/4368] > Sometimes we want to "pin" an object in the plasma store - we don't want the > object to be deleted even though nobody is currently referencing it. In this > case, we can specify a flag when creating the object so that it won't be > deleted by the LRU cache when its refcount drops to 0, and can only be > deleted by an explicit {{Delete()}} call. > We have encountered an actor failover (FO) problem: the actor creation task > depends on a plasma object put by the user. After the actor has been running > for a long time, the object is deleted by the plasma LRU. Then, when an actor > FO happens, the creation task cannot find the object put by the user, so the > FO hangs forever. > Would this make sense to you? > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
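The proposed pinning semantics can be modeled with a toy LRU: unpinned objects become eviction candidates under capacity pressure, while pinned objects survive until an explicit delete. This is only a sketch of the behaviour the issue asks for, not plasma's implementation:

```python
# Toy model of the proposed pin-on-create flag for an LRU object store.
from collections import OrderedDict

class PinnedLru:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # object_id -> (payload, pinned)

    def create(self, object_id, payload, pin=False):
        self.store[object_id] = (payload, pin)
        self._evict()

    def _evict(self):
        # Evict oldest unpinned entries; pinned entries are skipped.
        while len(self.store) > self.capacity:
            victim = next((k for k, (_, p) in self.store.items() if not p), None)
            if victim is None:
                break  # everything left is pinned; nothing evictable
            del self.store[victim]

    def delete(self, object_id):
        self.store.pop(object_id, None)  # explicit Delete() always works

lru = PinnedLru(capacity=2)
lru.create("actor-creation-arg", b"...", pin=True)  # survives LRU pressure
lru.create("a", b"...")
lru.create("b", b"...")  # over capacity: "a" is evicted, the pin survives
survived = "actor-creation-arg" in lru.store
evicted = "a" not in lru.store
```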
[jira] [Updated] (ARROW-5336) [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries
[ https://issues.apache.org/jira/browse/ARROW-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5336: -- Component/s: C++ > [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal > dictionaries > -- > > Key: ARROW-5336 > URL: https://issues.apache.org/jira/browse/ARROW-5336 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Currently (as of ARROW-3144) if any dictionary is different, an error is > returned -- This message was sent by Atlassian JIRA (v7.6.3#76005)
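Conceptually, concatenating dictionary-encoded arrays with unequal dictionaries requires unifying the dictionaries and remapping each chunk's indices into the merged one (rather than returning an error). A pure-Python sketch of that unification step, illustrative only and not Arrow's C++ implementation:

```python
# Sketch of dictionary unification for Concatenate: merge the
# dictionaries, then remap each chunk's indices into the merged one.
def unify_dictionaries(chunks):
    """chunks: list of (dictionary, indices) pairs. Returns the merged
    dictionary and the concatenated, remapped indices."""
    merged, position = [], {}
    out_indices = []
    for dictionary, indices in chunks:
        remap = []
        for value in dictionary:
            if value not in position:
                position[value] = len(merged)
                merged.append(value)
            remap.append(position[value])
        out_indices.extend(remap[i] for i in indices)
    return merged, out_indices

# Two chunks with different dictionaries, encoding ["a","b","b"] + ["b","c"].
merged, indices = unify_dictionaries([
    (["a", "b"], [0, 1, 1]),
    (["b", "c"], [0, 1]),
])
decoded = [merged[i] for i in indices]
```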
[jira] [Updated] (ARROW-5438) [JS] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5438: -- Component/s: JavaScript > [JS] Utilize stream EOS in File format > -- > > Key: ARROW-5438 > URL: https://issues.apache.org/jira/browse/ARROW-5438 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: John Muehlhausen >Priority: Minor > > We currently do not write EOS at the end of a Message stream inside the File > format. As a result, the file cannot be parsed sequentially. This change > prepares for other implementations or future reference features that parse a > File sequentially... i.e. without access to seek(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
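For context, each IPC message is length-prefixed, and EOS is simply a zero metadata length where the next prefix would be, so a sequential reader can stop without seek(). A minimal sketch, assuming the 4-byte little-endian length prefix of the format at the time (flatbuffer metadata and padding details elided):

```python
import struct

def write_message(stream, metadata: bytes, body: bytes = b""):
    """Write one IPC-style message: int32 metadata length, metadata, body."""
    stream.write(struct.pack("<i", len(metadata)))
    stream.write(metadata)
    stream.write(body)

def write_eos(stream):
    """End-of-stream marker: a zero metadata length."""
    stream.write(struct.pack("<i", 0))

def read_messages(stream):
    """Sequential reader: yields metadata blobs until EOS or truncation."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            break
        (n,) = struct.unpack("<i", prefix)
        if n == 0:  # EOS reached; any file footer after this is ignored
            break
        yield stream.read(n)
```

Without the EOS marker, a reader with no random access cannot tell where the message stream ends and the file footer begins, which is exactly what this issue (and its Java twin ARROW-5439) addresses.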
[jira] [Updated] (ARROW-5417) [Website] http://arrow.apache.org doesn't redirect to https
[ https://issues.apache.org/jira/browse/ARROW-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5417: -- Component/s: Website > [Website] http://arrow.apache.org doesn't redirect to https > --- > > Key: ARROW-5417 > URL: https://issues.apache.org/jira/browse/ARROW-5417 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Neal Richardson >Priority: Minor > > This should be a simple (for someone sufficiently authorized) config change > somewhere. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5476) [Java][Memory] Fix Netty ArrowBuf Slice
[ https://issues.apache.org/jira/browse/ARROW-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5476: -- Component/s: Java > [Java][Memory] Fix Netty ArrowBuf Slice > --- > > Key: ARROW-5476 > URL: https://issues.apache.org/jira/browse/ARROW-5476 > Project: Apache Arrow > Issue Type: Task > Components: Java >Affects Versions: 0.14.0 >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The slice of netty arrow buf depends on arrow buf reader and writer indexes, > but arrow buf is supposed to only track memory addr + length and there are > places where the arrow buf indexes are not in sync with netty. > So slice should use the indexes in Netty Arrow Buf instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5439) [Java] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5439: -- Component/s: Java > [Java] Utilize stream EOS in File format > > > Key: ARROW-5439 > URL: https://issues.apache.org/jira/browse/ARROW-5439 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: John Muehlhausen >Priority: Minor > > We currently do not write EOS at the end of a Message stream inside the File > format. As a result, the file cannot be parsed sequentially. This change > prepares for other implementations or future reference features that parse a > File sequentially... i.e. without access to seek(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month
[ https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5435: -- Component/s: Java > [Java] IntervalYearVector#getObject should return Period with both year and > month > - > > Key: ARROW-5435 > URL: https://issues.apache.org/jira/browse/ARROW-5435 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > IntervalYearVector#getObject today returns a Period with only the month field > assigned. However, this vector stores an interval of years and months (e.g. 2 years and 3 > months is stored as 27 total months), so it should return a Period with both > years and months. > As shown in the example above, it currently returns Period(27 months); I think it > should return Period(2 years, 3 months). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
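The fix amounts to splitting the stored total-month count back into years and months before building the Period; the arithmetic, shown here as a Python sketch of the Java logic:

```python
def months_to_period(total_months: int):
    """Split an interval stored as total months into (years, months),
    mirroring what IntervalYearVector#getObject should return."""
    years, months = divmod(total_months, 12)
    return years, months
```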
[jira] [Updated] (ARROW-5471) [C++][Gandiva]Array offset is ignored in Gandiva projector
[ https://issues.apache.org/jira/browse/ARROW-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5471: -- Component/s: C++ - Gandiva > [C++][Gandiva]Array offset is ignored in Gandiva projector > -- > > Key: ARROW-5471 > URL: https://issues.apache.org/jira/browse/ARROW-5471 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Zeyuan Shang >Priority: Major > > I used the test case in > [https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L25], > and found an issue when I was using the slice operator {{input_batch[1:]}}. > It seems that the offset is ignored in the Gandiva projector. > {code:java} > import pyarrow as pa > import pyarrow.gandiva as gandiva > builder = gandiva.TreeExprBuilder() > field_a = pa.field('a', pa.int32()) > field_b = pa.field('b', pa.int32()) > schema = pa.schema([field_a, field_b]) > field_result = pa.field('res', pa.int32()) > node_a = builder.make_field(field_a) > node_b = builder.make_field(field_b) > condition = builder.make_function("greater_than", [node_a, node_b], > pa.bool_()) > if_node = builder.make_if(condition, node_a, node_b, pa.int32()) > expr = builder.make_expression(if_node, field_result) > projector = gandiva.make_projector( > schema, [expr], pa.default_memory_pool()) > a = pa.array([10, 12, -20, 5], type=pa.int32()) > b = pa.array([5, 15, 15, 17], type=pa.int32()) > e = pa.array([10, 15, 15, 17], type=pa.int32()) > input_batch = pa.RecordBatch.from_arrays([a, b], names=['a', 'b']) > r, = projector.evaluate(input_batch[1:]) > print(r) > {code} > If we use the full record batch {{input_batch}}, the expected output is > {{[10, 15, 15, 17]}}. So if we use {{input_batch[1:]}}, the expected output > should be {{[15, 15, 17]}}, however this script returned {{[10, 15, 15]}}. It > seems that the projector ignores the offset and always reads from 0. 
> > A corresponding issue is created in GitHub as well > [https://github.com/apache/arrow/issues/4420] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
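The expected behavior reduces to indexing the value buffers at {{offset + i}} rather than {{i}}; a pure-Python sketch of the if(a > b, a, b) projection over a sliced view (illustrative only, not Gandiva's generated code):

```python
def project_if_greater(a_values, b_values, offset, length):
    """Evaluate if(a > b, a, b) over a sliced array view.

    Reading each element at `offset + i` (rather than from position 0)
    is the behavior this issue asks for; the reported bug is equivalent
    to dropping `offset` here.
    """
    return [
        max(a_values[offset + i], b_values[offset + i])
        for i in range(length)
    ]
```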
[jira] [Updated] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
[ https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5440: -- Component/s: Rust > [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos > - > > Key: ARROW-5440 > URL: https://issues.apache.org/jira/browse/ARROW-5440 > Project: Apache Arrow > Issue Type: Bug > Components: Rust > Environment: CentOS Linux release 7.6.1810 (Core) >Reporter: Tenzin Rigden >Priority: Major > Attachments: parquet-test-libstd.tar.gz > > > Hello, > In the rust parquet implementation ([https://github.com/sunchao/parquet-rs]) > on centos, the binary created has a `libstd-hash.so` shared library > dependency that is causing issues since it's a shared library found in the > rustup directory. This `libstd-hash.so` dependency isn't there on any other > rust binaries I've made before. This dependency means that I can't run this > binary anywhere where rustup isn't installed with that exact libstd library. > This is not an issue on Mac. > I've attached the rust files and here is the command line output below. > {code:java|title=cli-output|borderStyle=solid} > [centos@_ parquet-test]$ cat /etc/centos-release > CentOS Linux release 7.6.1810 (Core) > [centos@_ parquet-test]$ rustc --version > rustc 1.36.0-nightly (e70d5386d 2019-05-27) > [centos@_ parquet-test]$ ldd target/release/parquet-test > linux-vdso.so.1 => (0x7ffd02fee000) > libstd-44988553032616b2.so => not found > librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000) > libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000) > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000) > libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000) > libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000) > /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000) > [centos@_ parquet-test]$ ls -l > ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so > -rw-r--r--. 
1 centos centos 5623568 May 27 21:46 > /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long
[ https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5450: -- Component/s: Python > [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too > large to convert to C long > --- > > Key: ARROW-5450 > URL: https://issues.apache.org/jira/browse/ARROW-5450 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Tim Swast >Priority: Major > > When I attempt to roundtrip from a list of moderately large (beyond what can > be represented in nanosecond precision, but within microsecond precision) > datetime objects to pyarrow and back, I get an OverflowError: Python int too > large to convert to C long. > pyarrow version: > {noformat} > $ pip freeze | grep pyarrow > pyarrow==0.13.0{noformat} > > Reproduction: > {code:java} > import datetime > import pandas > import pyarrow > import pytz > timestamp_rows = [ > datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc), > None, > datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc), > datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc), > ] > timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", > tz="UTC")) > timestamp_roundtrip = timestamp_array.to_pylist() > # --- > # OverflowError Traceback (most recent call last) > # in > # > 1 timestamp_roundtrip = timestamp_array.to_pylist() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi > in __iter__() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi > in pyarrow.lib.TimestampValue.as_py() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi > in pyarrow.lib._datetime_conversion_functions.lambda5() > # > # pandas/_libs/tslibs/timestamps.pyx in > pandas._libs.tslibs.timestamps.Timestamp.__new__() > # > # pandas/_libs/tslibs/conversion.pyx in > pandas._libs.tslibs.conversion.convert_to_tsobject() > # > # 
OverflowError: Python int too large to convert to C long > {code} > For good measure, I also tested with timezone-naive timestamps with the same > error: > {code:java} > naive_rows = [ > datetime.datetime(1, 1, 1, 0, 0, 0), > None, > datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), > datetime.datetime(1970, 1, 1, 0, 0, 0), > ] > naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None)) > naive_roundtrip = naive_array.to_pylist() > # --- > # OverflowError Traceback (most recent call last) > # in > # > 1 naive_roundtrip = naive_array.to_pylist() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi > in __iter__() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi > in pyarrow.lib.TimestampValue.as_py() > # > # > ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi > in pyarrow.lib._datetime_conversion_functions.lambda5() > # > # pandas/_libs/tslibs/timestamps.pyx in > pandas._libs.tslibs.timestamps.Timestamp.__new__() > # > # pandas/_libs/tslibs/conversion.pyx in > pandas._libs.tslibs.conversion.convert_to_tsobject() > # > # OverflowError: Python int too large to convert to C long > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5410) [C++] Crash at arrow::internal::FileWrite
[ https://issues.apache.org/jira/browse/ARROW-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5410: -- Component/s: C++ > [C++] Crash at arrow::internal::FileWrite > - > > Key: ARROW-5410 > URL: https://issues.apache.org/jira/browse/ARROW-5410 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Environment: Windows version 10.0.14393.0 (rs1_release.160715-1616) >Reporter: Tham >Priority: Major > Labels: parquet > > My application writes a bunch of parquet files and often crashes. Most > of the time it crashes when writing the first file; sometimes it writes > the first file and crashes at the 2nd file. The file can always be opened. > It only crashes at writeTable. > As I tested, my application crashes when built in release mode, but doesn't > crash in debug mode. It crashed only on one Windows machine, not others. > Here is the stack trace from the dump file: > {code:java} > STACK_TEXT: > 001e`10efd840 7ffc`0333d53f : ` 001e`10efe230 > `0033 7ffc`032dbe21 : > CortexSync!google_breakpad::ExceptionHandler::HandleInvalidParameter+0x1a0 > 001e`10efe170 7ffc`0333d559 : `ff02 7ffc`032da63d > `0033 `0033 : ucrtbase!invalid_parameter+0x13f > 001e`10efe1b0 7ffc`03318664 : 7ff7`7f7c8489 `ff02 > 001e`10efe230 `0033 : ucrtbase!invalid_parameter_noinfo+0x9 > 001e`10efe1f0 7ffc`032d926d : ` `0140 > `0005 0122`bbe61e30 : > ucrtbase!_acrt_uninitialize_command_line+0x6fd4 > 001e`10efe250 7ff7`7f66585e : 0010`0005 ` > 001e`10efe560 0122`b2337b88 : ucrtbase!write+0x8d > 001e`10efe2a0 7ff7`7f632785 : 7ff7` 7ff7`7f7bb153 > 0122`bbe890e0 001e`10efe634 : > CortexSync!arrow::internal::FileWrite+0x5e > 001e`10efe360 7ff7`7f632442 : `348a `0004 > 733f`5e86f38c 0122`bbe14c40 : > CortexSync!arrow::io::OSFile::Write+0x1d5 > 001e`10efe510 7ff7`7f71c1b9 : 001e`10efe738 7ff7`7f665522 > 0122`bbffe6e0 ` : > CortexSync!arrow::io::FileOutputStream::Write+0x12 > 001e`10efe540 7ff7`7f79cb2f : 0122`bbe61e30 0122`bbffe6e0 > `0013 
001e`10efe730 : > CortexSync!parquet::ArrowOutputStream::Write+0x39 > 001e`10efe6e0 7ff7`7f7abbaf : 7ff7`7fd75b78 7ff7`7fd75b78 > 001e`10efe9c0 ` : > CortexSync!parquet::ThriftSerializer::Serialize+0x11f > 001e`10efe8c0 7ff7`7f7aaf93 : ` 0122`bbe3f450 > `0002 0122`bc0218d0 : > CortexSync!parquet::SerializedPageWriter::WriteDictionaryPage+0x44f > 001e`10efee20 7ff7`7f7a3707 : 0122`bbe3f450 001e`10eff250 > ` 0122`b168 : > CortexSync!parquet::TypedColumnWriterImpl > >::WriteDictionaryPage+0x143 > 001e`10efeed0 7ff7`7f710480 : 001e`10eff1c0 ` > 0122`bbe3f540 0122`b2439998 : > CortexSync!parquet::ColumnWriterImpl::Close+0x47 > 001e`10efef60 7ff7`7f7154da : 0122`bbec3cd0 001e`10eff1c0 > 0122`bbec4bb0 0122`b2439998 : > CortexSync!parquet::arrow::FileWriter::Impl::`vector deleting > destructor'+0x100 > 001e`10efefa0 7ff7`7f71619c : ` 001e`10eff1c0 > 0122`bbe89390 ` : > CortexSync!parquet::arrow::FileWriter::Impl::WriteColumnChunk+0x6fa > 001e`10eff150 7ff7`7f202de9 : `0001 001e`10eff430 > `000f ` : > CortexSync!parquet::arrow::FileWriter::WriteTable+0x6cc > 001e`10eff410 7ff7`7f18baf3 : 0122`bbec39b0 0122`b24c53f8 > `3f80 ` : > CortexSync!Cortex::Storage::ParquetStreamWriter::writeRowGroup+0x49{code} > I tried a lot of ways to find out the root cause, but failed. Can anyone here > give me some information/advice please, so that I can investigate more? > Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4906) [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is sorted for each row
[ https://issues.apache.org/jira/browse/ARROW-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4906: -- Component/s: Format > [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is > sorted for each row > - > > Key: ARROW-4906 > URL: https://issues.apache.org/jira/browse/ARROW-4906 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4831) [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency
[ https://issues.apache.org/jira/browse/ARROW-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4831: -- Component/s: C++ > [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency > --- > > Key: ARROW-4831 > URL: https://issues.apache.org/jira/browse/ARROW-4831 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 50m > Remaining Estimate: 0h > > ZSTD_CMAKE_ARGS should utilize > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L359 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4994) [website] Update Details for ptgoetz
[ https://issues.apache.org/jira/browse/ARROW-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4994: -- Component/s: Website > [website] Update Details for ptgoetz > > > Key: ARROW-4994 > URL: https://issues.apache.org/jira/browse/ARROW-4994 > Project: Apache Arrow > Issue Type: Task > Components: Website >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I'm no longer with Hortonworks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4950) [C++] Thirdparty CMake error get_target_property() called with non-existent target LZ4::lz4
[ https://issues.apache.org/jira/browse/ARROW-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4950: -- Component/s: C++ > [C++] Thirdparty CMake error get_target_property() called with non-existent > target LZ4::lz4 > --- > > Key: ARROW-4950 > URL: https://issues.apache.org/jira/browse/ARROW-4950 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Krisztian Szucs >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > With CMake 3.2 https://travis-ci.org/kszucs/crossbow/builds/507811485 > {code} > docker-compose build cpp-cmake32 > docker-compose run --rm cpp-cmake32 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4988) [JS] Bump required node version to 11.12
[ https://issues.apache.org/jira/browse/ARROW-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4988: -- Component/s: JavaScript > [JS] Bump required node version to 11.12 > > > Key: ARROW-4988 > URL: https://issues.apache.org/jira/browse/ARROW-4988 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The cause of ARROW-4948 and > http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C5ce620e0-0063-4bee-8ad6-a41301ac08c4%40www.fastmail.com%3E > was actually a regression in node v11.11, resolved in v11.12 see > https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md#2019-03-15-version-11120-current-bridgear > and https://github.com/nodejs/node/pull/26488 > Bump requirement up to 11.12 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5010) [Release] Fix release script with llvm-7
[ https://issues.apache.org/jira/browse/ARROW-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5010: -- Component/s: Developer Tools > [Release] Fix release script with llvm-7 > > > Key: ARROW-5010 > URL: https://issues.apache.org/jira/browse/ARROW-5010 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Source release script fails to compile gandiva because it requires llvm-7 and > only llvm-6 is available in the ubuntu18 docker image. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5011) [Release] Add support in the source release script for custom hash
[ https://issues.apache.org/jira/browse/ARROW-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5011: -- Component/s: Developer Tools > [Release] Add support in the source release script for custom hash > -- > > Key: ARROW-5011 > URL: https://issues.apache.org/jira/browse/ARROW-5011 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is a minor feature to help debug said script by overriding the > git-archive hash instead of the hash inferred from the release tag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854510#comment-16854510 ] Antoine Pitrou commented on ARROW-5447: --- It seems the error is non-deterministic. Another instance: https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24998213 > [CI] [Ruby] CI is failed on AppVeyor > > > Key: ARROW-5447 > URL: https://issues.apache.org/jira/browse/ARROW-5447 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Ruby >Reporter: Yosuke Shiro >Priority: Major > > This happens sometimes. > {code:java} > Error: test: csv.gz(TableTest::#save and .load::path:::format::load: auto > detect): Arrow::Error::Io: [csv-reader][read]: IOError: zlib inflate failed: > invalid distance too far back > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:533:in > `block in define_method' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:158:in `block (2 > levels) in load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:147:in `block (2 > levels) in wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:140:in > `open_encoding_convert_stream' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:146:in `block in > wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:125:in `block in > open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:124:in > `open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:145:in `wrap_input' > 
C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:157:in `block in > load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:156:in > `load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:39:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:26:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:158:in > `load_as_csv' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:50:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:22:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table.rb:27:in `load' > C:/projects/arrow/ruby/red-arrow/test/test-table.rb:503:in `block (5 levels) > in ' > === > {code} > > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854511#comment-16854511 ] Antoine Pitrou commented on ARROW-5447: --- [~kou] > [CI] [Ruby] CI is failed on AppVeyor > > > Key: ARROW-5447 > URL: https://issues.apache.org/jira/browse/ARROW-5447 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Ruby >Reporter: Yosuke Shiro >Priority: Major > > This happens sometimes. > {code:java} > Error: test: csv.gz(TableTest::#save and .load::path:::format::load: auto > detect): Arrow::Error::Io: [csv-reader][read]: IOError: zlib inflate failed: > invalid distance too far back > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:533:in > `block in define_method' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:158:in `block (2 > levels) in load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:147:in `block (2 > levels) in wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:140:in > `open_encoding_convert_stream' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:146:in `block in > wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:125:in `block in > open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:124:in > `open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:145:in `wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:157:in `block in > load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > 
C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:156:in > `load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:39:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:26:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:158:in > `load_as_csv' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:50:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:22:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table.rb:27:in `load' > C:/projects/arrow/ruby/red-arrow/test/test-table.rb:503:in `block (5 levels) > in ' > === > {code} > > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5447) [CI] [Ruby] CI is failed on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5447: -- Summary: [CI] [Ruby] CI is failed on AppVeyor (was: [CI] [Ruby] CI is failued on AppVeyor) > [CI] [Ruby] CI is failed on AppVeyor > > > Key: ARROW-5447 > URL: https://issues.apache.org/jira/browse/ARROW-5447 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Ruby >Reporter: Yosuke Shiro >Priority: Major > > This happens sometimes. > {code:java} > Error: test: csv.gz(TableTest::#save and .load::path:::format::load: auto > detect): Arrow::Error::Io: [csv-reader][read]: IOError: zlib inflate failed: > invalid distance too far back > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:616:in > `invoke' > c:/Ruby26-x64/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.6/lib/gobject-introspection/loader.rb:533:in > `block in define_method' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:158:in `block (2 > levels) in load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:147:in `block (2 > levels) in wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:140:in > `open_encoding_convert_stream' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:146:in `block in > wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:125:in `block in > open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:124:in > `open_decompress_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:145:in `wrap_input' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:157:in `block in > load_from_path' > 
C:/projects/arrow/ruby/red-arrow/lib/arrow/block-closable.rb:25:in `open' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:156:in > `load_from_path' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:39:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/csv-loader.rb:26:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:158:in > `load_as_csv' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:50:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table-loader.rb:22:in `load' > C:/projects/arrow/ruby/red-arrow/lib/arrow/table.rb:27:in `load' > C:/projects/arrow/ruby/red-arrow/test/test-table.rb:503:in `block (5 levels) > in ' > === > {code} > > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24813909/job/kkc98r3e4ltxeor3#L2328 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3676) [Go] implement Decimal128 array
[ https://issues.apache.org/jira/browse/ARROW-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854415#comment-16854415 ] Sebastien Binet commented on ARROW-3676: one possible package to leverage and implement this (w/o reaching for, say, `math/big.Int`) could be: [https://github.com/lukechampine/uint128] or just piggyback on what was implemented in C++: - [https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/basic_decimal.h] > [Go] implement Decimal128 array > --- > > Key: ARROW-3676 > URL: https://issues.apache.org/jira/browse/ARROW-3676 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
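Whichever backing type is chosen, a decimal128 value is two 64-bit words plus a scale; the representation can be sketched in Python, where arbitrary-precision ints make the word-combining explicit (illustrative only, not a proposed Go API):

```python
def decimal128_to_int(high: int, low: int) -> int:
    """Combine a signed high word and an unsigned low word into the
    128-bit two's-complement integer they represent."""
    return (high << 64) | low

def decimal128_to_string(high: int, low: int, scale: int) -> str:
    """Render the 128-bit value with `scale` fractional digits,
    matching the usual decimal(precision, scale) display."""
    value = decimal128_to_int(high, low)
    sign = "-" if value < 0 else ""
    digits = str(abs(value)).rjust(scale + 1, "0")  # pad so a digit precedes the point
    if scale == 0:
        return sign + digits
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"
```

A Go port would replace the unbounded int with a (hi uint64, lo uint64) pair, which is essentially what both linked options (lukechampine/uint128 and the C++ BasicDecimal128) do.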
[jira] [Created] (ARROW-5488) [R] Workaround when C++ lib not available
Romain François created ARROW-5488: -- Summary: [R] Workaround when C++ lib not available Key: ARROW-5488 URL: https://issues.apache.org/jira/browse/ARROW-5488 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Romain François As a way to get to CRAN, we need some way for the package to still compile, install, and test (although doing nothing useful) even when the C++ lib is not available. -- This message was sent by Atlassian JIRA (v7.6.3#76005)