[jira] [Commented] (ARROW-2813) [C++] Strip uninformative lcov output from Travis CI logs
[ https://issues.apache.org/jira/browse/ARROW-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659838#comment-17659838 ] Rok Mihevc commented on ARROW-2813: --- This issue has been migrated to [issue #19191|https://github.com/apache/arrow/issues/19191] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Strip uninformative lcov output from Travis CI logs > - > > Key: ARROW-2813 > URL: https://issues.apache.org/jira/browse/ARROW-2813 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We have about 650 lines of the type: > {code} > geninfo: WARNING: no data found for /usr/include/c++/4.8/istream > {code} > We should maybe pipe all the lcov output to a file and print any lines that > reference things other than /usr/include -- This message was sent by Atlassian Jira (v8.20.10#820010)
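The fix proposed in ARROW-2813 above (pipe the lcov output to a file, then surface only the informative lines) can be sketched in Python; the helper name `filter_lcov_lines` is hypothetical and not part of Arrow's CI scripts:

```python
# Hypothetical sketch of the filtering ARROW-2813 proposes: capture the
# lcov/geninfo output, then keep only lines that reference something
# other than /usr/include.
def filter_lcov_lines(lines):
    """Keep lcov/geninfo lines that do not mention /usr/include."""
    return [line for line in lines if "/usr/include" not in line]

captured = [
    "geninfo: WARNING: no data found for /usr/include/c++/4.8/istream",
    "Processing src/arrow/array.cc.gcda",
]
for line in filter_lcov_lines(captured):
    print(line)
```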
[jira] [Commented] (ARROW-4982) [GLib][CI] Run tests on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17662004#comment-17662004 ] Rok Mihevc commented on ARROW-4982: --- This issue has been migrated to [issue #16036|https://github.com/apache/arrow/issues/16036] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [GLib][CI] Run tests on AppVeyor > > > Key: ARROW-4982 > URL: https://issues.apache.org/jira/browse/ARROW-4982 > Project: Apache Arrow > Issue Type: Test > Components: Continuous Integration, GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-2209) [Python] Partition columns are not correctly loaded in schema of ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659237#comment-17659237 ] Rok Mihevc commented on ARROW-2209: --- This issue has been migrated to [issue #18173|https://github.com/apache/arrow/issues/18173] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Partition columns are not correctly loaded in schema of > ParquetDataset > --- > > Key: ARROW-2209 > URL: https://issues.apache.org/jira/browse/ARROW-2209 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently the partition columns are not included in the schema of a > ParquetDataset. We correctly write them out in the {{_common_metadata}} file > but we fail to load this file correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-2202) [JS] Add DataFrame.toJSON
[ https://issues.apache.org/jira/browse/ARROW-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659230#comment-17659230 ] Rok Mihevc commented on ARROW-2202: --- This issue has been migrated to [issue #18166|https://github.com/apache/arrow/issues/18166] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] Add DataFrame.toJSON > - > > Key: ARROW-2202 > URL: https://issues.apache.org/jira/browse/ARROW-2202 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Priority: Major > > Currently, {{CountByResult}} has its own [{{toJSON}} > method|https://github.com/apache/arrow/blob/master/js/src/table.ts#L282], but > there should be a more general one for every {{DataFrame}}. > {{CountByResult.toJSON}} returns: > {code:json} > { > "keyA": 10, > "keyB": 10, > ... > }{code} > A more general {{toJSON}} could just return a list of objects with an entry > for each column. For the above {{CountByResult}}, the output would look like: > {code:json} > [ > {value: "keyA", count: 10}, > {value: "keyB", count: 10}, > ... > ]{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-2185) Remove CI directives from squashed commit messages
[ https://issues.apache.org/jira/browse/ARROW-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-2185: -- External issue URL: https://github.com/apache/arrow/issues/18150 > Remove CI directives from squashed commit messages > -- > > Key: ARROW-2185 > URL: https://issues.apache.org/jira/browse/ARROW-2185 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > In our PR squash tool, we are potentially picking up CI directives like > {{[skip appveyor]}} from intermediate commits. We should regex these away and > instead use directives in the PR title if we wish the commit to master to > behave in a certain way -- This message was sent by Atlassian Jira (v8.20.10#820010)
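A minimal Python sketch of the regex stripping ARROW-2185 describes; the set of recognized directives below is an assumption for illustration, not the merge tool's actual list:

```python
import re

# Directive patterns are illustrative assumptions; the real PR squash tool
# may recognize a different set (e.g. "[skip ci]", "[skip appveyor]").
CI_DIRECTIVE = re.compile(r"\s*\[skip\s+(?:ci|appveyor|travis)\]", re.IGNORECASE)

def strip_ci_directives(message):
    """Remove CI directives picked up from intermediate commit messages."""
    return CI_DIRECTIVE.sub("", message).strip()

print(strip_ci_directives("ARROW-2185: Fix build [skip appveyor]"))
```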
[jira] [Updated] (ARROW-1968) [Python] Unit testing setup for ORC files
[ https://issues.apache.org/jira/browse/ARROW-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-1968: -- External issue URL: https://github.com/apache/arrow/issues/17955 > [Python] Unit testing setup for ORC files > - > > Key: ARROW-1968 > URL: https://issues.apache.org/jira/browse/ARROW-1968 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > ORC does not have a production-ready C++ writer yet, so we will need to > figure out another way to generate test data files to probe all of the > corners of our ORC reader. -- This message was sent by Atlassian Jira (v8.20.10#820010)

[jira] [Updated] (ARROW-1921) [Doc] Build API docs on a per-release basis
[ https://issues.apache.org/jira/browse/ARROW-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-1921: -- External issue URL: https://github.com/apache/arrow/issues/17912 > [Doc] Build API docs on a per-release basis > --- > > Key: ARROW-1921 > URL: https://issues.apache.org/jira/browse/ARROW-1921 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Uwe Korn >Priority: Major > > Currently we build the docs from time to time manually from master. We should > also build them per release so that you can have a look at the latest > released API version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4975) [C++] Support concatenation of UnionArrays
[ https://issues.apache.org/jira/browse/ARROW-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4975: -- External issue URL: https://github.com/apache/arrow/issues/16035 > [C++] Support concatenation of UnionArrays > -- > > Key: ARROW-4975 > URL: https://issues.apache.org/jira/browse/ARROW-4975 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Ben Kietzman >Assignee: Matthijs Brobbel >Priority: Minor > Labels: good-first-issue, good-second-issue, > pull-request-available > Fix For: 7.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > https://github.com/apache/arrow/pull/3746 adds support for concatenation of > arrays, but UnionArrays are not supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4972) [Go] Array equality
[ https://issues.apache.org/jira/browse/ARROW-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661994#comment-17661994 ] Rok Mihevc commented on ARROW-4972: --- This issue has been migrated to [issue #21476|https://github.com/apache/arrow/issues/21476] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Go] Array equality > --- > > Key: ARROW-4972 > URL: https://issues.apache.org/jira/browse/ARROW-4972 > Project: Apache Arrow > Issue Type: Sub-task > Components: Go >Reporter: Alexandre Crayssac >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4976) [JS] RecordBatchReader should reset its Node/DOM streams
[ https://issues.apache.org/jira/browse/ARROW-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4976: -- External issue URL: https://github.com/apache/arrow/issues/21479 > [JS] RecordBatchReader should reset its Node/DOM streams > > > Key: ARROW-4976 > URL: https://issues.apache.org/jira/browse/ARROW-4976 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > RecordBatchReaders should reset their internal platform streams when they are reset, so that they can then be piped to separate output streams. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4975) [C++] Support concatenation of UnionArrays
[ https://issues.apache.org/jira/browse/ARROW-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661997#comment-17661997 ] Rok Mihevc commented on ARROW-4975: --- This issue has been migrated to [issue #16035|https://github.com/apache/arrow/issues/16035] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Support concatenation of UnionArrays > -- > > Key: ARROW-4975 > URL: https://issues.apache.org/jira/browse/ARROW-4975 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Ben Kietzman >Assignee: Matthijs Brobbel >Priority: Minor > Labels: good-first-issue, good-second-issue, > pull-request-available > Fix For: 7.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > https://github.com/apache/arrow/pull/3746 adds support for concatenation of > arrays, but UnionArrays are not supported. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4507) [Format] Create outline and introduction for new document.
[ https://issues.apache.org/jira/browse/ARROW-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4507: -- External issue URL: https://github.com/apache/arrow/issues/21058 > [Format] Create outline and introduction for new document. > -- > > Key: ARROW-4507 > URL: https://issues.apache.org/jira/browse/ARROW-4507 > Project: Apache Arrow > Issue Type: Sub-task > Components: Format >Reporter: Micah Kornfield >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > This will ensure the document has a good flow, other subtasks on the parent > will handle moving content from each of the documents. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4479) [Plasma] Add S3 as external store for Plasma
[ https://issues.apache.org/jira/browse/ARROW-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4479: -- External issue URL: https://github.com/apache/arrow/issues/21035 > [Plasma] Add S3 as external store for Plasma > > > Key: ARROW-4479 > URL: https://issues.apache.org/jira/browse/ARROW-4479 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Plasma >Affects Versions: 0.12.0 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Adding S3 as an external store will allow objects to be evicted to S3 when > Plasma runs out of memory capacity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4497) [C++] Determine how we want to handle hashing of floating point edge cases
[ https://issues.apache.org/jira/browse/ARROW-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661519#comment-17661519 ] Rok Mihevc commented on ARROW-4497: --- This issue has been migrated to [issue #21050|https://github.com/apache/arrow/issues/21050] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Determine how we want to handle hashing of floating point edge cases > -- > > Key: ARROW-4497 > URL: https://issues.apache.org/jira/browse/ARROW-4497 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Micah Kornfield >Priority: Minor > Labels: analytics > > We should document expected behavior or implement improvements to hashing > floating point code: > 1. -0.0 and 0.0 (should these be collapsed to 0.0) > 2. NaN (Should we reduce to a single canonical version). -- This message was sent by Atlassian Jira (v8.20.10#820010)
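The two edge cases in ARROW-4497 can be made concrete with a small Python sketch (not Arrow code) that canonicalizes a double before taking its bit pattern, collapsing -0.0 into 0.0 and all NaN payloads into one canonical NaN:

```python
import math
import struct

def canonical_bits(x):
    """Return the 64-bit pattern of x after canonicalizing hash edge cases."""
    if math.isnan(x):
        x = math.nan   # collapse every NaN payload to one canonical NaN
    if x == 0.0:
        x = 0.0        # collapse -0.0 to +0.0 (they compare equal)
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Without canonicalization, -0.0 and 0.0 have different bit patterns but
# compare equal, and distinct NaN payloads hash differently.
assert canonical_bits(-0.0) == canonical_bits(0.0)
assert canonical_bits(float("nan")) == canonical_bits(-float("nan"))
```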
[jira] [Updated] (ARROW-4566) [C++][Flight] Add option to run arrow-flight-benchmark against a perf server running on a different host
[ https://issues.apache.org/jira/browse/ARROW-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4566: -- External issue URL: https://github.com/apache/arrow/issues/21112 > [C++][Flight] Add option to run arrow-flight-benchmark against a perf server > running on a different host > > > Key: ARROW-4566 > URL: https://issues.apache.org/jira/browse/ARROW-4566 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC >Reporter: Wes McKinney >Assignee: David Li >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the assumption is that both processes run on localhost. While that is also interesting (to see how fast things can go with network IO taken out of the equation), it is not very realistic. It would be good to both establish a baseline network IO benchmark between two hosts and then see how close a Flight stream can get to that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4475) [Python] Serializing objects that contain themselves
[ https://issues.apache.org/jira/browse/ARROW-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4475: -- External issue URL: https://github.com/apache/arrow/issues/21032 > [Python] Serializing objects that contain themselves > > > Key: ARROW-4475 > URL: https://issues.apache.org/jira/browse/ARROW-4475 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This is a regression from [https://github.com/apache/arrow/pull/3423]
> The following segfaults:
> {code:java}
> import pyarrow as pa
>
> lst = []
> lst.append(lst)
> pa.serialize(lst)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4584) [Python] Add built wheel to manylinux1 dockerignore.
[ https://issues.apache.org/jira/browse/ARROW-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4584: -- External issue URL: https://github.com/apache/arrow/issues/21128 > [Python] Add built wheel to manylinux1 dockerignore. > > > Key: ARROW-4584 > URL: https://issues.apache.org/jira/browse/ARROW-4584 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging, Python >Affects Versions: 0.12.0 >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently we add them to the docker context while we don't need them. This > shrinks the context for me down to 55k instead of hundreds MiBs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4474) [Flight] FlightInfo should use signed integer types for payload size
[ https://issues.apache.org/jira/browse/ARROW-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661495#comment-17661495 ] Rok Mihevc commented on ARROW-4474: --- This issue has been migrated to [issue #21031|https://github.com/apache/arrow/issues/21031] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Flight] FlightInfo should use signed integer types for payload size > > > Key: ARROW-4474 > URL: https://issues.apache.org/jira/browse/ARROW-4474 > Project: Apache Arrow > Issue Type: Bug > Components: FlightRPC >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: flight, pull-request-available > Fix For: 0.13.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Looking at the Java implementation, the de-facto practice is to use -1 in FlightInfo to indicate that the number of records or the size of the payload is unknown. However, the Protobuf definition uses an unsigned integer type, as does the C++ implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
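The mismatch in ARROW-4474 is easy to demonstrate: a -1 sentinel round-trips through a signed 64-bit field but cannot even be encoded in an unsigned one (Python sketch, not Flight code):

```python
import struct

# -1 fits a signed 64-bit field ("q") and round-trips cleanly.
payload = struct.pack("<q", -1)
assert struct.unpack("<q", payload)[0] == -1

# An unsigned 64-bit field ("Q") cannot represent the -1 sentinel at all,
# which is why the Protobuf definition should use signed integer types.
try:
    struct.pack("<Q", -1)
except struct.error:
    print("unsigned field cannot encode the -1 sentinel")
```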
[jira] [Commented] (ARROW-4271) [Rust] Move Parquet specific info to Parquet Readme
[ https://issues.apache.org/jira/browse/ARROW-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661295#comment-17661295 ] Rok Mihevc commented on ARROW-4271: --- This issue has been migrated to [issue #20847|https://github.com/apache/arrow/issues/20847] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Rust] Move Parquet specific info to Parquet Readme > --- > > Key: ARROW-4271 > URL: https://issues.apache.org/jira/browse/ARROW-4271 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Trivial > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The arrow readme contains parquet specific info that was copied over from the > top level readme, it should be moved to the parquet readme. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
[ https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4254: -- External issue URL: https://github.com/apache/arrow/issues/20831 > [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt > -- > > Key: ARROW-4254 > URL: https://issues.apache.org/jira/browse/ARROW-4254 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > These tests use an API that was not available in the Boost version shipped in Ubuntu 14.04; we can change them to use the more compatible API.
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc: In member function ‘virtual void gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203: error: template argument 1 is invalid
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> make[2]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4523) [JS] Add row proxy generation benchmark
[ https://issues.apache.org/jira/browse/ARROW-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661545#comment-17661545 ] Rok Mihevc commented on ARROW-4523: --- This issue has been migrated to [issue #21073|https://github.com/apache/arrow/issues/21073] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] Add row proxy generation benchmark > --- > > Key: ARROW-4523 > URL: https://issues.apache.org/jira/browse/ARROW-4523 > Project: Apache Arrow > Issue Type: Test > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4567) [C++] Convert Scalar values to Array values with length 1
[ https://issues.apache.org/jira/browse/ARROW-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4567: -- External issue URL: https://github.com/apache/arrow/issues/21113 > [C++] Convert Scalar values to Array values with length 1 > - > > Key: ARROW-4567 > URL: https://issues.apache.org/jira/browse/ARROW-4567 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > A common approach to performing operations on both scalar and array values is > to treat a Scalar as an array of length 1. For example, we cannot currently > use our Cast kernels to cast a Scalar. It would be senseless to create > separate kernel implementations specialized for a single value, and much > easier to promote a scalar to an Array, execute the kernel, then unbox the > result back into a Scalar -- This message was sent by Atlassian Jira (v8.20.10#820010)
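The promote/execute/unbox pattern ARROW-4567 describes can be sketched library-agnostically in Python; `cast_kernel` is a stand-in for an array kernel, not an Arrow API:

```python
# Sketch of treating a Scalar as an array of length 1: promote the scalar,
# run the array kernel unchanged, then unbox the single result.
def cast_kernel(values, target_type):
    """Stand-in for an array cast kernel: casts every element."""
    return [target_type(v) for v in values]

def cast_scalar(value, target_type):
    promoted = [value]                          # Scalar -> Array of length 1
    result = cast_kernel(promoted, target_type) # reuse the array kernel as-is
    return result[0]                            # unbox back into a Scalar

print(cast_scalar("7", int))
```

The point of the pattern is that no kernel needs a second, scalar-specialized implementation; the cost is only the length-1 wrapper around each call.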
[jira] [Commented] (ARROW-3898) parquet-arrow example has compilation errors
[ https://issues.apache.org/jira/browse/ARROW-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660922#comment-17660922 ] Rok Mihevc commented on ARROW-3898: --- This issue has been migrated to [issue #20511|https://github.com/apache/arrow/issues/20511] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > parquet-arrow example has compilation errors > > > Key: ARROW-3898 > URL: https://issues.apache.org/jira/browse/ARROW-3898 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Miao Wang >Assignee: Miao Wang >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When compiling the example, reader-writer.cc, the following compilation errors appear: > *no member named 'cout' in namespace 'std'* > PARQUET_THROW_NOT_OK(arrow::PrettyPrint(*array, 4, ::cout)); > in multiple places. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3884) [Python] Add LLVM6 to manylinux1 base image
[ https://issues.apache.org/jira/browse/ARROW-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3884: -- External issue URL: https://github.com/apache/arrow/issues/20478 > [Python] Add LLVM6 to manylinux1 base image > --- > > Key: ARROW-3884 > URL: https://issues.apache.org/jira/browse/ARROW-3884 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This is necessary to be able to build and bundle libgandiva with the 0.12 > release > This (epic!) build definition in Apache Kudu may be useful for building only > the pieces that we need for linking the Gandiva libraries, which may help > keep the image size minimal > https://github.com/apache/kudu/blob/master/thirdparty/build-definitions.sh#L175 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4562) [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of allocating contiguous slice and copying IpcPayload into it
[ https://issues.apache.org/jira/browse/ARROW-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661584#comment-17661584 ] Rok Mihevc commented on ARROW-4562: --- This issue has been migrated to [issue #21108|https://github.com/apache/arrow/issues/21108] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of > allocating contiguous slice and copying IpcPayload into it > -- > > Key: ARROW-4562 > URL: https://issues.apache.org/jira/browse/ARROW-4562 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > See discussion in https://github.com/apache/arrow/pull/3633 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4518) [JS] add jsdelivr to package.json
[ https://issues.apache.org/jira/browse/ARROW-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661540#comment-17661540 ] Rok Mihevc commented on ARROW-4518: --- This issue has been migrated to [issue #21069|https://github.com/apache/arrow/issues/21069] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] add jsdelivr to package.json > - > > Key: ARROW-4518 > URL: https://issues.apache.org/jira/browse/ARROW-4518 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4538) [PYTHON] Remove index column from subschema in write_to_dataframe
[ https://issues.apache.org/jira/browse/ARROW-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4538: -- External issue URL: https://github.com/apache/arrow/issues/21087 > [PYTHON] Remove index column from subschema in write_to_dataframe > - > > Key: ARROW-4538 > URL: https://issues.apache.org/jira/browse/ARROW-4538 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: Christian Thiel >Assignee: Christian Thiel >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When using {{pa.Table.from_pandas()}} with preserve_index=True and dataframe.index.name!=None, the prefix {{__index_level_}} is not added to the respective schema name. This breaks {{write_to_dataset}} with active partition columns.
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import os
> import shutil
> import pandas as pd
> import numpy as np
>
> PATH_PYARROW_MANUAL = '/tmp/pyarrow_manual.pa/'
> if os.path.exists(PATH_PYARROW_MANUAL):
>     shutil.rmtree(PATH_PYARROW_MANUAL)
> os.mkdir(PATH_PYARROW_MANUAL)
>
> arrays = np.array([np.array([0, 1, 2]), np.array([3, 4]), np.nan, np.nan])
> df = pd.DataFrame([0, 0, 1, 1], columns=['partition_column'])
> df['arrays'] = pd.Series(arrays)
> df.index.name = 'ID'
>
> table = pa.Table.from_pandas(df, preserve_index=True)
> print(table.schema.names)
> pq.write_to_dataset(table, root_path=PATH_PYARROW_MANUAL,
>                     partition_cols=['partition_column'],
>                     preserve_index=True)
> {code}
> Removing {{df.index.name='ID'}} works. Also disabling {{partition_cols}} in {{write_to_dataset}} works. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3916) [Python] Support caller-provided filesystem in `ParquetWriter` constructor
[ https://issues.apache.org/jira/browse/ARROW-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660940#comment-17660940 ] Rok Mihevc commented on ARROW-3916: --- This issue has been migrated to [issue #20528|https://github.com/apache/arrow/issues/20528] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Support caller-provided filesystem in `ParquetWriter` constructor > -- > > Key: ARROW-3916 > URL: https://issues.apache.org/jira/browse/ARROW-3916 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 0.11.1 >Reporter: Mackenzie >Assignee: Mackenzie >Priority: Major > Labels: parquet, pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently, to write files incrementally to S3, the following pattern appears necessary:
> {code}
> def write_dfs_to_s3(dfs, fname):
>     first_df = dfs[0]
>     table = pa.Table.from_pandas(first_df, preserve_index=False)
>     fs = s3fs.S3FileSystem()
>     fh = fs.open(fname, 'wb')
>     with pq.ParquetWriter(fh, table.schema) as writer:
>         # set the file handle on the writer so the writer manages closing it when it is itself closed
>         writer.file_handle = fh
>         writer.write_table(table=table)
>         for df in dfs[1:]:
>             table = pa.Table.from_pandas(df, preserve_index=False)
>             writer.write_table(table=table)
> {code}
> This works as expected, but is quite roundabout. It would be much easier if `ParquetWriter` supported `filesystem` as a keyword argument in its constructor, in which case `_get_fs_from_path` would be overridden by the usual pattern of using the kwarg after ensuring it is a proper file system with `_ensure_filesystem`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
[ https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4295: -- External issue URL: https://github.com/apache/arrow/issues/20868 > [Plasma] Incorrect log message when evicting objects > > > Key: ARROW-4295 > URL: https://issues.apache.org/jira/browse/ARROW-4295 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Plasma >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When Plasma evicts objects on running out of memory, it prints log messages > of the form: > {quote}There is not enough space to create this object, so evicting x objects > to free up y bytes. The number of bytes in use (before this eviction) is > z.{quote} > However, the reported number of bytes in use (before this eviction) actually > reports the number of bytes *after* the eviction. A straightforward fix is to > simply replace z with (y+z). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4264) [C++] Document why DCHECKs are used in kernels
[ https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661288#comment-17661288 ] Rok Mihevc commented on ARROW-4264: --- This issue has been migrated to [issue #20841|https://github.com/apache/arrow/issues/20841] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Document why DCHECKs are used in kernels > -- > > Key: ARROW-4264 > URL: https://issues.apache.org/jira/browse/ARROW-4264 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h > Remaining Estimate: 0h > > DCHECKs seem to be used where Status::Invalid might be considered more > appropriate (so programs don't crash). See conversation on > [https://github.com/apache/arrow/pull/3287/files] > based on conversation on this Jira and on the CL it seems DCHECKS are in fact > desired but we should document appropriate use for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4240) [Packaging] Documents for Plasma GLib and Gandiva GLib are missing in source archive
[ https://issues.apache.org/jira/browse/ARROW-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4240: -- External issue URL: https://github.com/apache/arrow/issues/20819 > [Packaging] Documents for Plasma GLib and Gandiva GLib are missing in source > archive > > > Key: ARROW-4240 > URL: https://issues.apache.org/jira/browse/ARROW-4240 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4478) Re-enable homebrew formula
[ https://issues.apache.org/jira/browse/ARROW-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4478: -- External issue URL: https://github.com/apache/arrow/issues/21034 > Re-enable homebrew formula > -- > > Key: ARROW-4478 > URL: https://issues.apache.org/jira/browse/ARROW-4478 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Aki Ariga >Priority: Minor > > Since the apache-arrow [formula on homebrew has been > removed|https://github.com/Homebrew/homebrew-core/pull/36063], the brew > recommendation in the R installation error message should be removed. > [https://github.com/apache/arrow/blob/master/r/configure#L90] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4486) [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` argument
[ https://issues.apache.org/jira/browse/ARROW-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661508#comment-17661508 ] Rok Mihevc commented on ARROW-4486: --- This issue has been migrated to [issue #21041|https://github.com/apache/arrow/issues/21041] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` > argument > - > > Key: ARROW-4486 > URL: https://issues.apache.org/jira/browse/ARROW-4486 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Pearu Peterson >Assignee: Pearu Peterson >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 2.5h > Remaining Estimate: 1.5h > > Similar to `pyarrow.foreign_buffer`, we need to keep the owner of cuda memory > alive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3926) [Python] Add Gandiva bindings to Python wheels
[ https://issues.apache.org/jira/browse/ARROW-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660950#comment-17660950 ] Rok Mihevc commented on ARROW-3926: --- This issue has been migrated to [issue #20536|https://github.com/apache/arrow/issues/20536] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Add Gandiva bindings to Python wheels > -- > > Key: ARROW-3926 > URL: https://issues.apache.org/jira/browse/ARROW-3926 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Depends on adding LLVM6 to the build toolchain -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4493) [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to read
[ https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4493: -- External issue URL: https://github.com/apache/arrow/issues/15965 > [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to > read > - > > Key: ARROW-4493 > URL: https://issues.apache.org/jira/browse/ARROW-4493 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Affects Versions: 0.12.0 >Reporter: Tupshin Harper >Assignee: Andy Grove >Priority: Trivial > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > make accumulate_scalar somewhat exhaustive and easier to read > > The current implementation doesn't leverage any of the exhaustiveness > checking of matching. This can be made simpler and partially exhaustive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
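The change is about exhaustive matching on scalar-type variants. As a loose Python analogy (all names are hypothetical, not DataFusion's actual code), an explicit fall-through error approximates what Rust's exhaustive `match` provides at compile time:

```python
def accumulate_scalar(acc, value):
    """Add `value` into accumulator `acc`, dispatching on type.

    Hypothetical stand-in for DataFusion's accumulate_scalar. Rust gets
    compile-time exhaustiveness checking from `match`; here the explicit
    `else` branch at least fails loudly on an unhandled variant instead
    of silently ignoring it.
    """
    if value is None:
        return acc  # null values do not change the accumulator
    elif isinstance(acc, int) and isinstance(value, int):
        return acc + value
    elif isinstance(acc, float) and isinstance(value, float):
        return acc + value
    else:
        raise TypeError(
            f"unsupported accumulator/value pair: {type(acc)}, {type(value)}"
        )
```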
[jira] [Updated] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
[ https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4283: -- External issue URL: https://github.com/apache/arrow/issues/20858 > [Python] Should RecordBatchStreamReader/Writer be AsyncIterable? > > > Key: ARROW-4283 > URL: https://issues.apache.org/jira/browse/ARROW-4283 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Paul Taylor >Priority: Minor > > Filing this issue after a discussion today with [~xhochy] about how to > implement streaming pyarrow http services. I had attempted to use both Flask > and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s > streaming interfaces because they seemed familiar, but no dice. I have no > idea how hard this would be to add -- supporting all the asynciterable > primitives in JS was non-trivial. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4272) [Python] Illegal hardware instruction on pyarrow import
[ https://issues.apache.org/jira/browse/ARROW-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661296#comment-17661296 ] Rok Mihevc commented on ARROW-4272: --- This issue has been migrated to [issue #20848|https://github.com/apache/arrow/issues/20848] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Illegal hardware instruction on pyarrow import > --- > > Key: ARROW-4272 > URL: https://issues.apache.org/jira/browse/ARROW-4272 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1 > Environment: Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux >Reporter: Elchin >Priority: Critical > Attachments: core > > > I can't import pyarrow, it crashes: > {code:java} > >>> import pyarrow as pa > [1] 31441 illegal hardware instruction (core dumped) python3{code} > Core dump is attached to issue, it can help you to understand what is the > problem. > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3894) [Python] Error reading IPC file with no record batches
[ https://issues.apache.org/jira/browse/ARROW-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3894: -- External issue URL: https://github.com/apache/arrow/issues/20503 > [Python] Error reading IPC file with no record batches > -- > > Key: ARROW-3894 > URL: https://issues.apache.org/jira/browse/ARROW-3894 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.11.1 >Reporter: Rik Coenders >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When using the RecordBatchFileWriter without actually writing a record batch, > the magic byte at the beginning of the file is not written. This causes the > exception {{File is smaller than indicated metadata size}} when reading that file > with the RecordBatchFileReader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
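The failure mode can be illustrated with a stdlib-only sketch of the reader's size sanity check (constants and layout are simplified for illustration, not the exact Arrow IPC file format):

```python
MAGIC = b"ARROW1"

def check_readable(buf: bytes) -> str:
    # Simplified sanity check: a valid file must at least hold the
    # leading magic, a 4-byte footer length, and the trailing magic.
    # A writer that skips the leading magic when zero batches were
    # written produces a file below this minimum, which is what
    # surfaces as the reported exception.
    minimum = len(MAGIC) + 4 + len(MAGIC)
    if len(buf) < minimum:
        return "File is smaller than indicated metadata size"
    return "ok"
```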
[jira] [Commented] (ARROW-3921) [CI][GLib] Log Homebrew output
[ https://issues.apache.org/jira/browse/ARROW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660945#comment-17660945 ] Rok Mihevc commented on ARROW-3921: --- This issue has been migrated to [issue #20532|https://github.com/apache/arrow/issues/20532] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [CI][GLib] Log Homebrew output > -- > > Key: ARROW-3921 > URL: https://issues.apache.org/jira/browse/ARROW-3921 > Project: Apache Arrow > Issue Type: Sub-task > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We need more information to fix {{brew update}} problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4485) [CI] Determine maintenance approach to pinned conda-forge binutils package
[ https://issues.apache.org/jira/browse/ARROW-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4485: -- External issue URL: https://github.com/apache/arrow/issues/21040 > [CI] Determine maintenance approach to pinned conda-forge binutils package > -- > > Key: ARROW-4485 > URL: https://issues.apache.org/jira/browse/ARROW-4485 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Uwe Korn >Priority: Major > Fix For: 0.13.0 > > > In ARROW-4469 https://github.com/apache/arrow/pull/3554 we pinned binutils > 2.31 because the 2.32 release broke builds on Ubuntu Xenial. Because of this, > we aren't sure what our path forward will be for relying on the conda-forge > toolchain. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4164) [Rust] ArrayBuilder should not have associated types
[ https://issues.apache.org/jira/browse/ARROW-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4164: -- External issue URL: https://github.com/apache/arrow/issues/20749 > [Rust] ArrayBuilder should not have associated types > > > Key: ARROW-4164 > URL: https://issues.apache.org/jira/browse/ARROW-4164 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Minor > > When dealing with arrays at runtime it is possible to represent them with > ArrayRef, allowing for things like Vec and then downcasting arrays > as needed based on schema metadata. > I am now trying to do the same thing with ArrayBuilder but because this trait > has an associated type, I cannot create a Vec. This makes it > difficult in some cases to dynamically build arrays. -- This message was sent by Atlassian Jira (v8.20.10#820010)
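The underlying problem is that a trait with an associated type cannot be used as a type-erased object in a single collection. A loose Python analogy (hypothetical classes, not the Arrow Rust API) of the type-erased builder interface the issue asks for:

```python
from abc import ABC, abstractmethod

class ArrayBuilder(ABC):
    """Type-erased builder interface: the analogue of an ArrayBuilder
    trait without associated types, so builders of different element
    types can share one collection and be driven from schema metadata."""
    @abstractmethod
    def append(self, value): ...
    @abstractmethod
    def finish(self): ...

class Int64Builder(ArrayBuilder):
    def __init__(self):
        self._values = []
    def append(self, value):
        self._values.append(int(value))
    def finish(self):
        return self._values

class StringBuilder(ArrayBuilder):
    def __init__(self):
        self._values = []
    def append(self, value):
        self._values.append(str(value))
    def finish(self):
        return self._values

# One list can now hold heterogeneous builders, which is exactly what an
# associated `type Item` on the trait would prevent in Rust.
builders: list[ArrayBuilder] = [Int64Builder(), StringBuilder()]
```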
[jira] [Updated] (ARROW-4557) [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method
[ https://issues.apache.org/jira/browse/ARROW-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4557: -- External issue URL: https://github.com/apache/arrow/issues/21104 > [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method > --- > > Key: ARROW-4557 > URL: https://issues.apache.org/jira/browse/ARROW-4557 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > Presently Table, Schema, and RecordBatch have basic {{select(...colNames)}} > implementations. Having an easy {{selectAt(...colIndices)}} impl would be a > nice complement, especially when there are duplicate column names. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store
[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4294: -- External issue URL: https://github.com/apache/arrow/issues/20867 > [Plasma] Add support for evicting objects to external store > --- > > Key: ARROW-4294 > URL: https://issues.apache.org/jira/browse/ARROW-4294 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Plasma >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: features, pull-request-available > Fix For: 0.13.0 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > Currently, when Plasma needs storage space for additional objects, it evicts > objects by deleting them from the Plasma store. This is a problem when it > isn't possible to reconstruct the object or reconstructing it is expensive. > Adding support for a pluggable external store that Plasma can evict objects > to will address this issue. > My proposal is described below. > *Requirements* > * Objects in Plasma should be evicted to an external store rather than being > removed altogether > * Communication to the external storage service should be through a very > thin, shim interface. At the same time, the interface should be general > enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.) > * Should be pluggable (e.g., it should be simple to add in or remove the > external storage service for eviction, switch between different remote > services, etc.) and easy to implement > *Assumptions/Non-Requirements* > * The external store has practically infinite storage > * The external store's write operation is idempotent and atomic; this is > needed to ensure there are no race conditions due to multiple concurrent > evictions of the same object. > *Proposed Implementation* > * Define an ExternalStore interface with a Connect call. The call returns an > ExternalStoreHandle that exposes Put and Get calls.
Any external store that > needs to be supported has to have this interface implemented. > * In order to read or write data to the external store in a thread-safe > manner, one ExternalStoreHandle should be created per-thread. While the > ExternalStoreHandle itself is not required to be thread-safe, multiple > ExternalStoreHandles across multiple threads should be able to modify the > external store in a thread-safe manner. These handles are most likely going > to be wrappers around the external store client interfaces. > * Replace the DeleteObjects method in the Plasma Store with an EvictObjects > method. If an external store is specified for the Plasma store, the > EvictObjects method would mark the object state as PLASMA_EVICTED, write the > object data to the external store (via the ExternalStoreHandle) and reclaim > the memory associated with the object data/metadata rather than remove the > entry from the Object Table altogether. In case there is no valid external > store, the eviction path would remain the same (i.e., the object entry is > still deleted from the Object Table). > * The Get method in Plasma Store now tries to fetch the object from external > store if it is not found locally and there is an external store associated > with the Plasma Store. The method tries to offload this to an external worker > thread pool with a fire-and-forget model, but may need to do this > synchronously if there are too many requests already enqueued. > * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, > which can be appended to with implementations of the ExternalStore and > ExternalStoreHandle interfaces, which will then be compiled into the > plasma_store_server executable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
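The proposed Connect/Put/Get shape can be sketched as follows (a Python stand-in with a toy dict-backed store; the real proposal is a C++ interface, and the names follow the proposal above):

```python
import threading
from abc import ABC, abstractmethod

class ExternalStoreHandle(ABC):
    # One handle is created per thread; the handle itself need not be
    # thread-safe, but concurrent handles must not corrupt the store.
    @abstractmethod
    def put(self, object_id, data): ...
    @abstractmethod
    def get(self, object_id): ...

class ExternalStore(ABC):
    @abstractmethod
    def connect(self) -> ExternalStoreHandle: ...

class InMemoryStore(ExternalStore):
    """Toy dict-backed store standing in for S3/DynamoDB/Redis."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def connect(self):
        store = self
        class Handle(ExternalStoreHandle):
            def put(self, object_id, data):
                # Idempotent and atomic: re-writing the same object
                # under concurrent evictions is harmless.
                with store._lock:
                    store._data[object_id] = data
            def get(self, object_id):
                with store._lock:
                    return store._data.get(object_id)
        return Handle()
```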
[jira] [Commented] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew
[ https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661314#comment-17661314 ] Rok Mihevc commented on ARROW-4290: --- This issue has been migrated to [issue #20864|https://github.com/apache/arrow/issues/20864] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++/Gandiva] Support detecting correct LLVM version in Homebrew > > > Key: ARROW-4290 > URL: https://issues.apache.org/jira/browse/ARROW-4290 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, C++ - Gandiva >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 40m > Remaining Estimate: 0h > > We should also search in homebrew for the matching LLVM version for Gandiva > on OSX. You can install it via {{brew install llvm@6}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4592) [GLib] Stop configure immediately when GLib isn't available
[ https://issues.apache.org/jira/browse/ARROW-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661615#comment-17661615 ] Rok Mihevc commented on ARROW-4592: --- This issue has been migrated to [issue #21134|https://github.com/apache/arrow/issues/21134] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [GLib] Stop configure immediately when GLib isn't available > --- > > Key: ARROW-4592 > URL: https://issues.apache.org/jira/browse/ARROW-4592 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4503) [C#] ArrowStreamReader allocates and copies data excessively
[ https://issues.apache.org/jira/browse/ARROW-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661525#comment-17661525 ] Rok Mihevc commented on ARROW-4503: --- This issue has been migrated to [issue #21055|https://github.com/apache/arrow/issues/21055] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C#] ArrowStreamReader allocates and copies data excessively > > > Key: ARROW-4503 > URL: https://issues.apache.org/jira/browse/ARROW-4503 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Assignee: Eric Erhardt >Priority: Major > Labels: performance, pull-request-available > Fix For: 0.14.0 > > Original Estimate: 48h > Time Spent: 12h 10m > Remaining Estimate: 35h 50m > > When reading `RecordBatch` instances using the `ArrowStreamReader` class, it > is currently allocating and copying memory 3 times for the data. > # It is allocating memory in order to [read the data from the > Stream|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L72-L74], > and then reading from the Stream. (This should be the only allocation that > is necessary.) > # It then [creates a new > `ArrowBuffer.Builder`|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L227-L228], > which allocates another `byte[]`, and calls `Append` on it, which copies the > values to the new `byte[]`. > # Finally, it then calls `.Build()` on the `ArrowBuffer.Builder`, which > [allocates memory from the MemoryPool, and then copies the intermediate > buffer|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/ArrowBuffer.Builder.cs#L112-L121] > into it. 
> > We should reduce this overhead to only allocating a single time (from the > MemoryPool), and not copying the data more times than necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
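The single-allocation pattern the issue argues for can be sketched with stdlib streams: allocate the destination once (in the C# code, from the MemoryPool) and read directly into it, instead of read-then-copy:

```python
import io

def read_buffer_copying(stream, length):
    # Pattern the issue criticizes: read() allocates a temporary
    # buffer, which is then copied into the final destination.
    temp = stream.read(length)      # allocation 1 + copy 1
    final = bytearray(length)       # allocation 2
    final[:len(temp)] = temp        # copy 2
    return final

def read_buffer_single(stream, length):
    # Proposed pattern: one allocation, data read straight into it.
    final = bytearray(length)
    view = memoryview(final)
    read = 0
    while read < length:
        n = stream.readinto(view[read:])
        if n == 0:
            break                   # stream ended early
        read += n
    return final
```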
[jira] [Commented] (ARROW-4260) [Python] test_serialize_deserialize_pandas is failing in multiple build entries
[ https://issues.apache.org/jira/browse/ARROW-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661284#comment-17661284 ] Rok Mihevc commented on ARROW-4260: --- This issue has been migrated to [issue #20837|https://github.com/apache/arrow/issues/20837] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] test_serialize_deserialize_pandas is failing in multiple build > entries > --- > > Key: ARROW-4260 > URL: https://issues.apache.org/jira/browse/ARROW-4260 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Hatem Helal >Assignee: Krisztian Szucs >Priority: Blocker > Labels: ci-failure, pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > See > [https://travis-ci.org/apache/arrow/jobs/479378190#L2427] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3926) [Python] Add Gandiva bindings to Python wheels
[ https://issues.apache.org/jira/browse/ARROW-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3926: -- External issue URL: https://github.com/apache/arrow/issues/20536 > [Python] Add Gandiva bindings to Python wheels > -- > > Key: ARROW-3926 > URL: https://issues.apache.org/jira/browse/ARROW-3926 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Depends on adding LLVM6 to the build toolchain -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4492) [Python] Failure reading Parquet column as pandas Categorical in 0.12
[ https://issues.apache.org/jira/browse/ARROW-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4492: -- External issue URL: https://github.com/apache/arrow/issues/21046 > [Python] Failure reading Parquet column as pandas Categorical in 0.12 > - > > Key: ARROW-4492 > URL: https://issues.apache.org/jira/browse/ARROW-4492 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: George Sakkis >Priority: Major > Labels: Parquet > Fix For: 0.12.1 > > Attachments: slug.pq > > > On pyarrow 0.12.0 some (but not all) columns cannot be read as category > dtype. Attached is an extracted failing sample. > {noformat} > import dask.dataframe as dd > df = dd.read_parquet('slug.pq', categories=['slug'], > engine='pyarrow').compute() > print(len(df['slug'].dtype.categories)) > {noformat} > This works on pyarrow 0.11.1 (and fastparquet 0.2.1). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4594) [Ruby] Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array
[ https://issues.apache.org/jira/browse/ARROW-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4594: -- External issue URL: https://github.com/apache/arrow/issues/21136 > [Ruby] Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array > -- > > Key: ARROW-4594 > URL: https://issues.apache.org/jira/browse/ARROW-4594 > Project: Apache Arrow > Issue Type: Improvement > Components: Ruby >Affects Versions: 0.12.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4519) [JS] Publish JS API Docs for v0.4.0
[ https://issues.apache.org/jira/browse/ARROW-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4519: -- External issue URL: https://github.com/apache/arrow/issues/21070 > [JS] Publish JS API Docs for v0.4.0 > --- > > Key: ARROW-4519 > URL: https://issues.apache.org/jira/browse/ARROW-4519 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits
[ https://issues.apache.org/jira/browse/ARROW-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4268: -- External issue URL: https://github.com/apache/arrow/issues/20844 > [C++] Add C primitive to Arrow:Type compile time in TypeTraits > -- > > Key: ARROW-4268 > URL: https://issues.apache.org/jira/browse/ARROW-4268 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Original Estimate: 1h > Time Spent: 1h 40m > Remaining Estimate: 0h > > The user would use something like > {code:c++} > ... > using ArrowType = CTypeTraits<int>::ArrowType; > using ArrayType = CTypeTraits<int>::ArrayType; > auto type = CTypeTraits<int>::type_singleton(); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4539) [Java]List vector child value count not set correctly
[ https://issues.apache.org/jira/browse/ARROW-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4539: -- External issue URL: https://github.com/apache/arrow/issues/21088 > [Java]List vector child value count not set correctly > - > > Key: ARROW-4539 > URL: https://issues.apache.org/jira/browse/ARROW-4539 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Praveen Kumar >Assignee: Praveen Kumar >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We are not correctly processing list vectors that could have null values. The > child value count would be off, thereby losing data in variable width vectors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4470) [Python] Pyarrow using considerable more memory when reading partitioned Parquet file
[ https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661491#comment-17661491 ] Rok Mihevc commented on ARROW-4470: --- This issue has been migrated to [issue #21027|https://github.com/apache/arrow/issues/21027] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Pyarrow using considerable more memory when reading partitioned > Parquet file > - > > Key: ARROW-4470 > URL: https://issues.apache.org/jira/browse/ARROW-4470 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: Ivan SPM >Priority: Major > Labels: dataset, datasets, parquet > Fix For: 0.16.0 > > > Hi, > I have a partitioned Parquet table in Impala in HDFS, using Hive metastore, > with the following structure: > {{/data/myparquettable/year=2016}}{{/data/myparquettable/year=2016/myfile_1.prt}} > {{/data/myparquettable/year=2016/myfile_2.prt}} > {{/data/myparquettable/year=2016/myfile_3.prt}} > {{/data/myparquettable/year=2017}} > {{/data/myparquettable/year=2017/myfile_1.prt}} > {{/data/myparquettable/year=2017/myfile_2.prt}} > {{/data/myparquettable/year=2017/myfile_3.prt}} > and so on. I need to work with one partition, so I copied one partition to a > local filesystem: > {{hdfs fs -get /data/myparquettable/year=2017 /local/}} > so now I have some data on the local disk: > {{/local/year=2017/myfile_1.prt }}{{/local/year=2017/myfile_2.prt }} > etc.I tried to read it using Pyarrow: > {{import pyarrow.parquet as pq}}{{pq.read_parquet('/local/year=2017')}} > and it starts reading. The problem is that the local Parquet files are around > 15GB total, and I blew up my machine memory a couple of times because when > reading these files, Pyarrow is using more than 60GB of RAM, and I'm not sure > how much it will take because it never finishes. Is this expected? Is there a > workaround? 
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660957#comment-17660957 ] Rok Mihevc commented on ARROW-3933: --- This issue has been migrated to [issue #20542|https://github.com/apache/arrow/issues/20542] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Assignee: Wes McKinney >Priority: Minor > Labels: parquet, pull-request-available > Fix For: 0.15.0 > > Attachments: > part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > > Time Spent: 0.5h > Remaining Estimate: 0h > > I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM > (AWS). The error also occurs out of the box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4512) [R] Stream reader/writer API that takes socket stream
[ https://issues.apache.org/jira/browse/ARROW-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661534#comment-17661534 ] Rok Mihevc commented on ARROW-4512: --- This issue has been migrated to [issue #21063|https://github.com/apache/arrow/issues/21063] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [R] Stream reader/writer API that takes socket stream > - > > Key: ARROW-4512 > URL: https://issues.apache.org/jira/browse/ARROW-4512 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 0.12.0, 0.14.1, 1.0.0 >Reporter: Hyukjin Kwon >Assignee: Dewey Dunnington >Priority: Major > Fix For: 8.0.0 > > > I have been working on Spark integration with Arrow. > I realised that there are no ways to use socket as input to use Arrow stream > format. For instance, > I want to something like: > {code} > connStream <- socketConnection(port = , blocking = TRUE, open = "wb") > rdf_slices <- # a list of data frames. > stream_writer <- NULL > tryCatch({ > for (rdf_slice in rdf_slices) { > batch <- record_batch(rdf_slice) > if (is.null(stream_writer)) { > stream_writer <- RecordBatchStreamWriter(connStream, batch$schema) # > Here, looks there's no way to use socket. > } > stream_writer$write_batch(batch) > } > }, > finally = { > if (!is.null(stream_writer)) { > stream_writer$close() > } > }) > {code} > Likewise, I cannot find a way to iterate the stream batch by batch > {code} > RecordBatchStreamReader(connStream)$batches() # Here, looks there's no way > to use socket. > {code} > This looks easily possible in Python side but looks missing in R APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4508) [Format] Copy content from Layout.rst to new document.
[ https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661530#comment-17661530 ] Rok Mihevc commented on ARROW-4508: --- This issue has been migrated to [issue #21059|https://github.com/apache/arrow/issues/21059] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Format] Copy content from Layout.rst to new document. > -- > > Key: ARROW-4508 > URL: https://issues.apache.org/jira/browse/ARROW-4508 > Project: Apache Arrow > Issue Type: Sub-task > Components: Format >Reporter: Micah Kornfield >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4515) [C++, lint] Use clang-format more efficiently in `check-format` target
[ https://issues.apache.org/jira/browse/ARROW-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4515: -- External issue URL: https://github.com/apache/arrow/issues/21066 > [C++, lint] Use clang-format more efficiently in `check-format` target > -- > > Key: ARROW-4515 > URL: https://issues.apache.org/jira/browse/ARROW-4515 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Ben Kietzman >Priority: Minor > Labels: good-first-issue > > `clang-format` supports command line option `-output-replacements-xml` which > (in the case of no required changes) outputs: > ``` > > > > ``` > Using this option during `check-format` instead of using python to compute a > diff between formatted and on-disk should speed up that target significantly -- This message was sent by Atlassian Jira (v8.20.10#820010)
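The proposal above hinges on one property of clang-format's `-output-replacements-xml` mode: an already-formatted file yields an empty `<replacements>` element, so checking for any `<replacement>` entry is enough to decide whether the file needs reformatting. A hedged Python sketch (the `file_needs_formatting` helper is illustrative and assumes `clang-format` is on `PATH`; the XML check itself needs no external tools):

```python
import subprocess

def has_replacements(xml: str) -> bool:
    # An already-formatted file yields an empty <replacements> element;
    # any "<replacement " entry means the file differs from formatted output.
    return "<replacement " in xml

def file_needs_formatting(path: str) -> bool:
    """Run clang-format in replacements-XML mode on a source file.
    (Illustrative helper; assumes clang-format is installed.)"""
    xml = subprocess.run(
        ["clang-format", "-output-replacements-xml", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return has_replacements(xml)
```

This avoids materializing a fully formatted copy and diffing it in Python, which is what the issue suggests is slow.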
[jira] [Updated] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes
[ https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4554: -- External issue URL: https://github.com/apache/arrow/issues/21102 > [JS] Implement logic for combining Vectors with different lengths/chunksizes > > > Key: ARROW-4554 > URL: https://issues.apache.org/jira/browse/ARROW-4554 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > We should add logic to combine and possibly slice/re-chunk and uniformly > partition chunks into separate RecordBatches. This will make it easier to > create Tables or RecordBatches from Vectors of different lengths. This is > also necessary for {{Table#assign()}}. PR incoming. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4508) [Format] Copy content from Layout.rst to new document.
[ https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4508: -- External issue URL: https://github.com/apache/arrow/issues/21059 > [Format] Copy content from Layout.rst to new document. > -- > > Key: ARROW-4508 > URL: https://issues.apache.org/jira/browse/ARROW-4508 > Project: Apache Arrow > Issue Type: Sub-task > Components: Format >Reporter: Micah Kornfield >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4276) [Release] Remove needless Bintray authentication from binaries verify script
[ https://issues.apache.org/jira/browse/ARROW-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661300#comment-17661300 ] Rok Mihevc commented on ARROW-4276: --- This issue has been migrated to [issue #20851|https://github.com/apache/arrow/issues/20851] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Release] Remove needless Bintray authentication from binaries verify script > > > Key: ARROW-4276 > URL: https://issues.apache.org/jira/browse/ARROW-4276 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3906) [C++] Break builder.cc into multiple compilation units
[ https://issues.apache.org/jira/browse/ARROW-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3906: -- External issue URL: https://github.com/apache/arrow/issues/15888 > [C++] Break builder.cc into multiple compilation units > -- > > Key: ARROW-3906 > URL: https://issues.apache.org/jira/browse/ARROW-3906 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.11.1 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > To improve readability I suggest splitting {{builder.cc}} into independent > compilation units. Concrete builder classes are generally independent of each > other. The only concern is whether inlining some of the base class > implementations is important for performance. > This would also make incremental compilation faster when changing one of the > concrete classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4579) [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array
[ https://issues.apache.org/jira/browse/ARROW-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661602#comment-17661602 ] Rok Mihevc commented on ARROW-4579: --- This issue has been migrated to [issue #21123|https://github.com/apache/arrow/issues/21123] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array > -- > > Key: ARROW-4579 > URL: https://issues.apache.org/jira/browse/ARROW-4579 > Project: Apache Arrow > Issue Type: New Feature > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > We should use or return the new native [BigInt > types|https://developers.google.com/web/updates/2018/05/bigint] whenever it's > available. > * Use the native {{BigInt}} to convert/stringify i64s/u64s > * Support the {{BigInt}} type in element comparator and {{indexOf()}} > * Add zero-copy {{toBigInt64Array()}} and {{toBigUint64Array()}} methods to > {{Int64Vector}} and {{Uint64Vector}}, respectively -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4585) [C++] Dependency of Flight C++ sources on generated protobuf is not respected
[ https://issues.apache.org/jira/browse/ARROW-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4585: -- External issue URL: https://github.com/apache/arrow/issues/15977 > [C++] Dependency of Flight C++ sources on generated protobuf is not respected > - > > Key: ARROW-4585 > URL: https://issues.apache.org/jira/browse/ARROW-4585 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Seems like we have a race condition somewhere as we frequently run into > {code:java} > [82/273] Building CXX object > src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o > FAILED: > src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o > /usr/local/bin/ccache > /Users/ukorn/miniconda3/envs/pyarrow-dev-2/bin/x86_64-apple-darwin13.4.0-clang++ > -DARROW_JEMALLOC > -DARROW_JEMALLOC_INCLUDE_DIR=/Users/ukorn/Development/arrow-repos-2/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//include > -DARROW_USE_GLOG -DARROW_USE_SIMD -DARROW_WITH_BROTLI -DARROW_WITH_LZ4 > -DARROW_WITH_SNAPPY -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -Isrc -I../src > -isystem /Users/ukorn/miniconda3/envs/pyarrow-dev-2/include -isystem > gbenchmark_ep/src/gbenchmark_ep-install/include -isystem > jemalloc_ep-prefix/src -isystem ../thirdparty/hadoop/include -isystem > /Users/ukorn/miniconda3/envs/pyarrow-dev-2/include/thrift -march=core2 > -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong > -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 > -fmessage-length=0 -Qunused-arguments -O3 -DNDEBUG -Wall > -Wno-unknown-warning-option -msse4.2 -stdlib=libc++ -O3 -DNDEBUG -fPIC > -std=gnu++11 -MD -MT > src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o > -MF > 
src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o.d > -o src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o > -c ../src/arrow/flight/test-util.cc > In file included from ../src/arrow/flight/test-util.cc:35: > In file included from ../src/arrow/flight/internal.h:29: > ../src/arrow/flight/protocol-internal.h:22:10: fatal error: > 'arrow/flight/Flight.grpc.pb.h' file not found > #include "arrow/flight/Flight.grpc.pb.h" > ^~~ > 1 error generated. > [87/273] Building CXX object > src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o > ninja: build stopped: subcommand failed. > ninja 672,82s user 33,40s system 196% cpu 5:59,62 total{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4481) [Website] Instructions for publishing web site are missing a step
[ https://issues.apache.org/jira/browse/ARROW-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4481: -- External issue URL: https://github.com/apache/arrow/issues/21037 > [Website] Instructions for publishing web site are missing a step > - > > Key: ARROW-4481 > URL: https://issues.apache.org/jira/browse/ARROW-4481 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Andy Grove >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The instructions for publishing the web site say to run the > "scripts/sync_format_docs.sh" which copies the top level "format" directory > to the "_docs" directory under site. > Existing files in "_docs" have references to markdown files in the > "_docs/format" directory. For example, the IPC.md contains: > {code:java} > {% include_relative format/IPC.md %}{code} > However my top level format directory does not contain this IPC.md, so I get > errors when running jekyll and I have had to create some dummy markdown files > as a workaround. > I investigated this a bit and I think there is some prerequisite step that > isn't documented that would cause Sphinx to run and generate docs? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4480) [Python] Drive letter removed when writing parquet file
[ https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661502#comment-17661502 ] Rok Mihevc commented on ARROW-4480: --- This issue has been migrated to [issue #21036|https://github.com/apache/arrow/issues/21036] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Drive letter removed when writing parquet file > > > Key: ARROW-4480 > URL: https://issues.apache.org/jira/browse/ARROW-4480 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: Seb Fru >Assignee: Antoine Pitrou >Priority: Blocker > Labels: parquet > Fix For: 0.13.0 > > > Hi everyone, > > importing this from Github: > > I encountered a problem while working with pyarrow: I am working on Windows > 10. When I want to save a table using pq.write_table(tab, > r'E:\parquetfiles\file1.parquet'), I get the Error "No such file or > directory". > After searching a bit, i found out that the drive letter is getting removed > while parsing the where string, but I could not find a way solve my problem: > I can write the files on my C:\ drive without problems, but I am not able to > write a parquet file on another drive than C:. > Am I doing something wrong or is this just how it works? I would really > appreciate any help, because I just cannot fit my files on C: drive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
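The reported behaviour is consistent with the Windows path being parsed as a URI, in which case the drive letter is taken for a URI scheme and dropped from the path. The mechanism can be reproduced with the standard library (illustrative of the parsing pitfall only; not necessarily the exact code path pyarrow used):

```python
from urllib.parse import urlparse

# A Windows path parsed as a URI: the drive letter looks like a scheme.
parsed = urlparse(r"E:\parquetfiles\file1.parquet")
print(parsed.scheme)  # 'e'  -- the drive letter, lowercased
print(parsed.path)    # '\parquetfiles\file1.parquet'  -- drive letter gone
```

Any code that then treats `parsed.path` as the filesystem path will look for the file on the wrong drive, matching the "No such file or directory" error above.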
[jira] [Updated] (ARROW-4510) [Format] copy content from IPC.rst to new document.
[ https://issues.apache.org/jira/browse/ARROW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4510: -- External issue URL: https://github.com/apache/arrow/issues/21061 > [Format] copy content from IPC.rst to new document. > --- > > Key: ARROW-4510 > URL: https://issues.apache.org/jira/browse/ARROW-4510 > Project: Apache Arrow > Issue Type: Sub-task > Components: Format >Reporter: Micah Kornfield >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow
[ https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4275: -- External issue URL: https://github.com/apache/arrow/issues/15940 > [C++] gandiva-decimal_single_test extremely slow > > > Key: ARROW-4275 > URL: https://issues.apache.org/jira/browse/ARROW-4275 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Gandiva, Continuous Integration >Affects Versions: 0.11.1 >Reporter: Antoine Pitrou >Assignee: Pindikura Ravindra >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind: > {code} > 99/100 Test #128: gandiva-decimal_single_test ... Passed > 397.11 sec > 100/100 Test #130: gandiva-decimal_single_test_static Passed > 338.97 sec > {code} > (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707) > Something should be done to make it faster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package
[ https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4526: -- External issue URL: https://github.com/apache/arrow/issues/21076 > [Java] Remove Netty references from ArrowBuf and move Allocator out of vector > package > - > > Key: ARROW-4526 > URL: https://issues.apache.org/jira/browse/ARROW-4526 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Jacques Nadeau >Assignee: Liya Fan >Priority: Critical > Fix For: 1.0.0 > > > Arrow currently has a hard dependency on Netty and exposes this in public > APIs. This shouldn't be the case. There could be many allocator > implementations with Netty as one possible option. We should remove hard > dependency between arrow-vector and Netty, instead creating a trivial > allocator. ArrowBuf should probably expose an T unwrap(Class clazz) > method instead to allow inner providers availability without a hard > reference. This should also include drastically reducing the number of > methods on ArrowBuf as right now it includes every method from ByteBuf but > many of those are not very useful, appropriate. > This work should come after we do the simpler ARROW-3191 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4559) [Python] pyarrow can't read/write filenames with special characters
[ https://issues.apache.org/jira/browse/ARROW-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4559: -- External issue URL: https://github.com/apache/arrow/issues/21106 > [Python] pyarrow can't read/write filenames with special characters > --- > > Key: ARROW-4559 > URL: https://issues.apache.org/jira/browse/ARROW-4559 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 > Environment: $ python3 --version > Python 3.6.6 > $ pip3 freeze | grep -Ei 'pyarrow|pandas' > pandas==0.24.1 > pyarrow==0.12.0 >Reporter: Jean-Christophe Petkovich >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When writing or reading files to or from paths that have special characters > in them, (e.g., "#"), pyarrow returns an error: > {code:python} > OSError: Passed non-file path... > {code} > This is a consequence of the following line: > https://github.com/apache/arrow/blob/master/python/pyarrow/filesystem.py#L416 > File-paths will be parsed as URIs, which will give strange results for > filepaths like: "bad # actor.parquet": > ParseResult(scheme='', netloc='', path='/tmp/bad ', params='', query='', > fragment='actor.parquet') > This is trivial to reproduce with the following code which uses the > `pd.to_parquet` and `pd.read_parquet` interfaces: > {code:python} > import pandas as pd > x = pd.DataFrame({"a": [1,2,3]}) > x.to_parquet("bad # actor.parquet") > x.read_parquet("bad # actor.parquet") > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
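Two notes on the report above. First, in the quoted repro the final line would need to be `pd.read_parquet(...)` rather than `x.read_parquet(...)` (a `DataFrame` has no `read_parquet` method). Second, the `ParseResult` shown follows directly from URI parsing, where `#` starts a fragment; a minimal stdlib reproduction:

```python
from urllib.parse import urlparse

# '#' begins a URI fragment, so a literal '#' in a filename truncates the path.
parsed = urlparse("/tmp/bad # actor.parquet")
print(parsed.path)      # '/tmp/bad '
print(parsed.fragment)  # ' actor.parquet'
```

This is why parsing local file paths as URIs, as the linked `filesystem.py` line did, breaks on otherwise-legal filenames.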
[jira] [Commented] (ARROW-3958) [Plasma] Reduce number of IPCs
[ https://issues.apache.org/jira/browse/ARROW-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660981#comment-17660981 ] Rok Mihevc commented on ARROW-3958: --- This issue has been migrated to [issue #15897|https://github.com/apache/arrow/issues/15897] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Plasma] Reduce number of IPCs > -- > > Key: ARROW-3958 > URL: https://issues.apache.org/jira/browse/ARROW-3958 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.11.1 >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently we ship file descriptors of objects from the store to the client > every time an object is created or gotten. There is relatively few distinct > file descriptors, so caching them can get rid of one IPC in the majority of > cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-2837) [C++] ArrayBuilder::null_bitmap returns PoolBuffer
[ https://issues.apache.org/jira/browse/ARROW-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-2837: -- External issue URL: https://github.com/apache/arrow/issues/19213 > [C++] ArrayBuilder::null_bitmap returns PoolBuffer > -- > > Key: ARROW-2837 > URL: https://issues.apache.org/jira/browse/ARROW-2837 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Affects Versions: 0.9.0 >Reporter: Dimitri Vorona >Assignee: Dimitri Vorona >Priority: Major > Fix For: 0.10.0 > > > A simple buffer (like in case of ArrayBuilder::Data) seem to be enough to me, > and it doesn't break anything. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3996) [C++] Insufficient description on build
[ https://issues.apache.org/jira/browse/ARROW-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3996: -- External issue URL: https://github.com/apache/arrow/issues/15900 > [C++] Insufficient description on build > --- > > Key: ARROW-3996 > URL: https://issues.apache.org/jira/browse/ARROW-3996 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.11.1 > Environment: Ubuntu Linux(Include Ubuntu 18.04 LTS on Windows 10 WSL) >Reporter: Kunihisa Abukawa >Assignee: Kunihisa Abukawa >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > C / C ++ version of Ubuntu Linux environment requires less library / > component description. > Requirement > * g ++ > * autoconf > * Jemalloc > In accordance with the above, you need to add the following to the library / > component you install with apt-get install. > * libboost-regex-dev > * libjemalloc-dev > * autotools-dev -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-2857) [Python] Expose integration test JSON read/write in Python API
[ https://issues.apache.org/jira/browse/ARROW-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659882#comment-17659882 ] Rok Mihevc commented on ARROW-2857: --- This issue has been migrated to [issue #19230|https://github.com/apache/arrow/issues/19230] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Python] Expose integration test JSON read/write in Python API > -- > > Key: ARROW-2857 > URL: https://issues.apache.org/jira/browse/ARROW-2857 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney >Priority: Major > Labels: beginner > > This should be clearly marked to not be used for persistence -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-2838) [Python] Speed up null testing with Pandas semantics
[ https://issues.apache.org/jira/browse/ARROW-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-2838: -- External issue URL: https://github.com/apache/arrow/issues/19214 > [Python] Speed up null testing with Pandas semantics > > > Key: ARROW-2838 > URL: https://issues.apache.org/jira/browse/ARROW-2838 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The {{PandasObjectIsNull}} helper function can be a significant contributor > when converting a Pandas dataframe to Arrow format (e.g. when writing a > dataframe to feather format). We can try to speed up the type checks in that > function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
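As a rough sketch of what a per-object null test with Pandas semantics checks — and why it runs for every cell when converting an object column — here is a simplified stdlib-only version (the real `PandasObjectIsNull` is C++ and also handles values such as `NaT` and decimal NaN; the function name below mirrors it for illustration only):

```python
import math

def object_is_null(obj) -> bool:
    """Pandas-style null test for one Python object: None or a float NaN.
    Simplified sketch; not the actual C++ implementation."""
    if obj is None:
        return True
    return isinstance(obj, float) and math.isnan(obj)
```

Because this predicate is evaluated once per element of an object array, cheapening its type checks (as the issue proposes) directly speeds up dataframe-to-Arrow conversion.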
[jira] [Updated] (ARROW-2831) [Plasma] MemoryError in teardown
[ https://issues.apache.org/jira/browse/ARROW-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-2831: -- External issue URL: https://github.com/apache/arrow/issues/19208 > [Plasma] MemoryError in teardown > > > Key: ARROW-2831 > URL: https://issues.apache.org/jira/browse/ARROW-2831 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Uwe Korn >Priority: Major > Fix For: 0.12.0 > > > There seems to be some flakiness in Plasma tests, e.g. see: > https://api.travis-ci.org/v3/job/402544643/log.txt > {code} > ERRORS > > _ ERROR at teardown of TestPlasmaClient.test_subscribe > _ > self = > test_method = > > [1mdef teardown_method(self, test_method):[0m > [1mtry:[0m > [1m# Check that the Plasma store is still alive.[0m > [1massert self.p.poll() is None[0m > [1m# Ensure Valgrind and/or coverage have a clean exit[0m > [1mself.p.send_signal(signal.SIGTERM)[0m > [1mif sys.version_info >= (3, 3):[0m > [1mself.p.wait(timeout=5)[0m > [1melse:[0m > [1mself.p.wait()[0m > [1m> assert self.p.returncode == 0[0m > [1m[31mE assert 1 == 0[0m > [1m[31mE+ where 1 = 0x7f9a3dcd5850>.returncode[0m > [1m[31mE+where > = .p[0m > [1m[31mpyarrow/tests/test_plasma.py[0m:132: AssertionError > Captured stderr setup > - > ==20909== Memcheck, a memory error detector > ==20909== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. > ==20909== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info > ==20909== Command: > /home/travis/build/apache/arrow/python/pyarrow/plasma_store -s > /tmp/test_plasma-Dzj8IQ/plasma.sock -m 1 > ==20909== > Allowing the Plasma store to use up to 0.1GB of memory. 
> Connection to IPC socket failed for pathname > /tmp/test_plasma-Dzj8IQ/plasma.sock, retrying 50 more times > Starting object store with directory /dev/shm and huge page support disabled > Connection to IPC socket failed for pathname > /tmp/test_plasma-Dzj8IQ/plasma.sock, retrying 49 more times > --- Captured stderr teardown > --- > ==20909== Invalid free() / delete / delete[] / realloc() > ==20909==at 0x4C2C83C: operator delete[](void*) (in > /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) > ==20909==by 0x4A870A: std::default_delete []>::operator()(unsigned char*) const (unique_ptr.h:99) > ==20909==by 0x4A0D6C: std::unique_ptr std::default_delete >::~unique_ptr() (unique_ptr.h:377) > ==20909==by 0x4C3AE1: void std::_Destroy [], std::default_delete > >(std::unique_ptr [], std::default_delete >*) (stl_construct.h:93) > ==20909==by 0x4C2DD4: void > std::_Destroy_aux::__destroy std::default_delete >*>(std::unique_ptr std::default_delete >*, std::unique_ptr std::default_delete >*) (stl_construct.h:103) > ==20909==by 0x4C1999: void std::_Destroy [], std::default_delete >*>(std::unique_ptr [], std::default_delete >*, std::unique_ptr [], std::default_delete >*) (stl_construct.h:126) > ==20909==by 0x4BF460: void std::_Destroy [], std::default_delete >*, std::unique_ptr [], std::default_delete > >(std::unique_ptr [], std::default_delete >*, std::unique_ptr [], std::default_delete >*, > std::allocator char []> > >&) (stl_construct.h:151) > ==20909==by 0x4B9D41: std::deque std::default_delete >, > std::allocator char []> > > > >::_M_destroy_data_aux(std::_Deque_iterator std::default_delete >, std::unique_ptr std::default_delete >&, std::unique_ptr std::default_delete >*>, > std::_Deque_iterator std::default_delete >, std::unique_ptr std::default_delete >&, std::unique_ptr std::default_delete >*>) (deque.tcc:806) > ==20909==by 0x4B17DA: std::deque std::default_delete >, > std::allocator char []> > > >::_M_destroy_data(std::_Deque_iterator char [], std::default_delete >, 
std::unique_ptr char [], std::default_delete >&, std::unique_ptr char [], std::default_delete >*>, > std::_Deque_iterator std::default_delete >, std::unique_ptr std::default_delete >&, std::unique_ptr std::default_delete >*>, > std::allocator char []> > > const&) (
[jira] [Updated] (ARROW-4008) [C++] Integration test executable failure
[ https://issues.apache.org/jira/browse/ARROW-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4008: -- External issue URL: https://github.com/apache/arrow/issues/20610 > [C++] Integration test executable failure > - > > Key: ARROW-4008 > URL: https://issues.apache.org/jira/browse/ARROW-4008 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: ci-failure, pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > See consistent CI failures: > {code} > -- Creating binary inputs > /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test > --integration > --arrow=/tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=JSON_TO_ARROW > -- Validating file > /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test > --integration > --arrow=/tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > -- Validating stream > /home/travis/build/apache/arrow/cpp-build/debug/file-to-stream > /tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow > > > /tmp/tmpx3363_ef/d9968e28006f42ee9df239f50c71231a_struct_example.arrow_to_stream > cat > /tmp/tmpx3363_ef/d9968e28006f42ee9df239f50c71231a_struct_example.arrow_to_stream > | /home/travis/build/apache/arrow/cpp-build/debug/stream-to-file > > /tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow > /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test > --integration > --arrow=/tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > 
--mode=VALIDATE > Command failed: > /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test > --integration > --arrow=/tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow > --json=/home/travis/build/apache/arrow/integration/data/struct_example.json > --mode=VALIDATE > With output: > -- > Error message: Invalid: > /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-integration-test.cc:151 > code: RecordBatchFileReader::Open(arrow_file.get(), _reader) > /home/travis/build/apache/arrow/cpp/src/arrow/ipc/reader.cc:624 code: > ReadFooter() > File is too small: 0 > {code} > https://travis-ci.org/apache/arrow/jobs/467092615#L2567 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4038) [Rust] Add array_ops methods for boolean AND, OR, NOT
[ https://issues.apache.org/jira/browse/ARROW-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661061#comment-17661061 ] Rok Mihevc commented on ARROW-4038: --- This issue has been migrated to [issue #20638|https://github.com/apache/arrow/issues/20638] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Rust] Add array_ops methods for boolean AND, OR, NOT > - > > Key: ARROW-4038 > URL: https://issues.apache.org/jira/browse/ARROW-4038 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We have math and comparison operations and I would now like to add boolean > unary and binary operators such as AND, OR, NOT for use in predicates against > arrow data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package
[ https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660916#comment-17660916 ] Rok Mihevc commented on ARROW-3892: --- This issue has been migrated to [issue #20499|https://github.com/apache/arrow/issues/20499] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] Remove any dependency on compromised NPM flatmap-stream package > > > Key: ARROW-3892 > URL: https://issues.apache.org/jira/browse/ARROW-3892 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We are erroring out as the result of > https://github.com/dominictarr/event-stream/issues/116 > {code} > npm ERR! code ENOVERSIONS > npm ERR! No valid versions available for flatmap-stream > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4276) [Release] Remove needless Bintray authentication from binaries verify script
[ https://issues.apache.org/jira/browse/ARROW-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4276: -- External issue URL: https://github.com/apache/arrow/issues/20851 > [Release] Remove needless Bintray authentication from binaries verify script > > > Key: ARROW-4276 > URL: https://issues.apache.org/jira/browse/ARROW-4276 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3948) [CI][GLib] Set timeout to Homebrew
[ https://issues.apache.org/jira/browse/ARROW-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660972#comment-17660972 ] Rok Mihevc commented on ARROW-3948: --- This issue has been migrated to [issue #20555|https://github.com/apache/arrow/issues/20555] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [CI][GLib] Set timeout to Homebrew > -- > > Key: ARROW-3948 > URL: https://issues.apache.org/jira/browse/ARROW-3948 > Project: Apache Arrow > Issue Type: Sub-task > Components: Continuous Integration, GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We can't get Homebrew log when Travis detects hanged Homebrew process and > kill the CI job. > We need to detect hanged Homebrew process and kill the Homebrew process to > get Homebrew log. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-2858) [Packaging] Add unit tests for crossbow
[ https://issues.apache.org/jira/browse/ARROW-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659883#comment-17659883 ] Rok Mihevc commented on ARROW-2858: --- This issue has been migrated to [issue #19231|https://github.com/apache/arrow/issues/19231] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Packaging] Add unit tests for crossbow > --- > > Key: ARROW-2858 > URL: https://issues.apache.org/jira/browse/ARROW-2858 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Phillip Cloud >Priority: Major > > As this code grows we should start adding unit tests to make sure we can make > changes safely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4048) [GLib] Return ChunkedArray instead of Array in gparquet_arrow_file_reader_read_column
[ https://issues.apache.org/jira/browse/ARROW-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4048: -- External issue URL: https://github.com/apache/arrow/issues/20647 > [GLib] Return ChunkedArray instead of Array in > gparquet_arrow_file_reader_read_column > - > > Key: ARROW-4048 > URL: https://issues.apache.org/jira/browse/ARROW-4048 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Yosuke Shiro >Assignee: Yosuke Shiro >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Because FileReader::ReadColumn(int i, std::shared_ptr* out) is > deprecated since 0.12. > [https://github.com/apache/arrow/pull/3171/files#diff-e6f49fbed784ef27e3ab0075e01a5871R132] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4164) [Rust] ArrayBuilder should not have associated types
[ https://issues.apache.org/jira/browse/ARROW-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661188#comment-17661188 ] Rok Mihevc commented on ARROW-4164: --- This issue has been migrated to [issue #20749|https://github.com/apache/arrow/issues/20749] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Rust] ArrayBuilder should not have associated types > > > Key: ARROW-4164 > URL: https://issues.apache.org/jira/browse/ARROW-4164 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Priority: Minor > > When dealing with arrays at runtime it is possible to represent them with > ArrayRef, allowing for things like Vec<ArrayRef> and then downcasting arrays > as needed based on schema metadata. > I am now trying to do the same thing with ArrayBuilder but because this trait > has an associated type, I cannot create a Vec<Box<ArrayBuilder>>. This makes it > difficult in some cases to dynamically build arrays. -- This message was sent by Atlassian Jira (v8.20.10#820010)
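The pattern the issue asks for is an object-safe trait: once the trait has no associated types, heterogeneous builders fit in one `Vec<Box<dyn Trait>>` and are recovered by downcasting. A minimal sketch under that assumption (`Int64Builder` here is a toy stand-in, not Arrow's actual builder):

```rust
use std::any::Any;

// Object-safe builder trait: no associated types, plus an `as_any_mut`
// hook so callers can downcast back to the concrete builder.
trait ArrayBuilder {
    fn len(&self) -> usize;
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

// Toy concrete builder standing in for Arrow's typed builders.
struct Int64Builder {
    values: Vec<i64>,
}

impl Int64Builder {
    fn append(&mut self, v: i64) {
        self.values.push(v);
    }
}

impl ArrayBuilder for Int64Builder {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

fn main() {
    // With no associated types, mixed builders can share one Vec...
    let mut builders: Vec<Box<dyn ArrayBuilder>> =
        vec![Box::new(Int64Builder { values: vec![] })];
    // ...and be recovered by downcasting, e.g. based on schema metadata.
    builders[0]
        .as_any_mut()
        .downcast_mut::<Int64Builder>()
        .expect("schema said this column is i64")
        .append(42);
    assert_eq!(builders[0].len(), 1);
}
```

This mirrors the `ArrayRef` situation described above: the trait object carries the dynamic type, and the schema tells the caller which concrete type to downcast to.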
[jira] [Commented] (ARROW-4580) [JS] Accept Iterables in IntVector/FloatVector from() signatures
[ https://issues.apache.org/jira/browse/ARROW-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661603#comment-17661603 ] Rok Mihevc commented on ARROW-4580: --- This issue has been migrated to [issue #21124|https://github.com/apache/arrow/issues/21124] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [JS] Accept Iterables in IntVector/FloatVector from() signatures > > > Key: ARROW-4580 > URL: https://issues.apache.org/jira/browse/ARROW-4580 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Affects Versions: JS-0.4.0 >Reporter: Paul Taylor >Assignee: Paul Taylor >Priority: Major > Fix For: JS-0.4.1 > > > Right now {{IntVector.from()}} and {{FloatVector.from()}} expect the data is > already in typed-array form. But if we know the desired Vector type before > hand (e.g. if {{Int32Vector.from()}} is called), we can accept any JS > iterable of the values. > In order to do this, we should ensure {{Float16Vector.from()}} properly > clamps incoming f32/f64 values to u16s, in case the source is a vanilla > 64-bit JS float. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3972) [C++] Update to LLVM and Clang bits to 7.0
[ https://issues.apache.org/jira/browse/ARROW-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660995#comment-17660995 ] Rok Mihevc commented on ARROW-3972: --- This issue has been migrated to [issue #20575|https://github.com/apache/arrow/issues/20575] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Update to LLVM and Clang bits to 7.0 > -- > > Key: ARROW-3972 > URL: https://issues.apache.org/jira/browse/ARROW-3972 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, C++ - Gandiva >Reporter: Uwe Korn >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > As {{llvmlite}}, the other package in the Python ecosystem moved to LLVM 7, > we should follow along to avoid problems when we use it in the same Python > environment as Gandiva. > Reference: https://github.com/numba/llvmlite/pull/412 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3989) [Rust] CSV reader should handle case sensitivity for boolean values
[ https://issues.apache.org/jira/browse/ARROW-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661012#comment-17661012 ] Rok Mihevc commented on ARROW-3989: --- This issue has been migrated to [issue #20592|https://github.com/apache/arrow/issues/20592] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Rust] CSV reader should handle case sensitivity for boolean values > --- > > Key: ARROW-3989 > URL: https://issues.apache.org/jira/browse/ARROW-3989 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.11.1 >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Excel saves booleans in CSV in upper case, Pandas uses Proper case. > Our CSV reader doesn't recognise (True, False, TRUE, FALSE). I noticed this > when making boolean schema inference case insensitive. > > I would propose that we convert Boolean strings to lower-case before casting > them to Rust's bool type. [~andygrove], what do you think? -- This message was sent by Atlassian Jira (v8.20.10#820010)
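The proposed fix is small: lower-case the CSV token before handing it to Rust's `bool` parser, so Excel's `TRUE`/`FALSE` and Pandas' `True`/`False` are all accepted. A sketch (the function name is illustrative, not the reader's actual code):

```rust
// Case-insensitive boolean parsing for CSV cells: normalize to lower case,
// then use Rust's built-in bool parser, which only accepts "true"/"false".
fn parse_bool(token: &str) -> Option<bool> {
    token.trim().to_lowercase().parse::<bool>().ok()
}

fn main() {
    for t in ["true", "True", "TRUE"] {
        assert_eq!(parse_bool(t), Some(true));
    }
    for t in ["false", "False", "FALSE"] {
        assert_eq!(parse_bool(t), Some(false));
    }
    // Anything else is not a boolean (would become a null or a parse error).
    assert_eq!(parse_bool("yes"), None);
}
```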
[jira] [Commented] (ARROW-3964) [Go] More readable example for csv.Reader
[ https://issues.apache.org/jira/browse/ARROW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660987#comment-17660987 ] Rok Mihevc commented on ARROW-3964: --- This issue has been migrated to [issue #20567|https://github.com/apache/arrow/issues/20567] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [Go] More readable example for csv.Reader > - > > Key: ARROW-3964 > URL: https://issues.apache.org/jira/browse/ARROW-3964 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Masashi Shibata >Assignee: Masashi Shibata >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Example of godoc doesn't include input file(testdata/simple.csv). So it's > hard to understand the output. > [https://godoc.org/github.com/apache/arrow/go/arrow/csv] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-3864) [GLib] Add support for allow-float-truncate cast option
[ https://issues.apache.org/jira/browse/ARROW-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660887#comment-17660887 ] Rok Mihevc commented on ARROW-3864: --- This issue has been migrated to [issue #20429|https://github.com/apache/arrow/issues/20429] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [GLib] Add support for allow-float-truncate cast option > --- > > Key: ARROW-3864 > URL: https://issues.apache.org/jira/browse/ARROW-3864 > Project: Apache Arrow > Issue Type: New Feature > Components: GLib >Affects Versions: 0.11.1 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4540) [Rust] Add basic JSON reader
[ https://issues.apache.org/jira/browse/ARROW-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4540: -- External issue URL: https://github.com/apache/arrow/issues/21089 > [Rust] Add basic JSON reader > > > Key: ARROW-4540 > URL: https://issues.apache.org/jira/browse/ARROW-4540 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > This is the first step in getting a JSON reader working in Rust -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4049) [C++] Arrow never use glog even though glog is linked.
[ https://issues.apache.org/jira/browse/ARROW-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661072#comment-17661072 ] Rok Mihevc commented on ARROW-4049: --- This issue has been migrated to [issue #20648|https://github.com/apache/arrow/issues/20648] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++] Arrow never use glog even though glog is linked. > -- > > Key: ARROW-4049 > URL: https://issues.apache.org/jira/browse/ARROW-4049 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The following is a part of arrow/util/logging.cc. > {code} > #ifdef ARROW_USE_GLOG > typedef google::LogMessage LoggingProvider; > #else > typedef CerrLog LoggingProvider; > #endif > {code} > As you see, when ARROW_USE_GLOG is defined, glog is intended to be used, but it is never defined, so glog is never used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4569) [Gandiva] validate that the precision/scale are within bounds
[ https://issues.apache.org/jira/browse/ARROW-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4569: -- External issue URL: https://github.com/apache/arrow/issues/21115 > [Gandiva] validate that the precision/scale are within bounds > - > > Key: ARROW-4569 > URL: https://issues.apache.org/jira/browse/ARROW-4569 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Assignee: Pindikura Ravindra >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3904) [C++/Python] Validate scale and precision of decimal128 type
[ https://issues.apache.org/jira/browse/ARROW-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3904: -- External issue URL: https://github.com/apache/arrow/issues/20518 > [C++/Python] Validate scale and precision of decimal128 type > > > Key: ARROW-3904 > URL: https://issues.apache.org/jira/browse/ARROW-3904 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Krisztian Szucs >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > Do we have a specification for it? > Currently I can create `pa.decimal128(-2147483648, 2147483647)`, which seems > wrong. > References: > - > https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-2017 > - > https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1832 -- This message was sent by Atlassian Jira (v8.20.10#820010)
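The missing check amounts to bounding both parameters: a 128-bit decimal holds at most 38 significant decimal digits, and the scale cannot exceed the precision. A sketch of such a validation (the exact bounds the project settled on may differ, e.g. whether negative scales are allowed; this only illustrates the shape of the check):

```rust
// Hypothetical bounds check for decimal128(precision, scale).
// Assumes precision in 1..=38 and scale in 0..=precision; the project's
// final rules may differ (some systems permit negative scales).
fn validate_decimal128(precision: i32, scale: i32) -> Result<(), String> {
    if precision < 1 || precision > 38 {
        return Err(format!("precision {} out of range 1..=38", precision));
    }
    if scale < 0 || scale > precision {
        return Err(format!("scale {} out of range 0..={}", scale, precision));
    }
    Ok(())
}

fn main() {
    assert!(validate_decimal128(10, 2).is_ok());
    // The pathological values from the report are rejected.
    assert!(validate_decimal128(i32::MIN, i32::MAX).is_err());
    assert!(validate_decimal128(38, 39).is_err());
}
```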
[jira] [Updated] (ARROW-3929) [Go] improve memory usage of CSV reader to improve runtime performances
[ https://issues.apache.org/jira/browse/ARROW-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3929: -- External issue URL: https://github.com/apache/arrow/issues/20539 > [Go] improve memory usage of CSV reader to improve runtime performances > --- > > Key: ARROW-3929 > URL: https://issues.apache.org/jira/browse/ARROW-3929 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Sebastien Binet >Assignee: Sebastien Binet >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3898) parquet-arrow example has compilation errors
[ https://issues.apache.org/jira/browse/ARROW-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3898: -- External issue URL: https://github.com/apache/arrow/issues/20511 > parquet-arrow example has compilation errors > > > Key: ARROW-3898 > URL: https://issues.apache.org/jira/browse/ARROW-3898 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Miao Wang >Assignee: Miao Wang >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When compiling the example, reader-writer.cc, it shows the following compilation > errors: > *no member named 'cout' in namespace 'std'* > PARQUET_THROW_NOT_OK(arrow::PrettyPrint(*array, 4, &std::cout)); > in multiple places. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3923) [Java] JDBC-to-Arrow Conversion: Unnecessary Calendar Requirement
[ https://issues.apache.org/jira/browse/ARROW-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3923: -- External issue URL: https://github.com/apache/arrow/issues/20534 > [Java] JDBC-to-Arrow Conversion: Unnecessary Calendar Requirement > - > > Key: ARROW-3923 > URL: https://issues.apache.org/jira/browse/ARROW-3923 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Michael Pigott >Assignee: Michael Pigott >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > While I was going through the JDBC Adapter source code, I noticed a Calendar > was required to create the Arrow Schema (for any Timestamp fields), and also > needed for converting a JDBC ResultSet to an ArrowVector (for Date, Time, and > Timestamp fields). > However, Arrow Timestamps do not require a time zone, and none of the JDBC > getters for Date, Time, and Timestamp require a Calendar. > I am proposing a change to make the Schema creator and ResultSet converter > support null Calendars. If a Calendar is available, it will be used, and if > not, it will not be used. > The existing SureFire plugin configuration uses a UTC calendar for the > database, which is the default Calendar in the existing code. Likewise, no > changes to the unit tests are required to provide adequate coverage for the > change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-3921) [CI][GLib] Log Homebrew output
[ https://issues.apache.org/jira/browse/ARROW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-3921: -- External issue URL: https://github.com/apache/arrow/issues/20532 > [CI][GLib] Log Homebrew output > -- > > Key: ARROW-3921 > URL: https://issues.apache.org/jira/browse/ARROW-3921 > Project: Apache Arrow > Issue Type: Sub-task > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We need more information to fix {{brew update}} problem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4512) [R] Stream reader/writer API that takes socket stream
[ https://issues.apache.org/jira/browse/ARROW-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4512: -- External issue URL: https://github.com/apache/arrow/issues/21063 > [R] Stream reader/writer API that takes socket stream > - > > Key: ARROW-4512 > URL: https://issues.apache.org/jira/browse/ARROW-4512 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Affects Versions: 0.12.0, 0.14.1, 1.0.0 >Reporter: Hyukjin Kwon >Assignee: Dewey Dunnington >Priority: Major > Fix For: 8.0.0 > > > I have been working on Spark integration with Arrow. > I realised that there is no way to use a socket as input for the Arrow stream > format. For instance, I want to do something like: > {code}
> connStream <- socketConnection(port = , blocking = TRUE, open = "wb")
> rdf_slices <- # a list of data frames.
> stream_writer <- NULL
> tryCatch({
>   for (rdf_slice in rdf_slices) {
>     batch <- record_batch(rdf_slice)
>     if (is.null(stream_writer)) {
>       stream_writer <- RecordBatchStreamWriter(connStream, batch$schema) # Here, looks there's no way to use socket.
>     }
>     stream_writer$write_batch(batch)
>   }
> },
> finally = {
>   if (!is.null(stream_writer)) {
>     stream_writer$close()
>   }
> })
> {code}
> Likewise, I cannot find a way to iterate the stream batch by batch
> {code}
> RecordBatchStreamReader(connStream)$batches() # Here, looks there's no way to use socket.
> {code}
> This looks easily possible on the Python side but looks missing in the R APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-4297) [C++] Fix build for 32-bit MSYS2
[ https://issues.apache.org/jira/browse/ARROW-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rok Mihevc updated ARROW-4297: -- External issue URL: https://github.com/apache/arrow/issues/20870 > [C++] Fix build for 32-bit MSYS2 > > > Key: ARROW-4297 > URL: https://issues.apache.org/jira/browse/ARROW-4297 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Javier Luraschi >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 10h > Remaining Estimate: 0h > > see mailing list thread > https://lists.apache.org/thread.html/0fc5f7ffe4eb4e4aea931f0b26902d27305cf98829fb5c4a7375f2ab@%3Cdev.arrow.apache.org%3E -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-4566) [C++][Flight] Add option to run arrow-flight-benchmark against a perf server running on a different host
[ https://issues.apache.org/jira/browse/ARROW-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661588#comment-17661588 ] Rok Mihevc commented on ARROW-4566: --- This issue has been migrated to [issue #21112|https://github.com/apache/arrow/issues/21112] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details. > [C++][Flight] Add option to run arrow-flight-benchmark against a perf server > running on a different host > > > Key: ARROW-4566 > URL: https://issues.apache.org/jira/browse/ARROW-4566 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC >Reporter: Wes McKinney >Assignee: David Li >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the assumption is that both processes are running on localhost. > While also interesting (to see how fast things can go taking network IO out > of the equation) it is not very realistic. It would be good to both establish > a baseline network IO benchmark between two hosts and then see how close a > Flight stream can get to that -- This message was sent by Atlassian Jira (v8.20.10#820010)