[jira] [Commented] (ARROW-2813) [C++] Strip uninformative lcov output from Travis CI logs

2023-01-11 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659838#comment-17659838
 ] 

Rok Mihevc commented on ARROW-2813:
---

This issue has been migrated to [issue 
#19191|https://github.com/apache/arrow/issues/19191] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Strip uninformative lcov output from Travis CI logs
> -
>
> Key: ARROW-2813
> URL: https://issues.apache.org/jira/browse/ARROW-2813
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have about 650 lines of the type:
> {code}
> geninfo: WARNING: no data found for /usr/include/c++/4.8/istream
> {code}
> We could pipe all the lcov output to a file and print only those lines that 
> reference things other than /usr/include.
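The filtering described above can be sketched in a few lines of Python (a minimal, illustrative sketch; the function name and log path are made up, and the pattern matched is the `geninfo` warning quoted in the issue):

```python
import re

def interesting_lcov_lines(log_path):
    """Yield lcov/geninfo output lines worth showing, dropping the
    'no data found' warnings that only mention system headers."""
    noise = re.compile(r"geninfo: WARNING: no data found for /usr/include/")
    with open(log_path) as f:
        for line in f:
            if not noise.search(line):
                yield line.rstrip("\n")
```

In CI this would run over the captured lcov log after the coverage step, so only actionable warnings reach the Travis console.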



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4982) [GLib][CI] Run tests on AppVeyor

2023-01-11 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17662004#comment-17662004
 ] 

Rok Mihevc commented on ARROW-4982:
---

This issue has been migrated to [issue 
#16036|https://github.com/apache/arrow/issues/16036] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [GLib][CI] Run tests on AppVeyor
> 
>
> Key: ARROW-4982
> URL: https://issues.apache.org/jira/browse/ARROW-4982
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Continuous Integration, GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-2209) [Python] Partition columns are not correctly loaded in schema of ParquetDataset

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659237#comment-17659237
 ] 

Rok Mihevc commented on ARROW-2209:
---

This issue has been migrated to [issue 
#18173|https://github.com/apache/arrow/issues/18173] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Partition columns are not correctly loaded in schema of 
> ParquetDataset
> ---
>
> Key: ARROW-2209
> URL: https://issues.apache.org/jira/browse/ARROW-2209
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently the partition columns are not included in the schema of a 
> ParquetDataset. We correctly write them out in the {{_common_metadata}} file 
> but we fail to load this file correctly.





[jira] [Commented] (ARROW-2202) [JS] Add DataFrame.toJSON

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659230#comment-17659230
 ] 

Rok Mihevc commented on ARROW-2202:
---

This issue has been migrated to [issue 
#18166|https://github.com/apache/arrow/issues/18166] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] Add DataFrame.toJSON
> -
>
> Key: ARROW-2202
> URL: https://issues.apache.org/jira/browse/ARROW-2202
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
> Currently, {{CountByResult}} has its own [{{toJSON}} 
> method|https://github.com/apache/arrow/blob/master/js/src/table.ts#L282], but 
> there should be a more general one for every {{DataFrame}}.
> {{CountByResult.toJSON}} returns:
> {code:json}
> {
>   "keyA": 10,
>   "keyB": 10,
>   ...
> }{code}
> A more general {{toJSON}} could just return a list of objects with an entry 
> for each column. For the above {{CountByResult}}, the output would look like:
> {code:json}
> [
>   {value: "keyA", count: 10},
>   {value: "keyB", count: 10},
>   ...
> ]{code}





[jira] [Updated] (ARROW-2185) Remove CI directives from squashed commit messages

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-2185:
--
External issue URL: https://github.com/apache/arrow/issues/18150

> Remove CI directives from squashed commit messages
> --
>
> Key: ARROW-2185
> URL: https://issues.apache.org/jira/browse/ARROW-2185
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> In our PR squash tool, we are potentially picking up CI directives like 
> {{[skip appveyor]}} from intermediate commits. We should strip these with a 
> regex and instead use directives in the PR title if we wish the commit to 
> master to behave in a certain way.
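The stripping step could look like the following (an illustrative sketch, not the actual squash tool; the directive list here is only an example of common CI markers):

```python
import re

# Matches common CI directives such as [skip appveyor], [skip ci], [ci skip].
CI_DIRECTIVE = re.compile(r"\[(skip (ci|appveyor)|ci skip)\]", re.IGNORECASE)

def strip_ci_directives(message: str) -> str:
    """Remove CI directives from every line of a squashed commit message."""
    return "\n".join(CI_DIRECTIVE.sub("", line).rstrip()
                     for line in message.splitlines())
```

Directives placed in the PR title would then be re-applied deliberately rather than inherited by accident from intermediate commits.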





[jira] [Updated] (ARROW-1968) [Python] Unit testing setup for ORC files

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-1968:
--
External issue URL: https://github.com/apache/arrow/issues/17955

> [Python] Unit testing setup for ORC files
> -
>
> Key: ARROW-1968
> URL: https://issues.apache.org/jira/browse/ARROW-1968
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> ORC does not have a production ready C++ writer yet, so we will need to 
> figure out another way to generate test data files to probe all of the 
> corners of our ORC reader





[jira] [Updated] (ARROW-1921) [Doc] Build API docs on a per-release basis

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-1921:
--
External issue URL: https://github.com/apache/arrow/issues/17912

> [Doc] Build API docs on a per-release basis
> ---
>
> Key: ARROW-1921
> URL: https://issues.apache.org/jira/browse/ARROW-1921
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Uwe Korn
>Priority: Major
>
> Currently we build the docs from time to time manually from master. We should 
> also build them per release so that you can have a look at the latest 
> released API version.





[jira] [Updated] (ARROW-4975) [C++] Support concatenation of UnionArrays

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4975:
--
External issue URL: https://github.com/apache/arrow/issues/16035

> [C++] Support concatenation of UnionArrays
> --
>
> Key: ARROW-4975
> URL: https://issues.apache.org/jira/browse/ARROW-4975
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Matthijs Brobbel
>Priority: Minor
>  Labels: good-first-issue, good-second-issue, 
> pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/3746 adds support for concatenation of 
> arrays, but UnionArrays are not supported.





[jira] [Commented] (ARROW-4972) [Go] Array equality

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661994#comment-17661994
 ] 

Rok Mihevc commented on ARROW-4972:
---

This issue has been migrated to [issue 
#21476|https://github.com/apache/arrow/issues/21476] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Go] Array equality
> ---
>
> Key: ARROW-4972
> URL: https://issues.apache.org/jira/browse/ARROW-4972
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Alexandre Crayssac
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4976) [JS] RecordBatchReader should reset its Node/DOM streams

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4976:
--
External issue URL: https://github.com/apache/arrow/issues/21479

> [JS] RecordBatchReader should reset its Node/DOM streams
> 
>
> Key: ARROW-4976
> URL: https://issues.apache.org/jira/browse/ARROW-4976
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> RecordBatchReaders should reset their internal platform (Node/DOM) streams 
> when reset, so that they can subsequently be piped to separate output streams.





[jira] [Commented] (ARROW-4975) [C++] Support concatenation of UnionArrays

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661997#comment-17661997
 ] 

Rok Mihevc commented on ARROW-4975:
---

This issue has been migrated to [issue 
#16035|https://github.com/apache/arrow/issues/16035] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Support concatenation of UnionArrays
> --
>
> Key: ARROW-4975
> URL: https://issues.apache.org/jira/browse/ARROW-4975
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Matthijs Brobbel
>Priority: Minor
>  Labels: good-first-issue, good-second-issue, 
> pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/3746 adds support for concatenation of 
> arrays, but UnionArrays are not supported.





[jira] [Updated] (ARROW-4507) [Format] Create outline and introduction for new document.

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4507:
--
External issue URL: https://github.com/apache/arrow/issues/21058

> [Format] Create outline and introduction for new document.
> --
>
> Key: ARROW-4507
> URL: https://issues.apache.org/jira/browse/ARROW-4507
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>
> This will ensure the document has a good flow, other subtasks on the parent 
> will handle moving content from each of the documents.





[jira] [Updated] (ARROW-4479) [Plasma] Add S3 as external store for Plasma

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4479:
--
External issue URL: https://github.com/apache/arrow/issues/21035

> [Plasma] Add S3 as external store for Plasma
> 
>
> Key: ARROW-4479
> URL: https://issues.apache.org/jira/browse/ARROW-4479
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Plasma
>Affects Versions: 0.12.0
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Adding S3 as an external store will allow objects to be evicted to S3 when 
> Plasma runs out of memory capacity.





[jira] [Commented] (ARROW-4497) [C++] Determine how we want to handle hashing of floating point edge cases

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661519#comment-17661519
 ] 

Rok Mihevc commented on ARROW-4497:
---

This issue has been migrated to [issue 
#21050|https://github.com/apache/arrow/issues/21050] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Determine how we want to handle hashing of floating point edge cases
> --
>
> Key: ARROW-4497
> URL: https://issues.apache.org/jira/browse/ARROW-4497
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Minor
>  Labels: analytics
>
> We should document expected behavior or implement improvements to hashing 
> floating point code:
> 1. -0.0 and 0.0 (should these be collapsed to 0.0?)
> 2. NaN (should we reduce to a single canonical version?)
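One possible policy, canonicalization before hashing, can be sketched in Python (an illustrative sketch only; the function name is made up and this is not the C++ hash kernel the issue is about):

```python
import math
import struct

def canonical_float_key(x: float) -> bytes:
    """Canonicalize a float before hashing: collapse -0.0 into 0.0 and all
    NaN payloads into one canonical NaN, then hash the raw IEEE 754 bits."""
    if x == 0.0:          # true for both 0.0 and -0.0
        x = 0.0
    elif math.isnan(x):   # any NaN payload compares unequal, even to itself
        x = float("nan")  # pick one canonical NaN representation
    return struct.pack("<d", x)
```

Without this step, hashing raw bits would put -0.0 and 0.0 (which compare equal) into different buckets, and different NaN payloads would each land in their own bucket.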





[jira] [Updated] (ARROW-4566) [C++][Flight] Add option to run arrow-flight-benchmark against a perf server running on a different host

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4566:
--
External issue URL: https://github.com/apache/arrow/issues/21112

> [C++][Flight] Add option to run arrow-flight-benchmark against a perf server 
> running on a different host
> 
>
> Key: ARROW-4566
> URL: https://issues.apache.org/jira/browse/ARROW-4566
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the assumption is that both processes are running on localhost. 
> While also interesting (to see how fast things can go with network IO taken 
> out of the equation), it is not very realistic. It would be good to establish 
> a baseline network IO benchmark between two hosts and then see how close a 
> Flight stream can get to that.





[jira] [Updated] (ARROW-4475) [Python] Serializing objects that contain themselves

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4475:
--
External issue URL: https://github.com/apache/arrow/issues/21032

> [Python] Serializing objects that contain themselves
> 
>
> Key: ARROW-4475
> URL: https://issues.apache.org/jira/browse/ARROW-4475
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a regression from [https://github.com/apache/arrow/pull/3423]
> The following segfaults:
> {code:java}
> import pyarrow as pa
> lst = []
> lst.append(lst)
> pa.serialize(lst){code}





[jira] [Updated] (ARROW-4584) [Python] Add built wheel to manylinux1 dockerignore.

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4584:
--
External issue URL: https://github.com/apache/arrow/issues/21128

> [Python] Add built wheel to manylinux1 dockerignore.
> 
>
> Key: ARROW-4584
> URL: https://issues.apache.org/jira/browse/ARROW-4584
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging, Python
>Affects Versions: 0.12.0
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently we add them to the Docker context even though we don't need them. 
> This shrinks the context for me down to 55k instead of hundreds of MiB.





[jira] [Commented] (ARROW-4474) [Flight] FlightInfo should use signed integer types for payload size

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661495#comment-17661495
 ] 

Rok Mihevc commented on ARROW-4474:
---

This issue has been migrated to [issue 
#21031|https://github.com/apache/arrow/issues/21031] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Flight] FlightInfo should use signed integer types for payload size
> 
>
> Key: ARROW-4474
> URL: https://issues.apache.org/jira/browse/ARROW-4474
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: flight, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Judging by the Java implementation, the de-facto practice is to use -1 in 
> FlightInfo to indicate that the number of records/size of the payload is 
> unknown. However, the Protobuf definition uses an unsigned integer type, as 
> does the C++ implementation.
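The mismatch is easy to demonstrate: reinterpreting the -1 sentinel through an unsigned 64-bit field silently turns it into a huge positive value (a small illustrative sketch, not Flight code):

```python
import struct

# The de-facto "unknown" sentinel is -1. If the wire type is unsigned,
# the same 64 bits read back as 2**64 - 1 instead of "unknown".
sentinel = -1
as_unsigned = struct.unpack("<Q", struct.pack("<q", sentinel))[0]
```

A signed field lets every implementation agree that negative means "unknown".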





[jira] [Commented] (ARROW-4271) [Rust] Move Parquet specific info to Parquet Readme

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661295#comment-17661295
 ] 

Rok Mihevc commented on ARROW-4271:
---

This issue has been migrated to [issue 
#20847|https://github.com/apache/arrow/issues/20847] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Rust] Move Parquet specific info to Parquet Readme
> ---
>
> Key: ARROW-4271
> URL: https://issues.apache.org/jira/browse/ARROW-4271
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Arrow readme contains Parquet-specific info that was copied over from the 
> top-level readme; it should be moved to the Parquet readme.





[jira] [Updated] (ARROW-4254) [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4254:
--
External issue URL: https://github.com/apache/arrow/issues/20831

> [C++] Gandiva tests fail to compile with Boost in Ubuntu 14.04 apt
> --
>
> Key: ARROW-4254
> URL: https://issues.apache.org/jira/browse/ARROW-4254
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> These tests use an API that was not available in the Boost in Ubuntu 14.04; 
> we can change them to use the more compatible API
> {code}
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc: In member function ‘virtual void gandiva::TestLruCache_TestLruBehavior_Test::TestBody()’:
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:188: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:203: error: template argument 1 is invalid
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> /tmp/arrow-0.12.0.BFPHN/apache-arrow-0.12.0/cpp/src/gandiva/lru_cache_test.cc:62:294: error: ‘class boost::optional >’ has no member named ‘value’
>    ASSERT_EQ(cache_.get(TestCacheKey(1)).value(), "hello");
> make[2]: *** 
> [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/lru_cache_test.cc.o] Error 
> 1
> make[1]: *** [src/gandiva/CMakeFiles/gandiva-lru_cache_test.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs
> {code}





[jira] [Commented] (ARROW-4523) [JS] Add row proxy generation benchmark

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661545#comment-17661545
 ] 

Rok Mihevc commented on ARROW-4523:
---

This issue has been migrated to [issue 
#21073|https://github.com/apache/arrow/issues/21073] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] Add row proxy generation benchmark
> ---
>
> Key: ARROW-4523
> URL: https://issues.apache.org/jira/browse/ARROW-4523
> Project: Apache Arrow
>  Issue Type: Test
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4567) [C++] Convert Scalar values to Array values with length 1

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4567:
--
External issue URL: https://github.com/apache/arrow/issues/21113

> [C++] Convert Scalar values to Array values with length 1
> -
>
> Key: ARROW-4567
> URL: https://issues.apache.org/jira/browse/ARROW-4567
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> A common approach to performing operations on both scalar and array values is 
> to treat a Scalar as an array of length 1. For example, we cannot currently 
> use our Cast kernels to cast a Scalar. It would be senseless to create 
> separate kernel implementations specialized for a single value, and much 
> easier to promote a scalar to an Array, execute the kernel, then unbox the 
> result back into a Scalar





[jira] [Commented] (ARROW-3898) parquet-arrow example has compilation errors

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660922#comment-17660922
 ] 

Rok Mihevc commented on ARROW-3898:
---

This issue has been migrated to [issue 
#20511|https://github.com/apache/arrow/issues/20511] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> parquet-arrow example has compilation errors
> 
>
> Key: ARROW-3898
> URL: https://issues.apache.org/jira/browse/ARROW-3898
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When compiling the example, reader-writer.cc, the compilation error
> *no member named 'cout' in namespace 'std'* is reported in multiple places, 
> e.g. for {{PARQUET_THROW_NOT_OK(arrow::PrettyPrint(*array, 4, &std::cout));}}





[jira] [Updated] (ARROW-3884) [Python] Add LLVM6 to manylinux1 base image

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3884:
--
External issue URL: https://github.com/apache/arrow/issues/20478

> [Python] Add LLVM6 to manylinux1 base image
> ---
>
> Key: ARROW-3884
> URL: https://issues.apache.org/jira/browse/ARROW-3884
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is necessary to be able to build and bundle libgandiva with the 0.12 
> release.
> This (epic!) build definition in Apache Kudu may be useful for building only 
> the pieces that we need for linking the Gandiva libraries, which may help 
> keep the image size minimal:
> https://github.com/apache/kudu/blob/master/thirdparty/build-definitions.sh#L175





[jira] [Commented] (ARROW-4562) [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of allocating contiguous slice and copying IpcPayload into it

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661584#comment-17661584
 ] 

Rok Mihevc commented on ARROW-4562:
---

This issue has been migrated to [issue 
#21108|https://github.com/apache/arrow/issues/21108] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++][Flight] Create outgoing composite grpc::ByteBuffer instead of 
> allocating contiguous slice and copying IpcPayload into it
> --
>
> Key: ARROW-4562
> URL: https://issues.apache.org/jira/browse/ARROW-4562
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> See discussion in https://github.com/apache/arrow/pull/3633





[jira] [Commented] (ARROW-4518) [JS] add jsdelivr to package.json

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661540#comment-17661540
 ] 

Rok Mihevc commented on ARROW-4518:
---

This issue has been migrated to [issue 
#21069|https://github.com/apache/arrow/issues/21069] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] add jsdelivr to package.json
> -
>
> Key: ARROW-4518
> URL: https://issues.apache.org/jira/browse/ARROW-4518
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4538) [PYTHON] Remove index column from subschema in write_to_dataframe

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4538:
--
External issue URL: https://github.com/apache/arrow/issues/21087

> [PYTHON] Remove index column from subschema in write_to_dataframe
> -
>
> Key: ARROW-4538
> URL: https://issues.apache.org/jira/browse/ARROW-4538
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Christian Thiel
>Assignee: Christian Thiel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When using {{pa.Table.from_pandas()}} with preserve_index=True and 
> dataframe.index.name!=None the prefix {{__index_level_}} is not added to the 
> respective schema name. This breaks {{write_to_dataset}} with active 
> partition columns.
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import os
> import shutil
> import pandas as pd
> import numpy as np
> PATH_PYARROW_MANUAL = '/tmp/pyarrow_manual.pa/'
> if os.path.exists(PATH_PYARROW_MANUAL):
> shutil.rmtree(PATH_PYARROW_MANUAL)
> os.mkdir(PATH_PYARROW_MANUAL)
> arrays = np.array([np.array([0, 1, 2]), np.array([3, 4]), np.nan, np.nan])
> df = pd.DataFrame([0, 0, 1, 1], columns=['partition_column'])
> df['arrays'] = pd.Series(arrays)
> df.index.name='ID'
> table = pa.Table.from_pandas(df, preserve_index=True)
> print(table.schema.names)
> pq.write_to_dataset(table, root_path=PATH_PYARROW_MANUAL, 
> partition_cols=['partition_column'],
> preserve_index=True
>)
> {code}
> Removing {{df.index.name='ID'}} works. Also disabling {{partition_cols}} in 
> {{write_to_dataset}} works.





[jira] [Commented] (ARROW-3916) [Python] Support caller-provided filesystem in `ParquetWriter` constructor

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660940#comment-17660940
 ] 

Rok Mihevc commented on ARROW-3916:
---

This issue has been migrated to [issue 
#20528|https://github.com/apache/arrow/issues/20528] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Support caller-provided filesystem in `ParquetWriter` constructor
> --
>
> Key: ARROW-3916
> URL: https://issues.apache.org/jira/browse/ARROW-3916
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Mackenzie
>Assignee: Mackenzie
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently to write files incrementally to S3, the following pattern appears 
> necessary:
> {code}
> def write_dfs_to_s3(dfs, fname):
>     first_df = dfs[0]
>     table = pa.Table.from_pandas(first_df, preserve_index=False)
>     fs = s3fs.S3FileSystem()
>     fh = fs.open(fname, 'wb')
>     with pq.ParquetWriter(fh, table.schema) as writer:
>         # set file handle on writer so writer manages closing it when it is itself closed
>         writer.file_handle = fh
>         writer.write_table(table=table)
>         for df in dfs[1:]:
>             table = pa.Table.from_pandas(df, preserve_index=False)
>             writer.write_table(table=table)
> {code}
> This works as expected, but is quite roundabout. It would be much easier if 
> `ParquetWriter` supported `filesystem` as a keyword argument in its 
> constructor, in which case `_get_fs_from_path` would be overridden by the 
> usual pattern of using the kwarg after ensuring it is a proper file system 
> with `_ensure_filesystem`.
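As a rough illustration of the constructor pattern this issue requests, here is a minimal, purely hypothetical Python sketch: an optional `filesystem` keyword is used when given, otherwise a filesystem is inferred from the path. The names `LocalFS`, `infer_fs_from_path`, and `ensure_filesystem` are illustrative stand-ins, not actual pyarrow internals.

```python
class LocalFS:
    """Stand-in for a filesystem object exposing an open() method."""
    def open(self, path, mode="wb"):
        return open(path, mode)

def infer_fs_from_path(path):
    # A real implementation would inspect the scheme (s3://, hdfs://, ...);
    # this stand-in always falls back to the local filesystem.
    return LocalFS()

def ensure_filesystem(fs):
    # Minimal duck-typed check that the object behaves like a filesystem.
    if not hasattr(fs, "open"):
        raise TypeError("not a filesystem: %r" % (fs,))
    return fs

class Writer:
    """Toy ParquetWriter-like class showing the proposed `filesystem` kwarg."""
    def __init__(self, where, filesystem=None):
        fs = (ensure_filesystem(filesystem) if filesystem is not None
              else infer_fs_from_path(where))
        self.file_handle = fs.open(where, "wb")

    def close(self):
        self.file_handle.close()
```

With this shape, the caller from the issue would simply pass `filesystem=s3fs.S3FileSystem()` to the constructor instead of opening and attaching the file handle manually.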





[jira] [Updated] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4295:
--
External issue URL: https://github.com/apache/arrow/issues/20868

> [Plasma] Incorrect log message when evicting objects
> 
>
> Key: ARROW-4295
> URL: https://issues.apache.org/jira/browse/ARROW-4295
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When Plasma evicts objects on running out of memory, it prints log messages 
> of the form:
> {quote}There is not enough space to create this object, so evicting x objects 
> to free up y bytes. The number of bytes in use (before this eviction) is 
> z.{quote}
> However, the number of bytes reported as in use "before this eviction" is 
> actually the number of bytes in use *after* the eviction. A straightforward 
> fix is to simply replace z with (y+z).
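The proposed fix is plain arithmetic and can be sketched as follows; the function name and message wording below mirror the log line quoted in the issue, but are otherwise illustrative.

```python
def eviction_message(num_objects, bytes_freed, bytes_in_use_after):
    """Build the eviction log line, reporting pre-eviction usage correctly."""
    # The store tracks usage *after* scheduling the eviction (z), so the
    # pre-eviction figure must add back the freed bytes: report y + z.
    bytes_in_use_before = bytes_in_use_after + bytes_freed
    return ("There is not enough space to create this object, so evicting "
            "%d objects to free up %d bytes. The number of bytes in use "
            "(before this eviction) is %d."
            % (num_objects, bytes_freed, bytes_in_use_before))
```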





[jira] [Commented] (ARROW-4264) [C++] Document why DCHECKs are used in kernels

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661288#comment-17661288
 ] 

Rok Mihevc commented on ARROW-4264:
---

This issue has been migrated to [issue 
#20841|https://github.com/apache/arrow/issues/20841] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Document why DCHECKs are used in kernels
> --
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DCHECKs seem to be used where Status::Invalid might be considered more 
> appropriate (so programs don't crash). See the conversation on 
> [https://github.com/apache/arrow/pull/3287/files]
> Based on the conversation on this Jira and on the CL, it seems DCHECKs are in 
> fact desired, but we should document their appropriate use.
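The DCHECK-versus-Status distinction has a rough Python analogue that may clarify the intent: `assert` for internal invariants (stripped under `python -O`, much like DCHECK in release builds) versus raising a real error for conditions callers can trigger. The functions below are illustrative only, not Arrow code.

```python
def kernel_sum(values, out):
    # Internal invariant: the dispatch layer guarantees `out` is preallocated
    # with length 1. A violation is a bug in our own code, not bad user
    # input, so a debug-only check (the DCHECK analogue) is appropriate.
    assert len(out) == 1, "output buffer must be preallocated with length 1"
    out[0] = sum(values)

def sum_column(values):
    # User-facing entry point: invalid input gets a proper error (the
    # Status::Invalid analogue), never a crash or a debug-only check.
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("sum_column: all values must be numeric")
    out = [None]
    kernel_sum(values, out)
    return out[0]
```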





[jira] [Updated] (ARROW-4240) [Packaging] Documents for Plasma GLib and Gandiva GLib are missing in source archive

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4240:
--
External issue URL: https://github.com/apache/arrow/issues/20819

> [Packaging] Documents for Plasma GLib and Gandiva GLib are missing in source 
> archive
> 
>
> Key: ARROW-4240
> URL: https://issues.apache.org/jira/browse/ARROW-4240
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4478) Re-enable homebrew formula

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4478:
--
External issue URL: https://github.com/apache/arrow/issues/21034

> Re-enable homebrew formula
> --
>
> Key: ARROW-4478
> URL: https://issues.apache.org/jira/browse/ARROW-4478
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Aki Ariga
>Priority: Minor
>
> Since the apache-arrow [formula has been removed from 
> Homebrew|https://github.com/Homebrew/homebrew-core/pull/36063], the brew 
> recommendation in the R installation error message should be removed. 
> [https://github.com/apache/arrow/blob/master/r/configure#L90]





[jira] [Commented] (ARROW-4486) [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` argument

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661508#comment-17661508
 ] 

Rok Mihevc commented on ARROW-4486:
---

This issue has been migrated to [issue 
#21041|https://github.com/apache/arrow/issues/21041] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python][CUDA] pyarrow.cuda.Context.foreign_buffer should have a `base=None` 
> argument
> -
>
> Key: ARROW-4486
> URL: https://issues.apache.org/jira/browse/ARROW-4486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Pearu Peterson
>Assignee: Pearu Peterson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 1.5h
>
> Similar to `pyarrow.foreign_buffer`, we need to keep the owner of CUDA memory 
> alive.





[jira] [Commented] (ARROW-3926) [Python] Add Gandiva bindings to Python wheels

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660950#comment-17660950
 ] 

Rok Mihevc commented on ARROW-3926:
---

This issue has been migrated to [issue 
#20536|https://github.com/apache/arrow/issues/20536] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Add Gandiva bindings to Python wheels
> --
>
> Key: ARROW-3926
> URL: https://issues.apache.org/jira/browse/ARROW-3926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Depends on adding LLVM6 to the build toolchain





[jira] [Updated] (ARROW-4493) [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to read

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4493:
--
External issue URL: https://github.com/apache/arrow/issues/15965

> [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to 
> read
> -
>
> Key: ARROW-4493
> URL: https://issues.apache.org/jira/browse/ARROW-4493
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Tupshin Harper
>Assignee: Andy Grove
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> make accumulate_scalar somewhat exhaustive and easier to read
>  
> The current implementation doesn't leverage any of the exhaustiveness 
> checking of matching. This can be made simpler and partially exhaustive.





[jira] [Updated] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4283:
--
External issue URL: https://github.com/apache/arrow/issues/20858

> [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
> 
>
> Key: ARROW-4283
> URL: https://issues.apache.org/jira/browse/ARROW-4283
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Paul Taylor
>Priority: Minor
>
> Filing this issue after a discussion today with [~xhochy] about how to 
> implement streaming pyarrow http services. I had attempted to use both Flask 
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s 
> streaming interfaces because they seemed familiar, but no dice. I have no 
> idea how hard this would be to add -- supporting all the asynciterable 
> primitives in JS was non-trivial.





[jira] [Commented] (ARROW-4272) [Python] Illegal hardware instruction on pyarrow import

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661296#comment-17661296
 ] 

Rok Mihevc commented on ARROW-4272:
---

This issue has been migrated to [issue 
#20848|https://github.com/apache/arrow/issues/20848] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Illegal hardware instruction on pyarrow import
> ---
>
> Key: ARROW-4272
> URL: https://issues.apache.org/jira/browse/ARROW-4272
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1
> Environment: Python 3.6.7
> PySpark 2.4.0
> PyArrow: 0.11.1
> Pandas: 0.23.4
> NumPy: 1.15.4
> OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux
>Reporter: Elchin
>Priority: Critical
> Attachments: core
>
>
> I can't import pyarrow, it crashes:
> {code:java}
> >>> import pyarrow as pa
> [1]    31441 illegal hardware instruction (core dumped)  python3{code}
> Core dump is attached to issue, it can help you to understand what is the 
> problem.
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux





[jira] [Updated] (ARROW-3894) [Python] Error reading IPC file with no record batches

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3894:
--
External issue URL: https://github.com/apache/arrow/issues/20503

> [Python] Error reading IPC file with no record batches
> --
>
> Key: ARROW-3894
> URL: https://issues.apache.org/jira/browse/ARROW-3894
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.1
>Reporter: Rik Coenders
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When using the RecordBatchFileWriter without actually writing a record batch, 
> the magic bytes at the beginning of the file are not written. This causes the 
> exception "File is smaller than indicated metadata size" when reading that 
> file with the RecordBatchFileReader.
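The Arrow IPC file format begins with the 6-byte magic string "ARROW1". A quick, library-free way to reproduce the symptom described in this report is to check whether a file carries that magic before handing it to a reader; per the bug, a "file" written with no record batches lacked it. The helper below is an illustrative sketch, not part of pyarrow.

```python
ARROW_MAGIC = b"ARROW1"  # magic string opening an Arrow IPC file

def has_arrow_file_magic(path):
    """Return True if the file starts with the Arrow IPC file magic."""
    with open(path, "rb") as f:
        return f.read(len(ARROW_MAGIC)) == ARROW_MAGIC
```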





[jira] [Commented] (ARROW-3921) [CI][GLib] Log Homebrew output

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660945#comment-17660945
 ] 

Rok Mihevc commented on ARROW-3921:
---

This issue has been migrated to [issue 
#20532|https://github.com/apache/arrow/issues/20532] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [CI][GLib] Log Homebrew output
> --
>
> Key: ARROW-3921
> URL: https://issues.apache.org/jira/browse/ARROW-3921
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need more information to fix the {{brew update}} problem.





[jira] [Updated] (ARROW-4485) [CI] Determine maintenance approach to pinned conda-forge binutils package

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4485:
--
External issue URL: https://github.com/apache/arrow/issues/21040

> [CI] Determine maintenance approach to pinned conda-forge binutils package
> --
>
> Key: ARROW-4485
> URL: https://issues.apache.org/jira/browse/ARROW-4485
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> In ARROW-4469 https://github.com/apache/arrow/pull/3554 we pinned binutils 
> 2.31 because the 2.32 release broke builds on Ubuntu Xenial. We aren't sure 
> what our path forward will be for relying on the conda-forge toolchain 
> because of this.





[jira] [Updated] (ARROW-4164) [Rust] ArrayBuilder should not have associated types

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4164:
--
External issue URL: https://github.com/apache/arrow/issues/20749

> [Rust] ArrayBuilder should not have associated types
> 
>
> Key: ARROW-4164
> URL: https://issues.apache.org/jira/browse/ARROW-4164
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Minor
>
> When dealing with arrays at runtime it is possible to represent them with 
> ArrayRef, allowing for things like Vec and then downcasting arrays 
> as needed based on schema meta data.
> I am now trying to do the same thing with ArrayBuilder, but because this trait 
> has an associated type, I cannot create a Vec. This makes it 
> difficult in some cases to dynamically build arrays.





[jira] [Updated] (ARROW-4557) [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4557:
--
External issue URL: https://github.com/apache/arrow/issues/21104

> [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method
> ---
>
> Key: ARROW-4557
> URL: https://issues.apache.org/jira/browse/ARROW-4557
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Presently Table, Schema, and RecordBatch have basic {{select(...colNames)}} 
> implementations. Having an easy {{selectAt(...colIndices)}} impl would be a 
> nice complement, especially when there are duplicate column names.





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4294:
--
External issue URL: https://github.com/apache/arrow/issues/20867

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Plasma
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to a external store rather than being 
> removed altogether
>  * Communication to the external storage service should be through a very 
> thin, shim interface. At the same time, the interface should be general 
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed to ensure there are no race conditions due to multiple concurrent 
> evictions of the same object.
> *Proposed Implementation*
>  * Define a ExternalStore interface with a Connect call. The call returns an 
> ExternalStoreHandle, that exposes Put and Get calls. Any external store that 
> needs to be supported has to have this interface implemented.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per-thread. While the 
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  
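The proposed ExternalStore/ExternalStoreHandle interface can be rendered roughly in Python as below. The in-memory backend is only a stand-in for a real remote service (S3, Redis, ...); class and method names follow the proposal, but everything else is an illustrative assumption.

```python
from abc import ABC, abstractmethod

class ExternalStoreHandle(ABC):
    """Per-thread handle to the external store, exposing Put/Get."""
    @abstractmethod
    def put(self, object_id, data): ...
    @abstractmethod
    def get(self, object_id): ...

class ExternalStore(ABC):
    @abstractmethod
    def connect(self):
        """Return a new ExternalStoreHandle (one per thread)."""

class InMemoryStore(ExternalStore):
    """Toy backend: a shared dict standing in for a remote service."""
    def __init__(self):
        self._objects = {}

    def connect(self):
        store = self
        class Handle(ExternalStoreHandle):
            def put(self, object_id, data):
                # Idempotent: rewriting the same object is harmless, which
                # is what makes concurrent evictions of one object safe.
                store._objects[object_id] = bytes(data)
            def get(self, object_id):
                return store._objects.get(object_id)
        return Handle()
```

An eviction path would then call `handle.put(object_id, object_data)` before reclaiming the memory, and a later Get miss in the Plasma store would fall back to `handle.get(object_id)`.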





[jira] [Commented] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661314#comment-17661314
 ] 

Rok Mihevc commented on ARROW-4290:
---

This issue has been migrated to [issue 
#20864|https://github.com/apache/arrow/issues/20864] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++/Gandiva] Support detecting correct LLVM version in Homebrew
> 
>
> Key: ARROW-4290
> URL: https://issues.apache.org/jira/browse/ARROW-4290
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, C++ - Gandiva
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should also search in homebrew for the matching LLVM version for Gandiva 
> on OSX. You can install it via {{brew install llvm@6}}.





[jira] [Commented] (ARROW-4592) [GLib] Stop configure immediately when GLib isn't available

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661615#comment-17661615
 ] 

Rok Mihevc commented on ARROW-4592:
---

This issue has been migrated to [issue 
#21134|https://github.com/apache/arrow/issues/21134] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [GLib] Stop configure immediately when GLib isn't available
> ---
>
> Key: ARROW-4592
> URL: https://issues.apache.org/jira/browse/ARROW-4592
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-4503) [C#] ArrowStreamReader allocates and copies data excessively

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661525#comment-17661525
 ] 

Rok Mihevc commented on ARROW-4503:
---

This issue has been migrated to [issue 
#21055|https://github.com/apache/arrow/issues/21055] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C#] ArrowStreamReader allocates and copies data excessively
> 
>
> Key: ARROW-4503
> URL: https://issues.apache.org/jira/browse/ARROW-4503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Eric Erhardt
>Assignee: Eric Erhardt
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 48h
>  Time Spent: 12h 10m
>  Remaining Estimate: 35h 50m
>
> When reading `RecordBatch` instances using the `ArrowStreamReader` class, it 
> is currently allocating and copying memory 3 times for the data.
>  # It is allocating memory in order to [read the data from the 
> Stream|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L72-L74],
>  and then reading from the Stream.  (This should be the only allocation that 
> is necessary.)
>  # It then [creates a new 
> `ArrowBuffer.Builder`|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L227-L228],
>  which allocates another `byte[]`, and calls `Append` on it, which copies the 
> values to the new `byte[]`.
>  # Finally, it then calls `.Build()` on the `ArrowBuffer.Builder`, which 
> [allocates memory from the MemoryPool, and then copies the intermediate 
> buffer|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/ArrowBuffer.Builder.cs#L112-L121]
>  into it.
>  
> We should reduce this overhead to only allocating a single time (from the 
> MemoryPool), and not copying the data more times than necessary.
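The single-allocation idea generalizes beyond C#; sketched in Python, it amounts to allocating the destination buffer once and reading directly into it with `readinto()`, instead of letting `read()` allocate an intermediate object that is then copied again. The helper below is illustrative, not Arrow code.

```python
import io

def read_exact_into(stream, nbytes):
    """Read exactly nbytes from stream into a single preallocated buffer."""
    buf = bytearray(nbytes)          # the one and only allocation
    view = memoryview(buf)
    filled = 0
    while filled < nbytes:
        # readinto() fills our buffer directly; no intermediate bytes object.
        n = stream.readinto(view[filled:])
        if not n:
            raise EOFError("stream ended after %d of %d bytes"
                           % (filled, nbytes))
        filled += n
    return buf
```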





[jira] [Commented] (ARROW-4260) [Python] test_serialize_deserialize_pandas is failing in multiple build entries

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661284#comment-17661284
 ] 

Rok Mihevc commented on ARROW-4260:
---

This issue has been migrated to [issue 
#20837|https://github.com/apache/arrow/issues/20837] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] test_serialize_deserialize_pandas is failing in multiple build 
> entries
> ---
>
> Key: ARROW-4260
> URL: https://issues.apache.org/jira/browse/ARROW-4260
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Hatem Helal
>Assignee: Krisztian Szucs
>Priority: Blocker
>  Labels: ci-failure, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> See 
>  [https://travis-ci.org/apache/arrow/jobs/479378190#L2427]
>   





[jira] [Updated] (ARROW-3926) [Python] Add Gandiva bindings to Python wheels

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3926:
--
External issue URL: https://github.com/apache/arrow/issues/20536

> [Python] Add Gandiva bindings to Python wheels
> --
>
> Key: ARROW-3926
> URL: https://issues.apache.org/jira/browse/ARROW-3926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Depends on adding LLVM6 to the build toolchain





[jira] [Updated] (ARROW-4492) [Python] Failure reading Parquet column as pandas Categorical in 0.12

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4492:
--
External issue URL: https://github.com/apache/arrow/issues/21046

> [Python] Failure reading Parquet column as pandas Categorical in 0.12
> -
>
> Key: ARROW-4492
> URL: https://issues.apache.org/jira/browse/ARROW-4492
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: George Sakkis
>Priority: Major
>  Labels: Parquet
> Fix For: 0.12.1
>
> Attachments: slug.pq
>
>
> On pyarrow 0.12.0 some (but not all) columns cannot be read as category 
> dtype. Attached is an extracted failing sample.
>  {noformat}
> import dask.dataframe as dd
> df = dd.read_parquet('slug.pq', categories=['slug'], 
> engine='pyarrow').compute()
> print(len(df['slug'].dtype.categories))
>  {noformat}
> This works on pyarrow 0.11.1 (and fastparquet 0.2.1).





[jira] [Updated] (ARROW-4594) [Ruby] Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4594:
--
External issue URL: https://github.com/apache/arrow/issues/21136

> [Ruby] Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array
> --
>
> Key: ARROW-4594
> URL: https://issues.apache.org/jira/browse/ARROW-4594
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Ruby
>Affects Versions: 0.12.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4519) [JS] Publish JS API Docs for v0.4.0

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4519:
--
External issue URL: https://github.com/apache/arrow/issues/21070

> [JS] Publish JS API Docs for v0.4.0
> ---
>
> Key: ARROW-4519
> URL: https://issues.apache.org/jira/browse/ARROW-4519
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Updated] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4268:
--
External issue URL: https://github.com/apache/arrow/issues/20844

> [C++] Add C primitive to Arrow:Type compile time in TypeTraits
> --
>
> Key: ARROW-4268
> URL: https://issues.apache.org/jira/browse/ARROW-4268
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>   Original Estimate: 1h
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The user would use something like
> {code:c++}
> ...
> using ArrowType = CTypeTraits::ArrowType;
> using ArrayType = CTypeTraits::ArrayType;
> auto type = CTypeTraits::type_singleton();
> {code}





[jira] [Updated] (ARROW-4539) [Java]List vector child value count not set correctly

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4539:
--
External issue URL: https://github.com/apache/arrow/issues/21088

> [Java]List vector child value count not set correctly
> -
>
> Key: ARROW-4539
> URL: https://issues.apache.org/jira/browse/ARROW-4539
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Praveen Kumar
>Assignee: Praveen Kumar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We are not correctly processing list vectors that could have null values. The 
> child value count can be off, thereby losing data in variable-width vectors.





[jira] [Commented] (ARROW-4470) [Python] Pyarrow using considerable more memory when reading partitioned Parquet file

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661491#comment-17661491
 ] 

Rok Mihevc commented on ARROW-4470:
---

This issue has been migrated to [issue 
#21027|https://github.com/apache/arrow/issues/21027] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Pyarrow using considerable more memory when reading partitioned 
> Parquet file
> -
>
> Key: ARROW-4470
> URL: https://issues.apache.org/jira/browse/ARROW-4470
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Ivan SPM
>Priority: Major
>  Labels: dataset, datasets, parquet
> Fix For: 0.16.0
>
>
> Hi,
> I have a partitioned Parquet table in Impala in HDFS, using Hive metastore, 
> with the following structure:
> {{/data/myparquettable/year=2016}}
> {{/data/myparquettable/year=2016/myfile_1.prt}}
> {{/data/myparquettable/year=2016/myfile_2.prt}}
> {{/data/myparquettable/year=2016/myfile_3.prt}}
> {{/data/myparquettable/year=2017}}
> {{/data/myparquettable/year=2017/myfile_1.prt}}
> {{/data/myparquettable/year=2017/myfile_2.prt}}
> {{/data/myparquettable/year=2017/myfile_3.prt}}
> and so on. I need to work with one partition, so I copied one partition to a 
> local filesystem:
> {{hdfs fs -get /data/myparquettable/year=2017 /local/}}
> so now I have some data on the local disk:
> {{/local/year=2017/myfile_1.prt }}{{/local/year=2017/myfile_2.prt }}
> etc.I tried to read it using Pyarrow:
> {{import pyarrow.parquet as pq}}
> {{pq.read_table('/local/year=2017')}}
> and it starts reading. The problem is that the local Parquet files are around 
> 15GB total, and I blew up my machine's memory a couple of times: when reading 
> these files, Pyarrow uses more than 60GB of RAM, and I'm not sure how much it 
> will ultimately need because it never finishes. Is this expected? Is there a 
> workaround?
>  
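A common mitigation (assuming a pyarrow version that exposes {{pq.ParquetFile}} and {{read_row_group}}) is to stream the file one row group at a time instead of materializing everything at once. The pattern, sketched with a stand-in reader function so it stays self-contained:

```python
def stream_row_groups(read_row_group, num_row_groups):
    # Yield one row group at a time, so peak memory is bounded by the
    # largest group instead of the whole multi-GB file.
    for i in range(num_row_groups):
        yield read_row_group(i)

# Stand-in data for pq.ParquetFile(path).read_row_group(i).
groups = [[1, 2], [3, 4], [5]]
total_rows = sum(
    len(g) for g in stream_row_groups(lambda i: groups[i], len(groups))
)
```

With pyarrow this would look roughly like iterating {{pf.read_row_group(i)}} for {{i in range(pf.num_row_groups)}} on a {{pq.ParquetFile}} and processing each batch before reading the next.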



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660957#comment-17660957
 ] 

Rok Mihevc commented on ARROW-3933:
---

This issue has been migrated to [issue 
#20542|https://github.com/apache/arrow/issues/20542] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.15.0
>
> Attachments: 
> part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4512) [R] Stream reader/writer API that takes socket stream

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661534#comment-17661534
 ] 

Rok Mihevc commented on ARROW-4512:
---

This issue has been migrated to [issue 
#21063|https://github.com/apache/arrow/issues/21063] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [R] Stream reader/writer API that takes socket stream
> -
>
> Key: ARROW-4512
> URL: https://issues.apache.org/jira/browse/ARROW-4512
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.12.0, 0.14.1, 1.0.0
>Reporter: Hyukjin Kwon
>Assignee: Dewey Dunnington
>Priority: Major
> Fix For: 8.0.0
>
>
> I have been working on Spark integration with Arrow.
> I realised that there is no way to use a socket as input for the Arrow stream 
> format. For instance,
> I want to do something like:
> {code}
> connStream <- socketConnection(port = , blocking = TRUE, open = "wb")
> rdf_slices <- # a list of data frames.
> stream_writer <- NULL
> tryCatch({
>   for (rdf_slice in rdf_slices) {
> batch <- record_batch(rdf_slice)
> if (is.null(stream_writer)) {
>   stream_writer <- RecordBatchStreamWriter(connStream, batch$schema)  # 
> Here, looks there's no way to use socket.
> }
> stream_writer$write_batch(batch)
>   }
> },
> finally = {
>   if (!is.null(stream_writer)) {
> stream_writer$close()
>   }
> })
> {code}
> Likewise, I cannot find a way to iterate the stream batch by batch
> {code}
> RecordBatchStreamReader(connStream)$batches()  # Here, looks there's no way 
> to use socket.
> {code}
> This looks easily possible on the Python side but seems to be missing from the R APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4508) [Format] Copy content from Layout.rst to new document.

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661530#comment-17661530
 ] 

Rok Mihevc commented on ARROW-4508:
---

This issue has been migrated to [issue 
#21059|https://github.com/apache/arrow/issues/21059] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Format] Copy content from Layout.rst to new document.
> --
>
> Key: ARROW-4508
> URL: https://issues.apache.org/jira/browse/ARROW-4508
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4515) [C++, lint] Use clang-format more efficiently in `check-format` target

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4515:
--
External issue URL: https://github.com/apache/arrow/issues/21066

> [C++, lint] Use clang-format more efficiently in `check-format` target
> --
>
> Key: ARROW-4515
> URL: https://issues.apache.org/jira/browse/ARROW-4515
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Minor
>  Labels: good-first-issue
>
> `clang-format` supports the command-line option `-output-replacements-xml`, which 
> (when no changes are required) outputs:
> ```
> <?xml version='1.0'?>
> <replacements xml:space='preserve' incomplete_format='false'>
> </replacements>
> ```
> Using this option during `check-format`, instead of using Python to compute a 
> diff between the formatted and on-disk sources, should speed up that target significantly
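A sketch of how the check could consume that XML (hypothetical helper, standard library only): parse the replacements output and flag a file as needing formatting only if any {{<replacement>}} elements are present.

```python
import xml.etree.ElementTree as ET

def needs_formatting(replacements_xml):
    # clang-format -output-replacements-xml emits a <replacements> root;
    # a file that is already clean contains no <replacement> children.
    root = ET.fromstring(replacements_xml)
    return len(root.findall('replacement')) > 0

clean = "<?xml version='1.0'?><replacements xml:space='preserve'></replacements>"
dirty = ("<?xml version='1.0'?><replacements xml:space='preserve'>"
         "<replacement offset='0' length='2'>  </replacement></replacements>")
```

The check-format target would then only invoke a diff (or fail) for the files where {{needs_formatting}} is true.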



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4554:
--
External issue URL: https://github.com/apache/arrow/issues/21102

> [JS] Implement logic for combining Vectors with different lengths/chunksizes
> 
>
> Key: ARROW-4554
> URL: https://issues.apache.org/jira/browse/ARROW-4554
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> We should add logic to combine and possibly slice/re-chunk and uniformly 
> partition chunks into separate RecordBatches. This will make it easier to 
> create Tables or RecordBatches from Vectors of different lengths. This is 
> also necessary for {{Table#assign()}}. PR incoming.
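The slicing/re-chunking described above can be sketched language-agnostically (Python here, standing in for the actual JS implementation): truncate to the shortest column, then slice all columns on the same boundaries so each batch is uniformly partitioned.

```python
def to_uniform_batches(columns, batch_size):
    # columns: dict of column name -> flat list of values, possibly of
    # different lengths. Truncate to the shortest column, then slice all
    # columns on identical boundaries to form uniform "record batches".
    length = min(len(col) for col in columns.values())
    trimmed = {name: col[:length] for name, col in columns.items()}
    return [
        {name: col[i:i + batch_size] for name, col in trimmed.items()}
        for i in range(0, length, batch_size)
    ]
```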



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4508) [Format] Copy content from Layout.rst to new document.

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4508:
--
External issue URL: https://github.com/apache/arrow/issues/21059

> [Format] Copy content from Layout.rst to new document.
> --
>
> Key: ARROW-4508
> URL: https://issues.apache.org/jira/browse/ARROW-4508
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4276) [Release] Remove needless Bintray authentication from binaries verify script

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661300#comment-17661300
 ] 

Rok Mihevc commented on ARROW-4276:
---

This issue has been migrated to [issue 
#20851|https://github.com/apache/arrow/issues/20851] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Release] Remove needless Bintray authentication from binaries verify script
> 
>
> Key: ARROW-4276
> URL: https://issues.apache.org/jira/browse/ARROW-4276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-3906) [C++] Break builder.cc into multiple compilation units

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3906:
--
External issue URL: https://github.com/apache/arrow/issues/15888

> [C++] Break builder.cc into multiple compilation units
> --
>
> Key: ARROW-3906
> URL: https://issues.apache.org/jira/browse/ARROW-3906
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> To improve readability I suggest splitting {{builder.cc}} into independent 
> compilation units. Concrete builder classes are generally independent of each 
> other. The only concern is whether inlining some of the base class 
> implementations is important for performance.
> This would also make incremental compilation faster when changing one of the 
> concrete classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4579) [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661602#comment-17661602
 ] 

Rok Mihevc commented on ARROW-4579:
---

This issue has been migrated to [issue 
#21123|https://github.com/apache/arrow/issues/21123] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array
> --
>
> Key: ARROW-4579
> URL: https://issues.apache.org/jira/browse/ARROW-4579
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> We should use or return the new native [BigInt 
> types|https://developers.google.com/web/updates/2018/05/bigint] whenever it's 
> available.
> * Use the native {{BigInt}} to convert/stringify i64s/u64s
> * Support the {{BigInt}} type in element comparator and {{indexOf()}}
> * Add zero-copy {{toBigInt64Array()}} and {{toBigUint64Array()}} methods to 
> {{Int64Vector}} and {{Uint64Vector}}, respectively



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4585) [C++] Dependency of Flight C++ sources on generated protobuf is not respected

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4585:
--
External issue URL: https://github.com/apache/arrow/issues/15977

> [C++] Dependency of Flight C++ sources on generated protobuf is not respected
> -
>
> Key: ARROW-4585
> URL: https://issues.apache.org/jira/browse/ARROW-4585
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Seems like we have a race condition somewhere as we frequently run into
> {code:java}
> [82/273] Building CXX object 
> src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o
> FAILED: 
> src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o
> /usr/local/bin/ccache 
> /Users/ukorn/miniconda3/envs/pyarrow-dev-2/bin/x86_64-apple-darwin13.4.0-clang++
>  -DARROW_JEMALLOC 
> -DARROW_JEMALLOC_INCLUDE_DIR=/Users/ukorn/Development/arrow-repos-2/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//include
>  -DARROW_USE_GLOG -DARROW_USE_SIMD -DARROW_WITH_BROTLI -DARROW_WITH_LZ4 
> -DARROW_WITH_SNAPPY -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -Isrc -I../src 
> -isystem /Users/ukorn/miniconda3/envs/pyarrow-dev-2/include -isystem 
> gbenchmark_ep/src/gbenchmark_ep-install/include -isystem 
> jemalloc_ep-prefix/src -isystem ../thirdparty/hadoop/include -isystem 
> /Users/ukorn/miniconda3/envs/pyarrow-dev-2/include/thrift -march=core2 
> -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong 
> -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 
> -fmessage-length=0 -Qunused-arguments -O3 -DNDEBUG -Wall 
> -Wno-unknown-warning-option -msse4.2 -stdlib=libc++ -O3 -DNDEBUG -fPIC 
> -std=gnu++11 -MD -MT 
> src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o 
> -MF 
> src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o.d 
> -o src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir/test-util.cc.o 
> -c ../src/arrow/flight/test-util.cc
> In file included from ../src/arrow/flight/test-util.cc:35:
> In file included from ../src/arrow/flight/internal.h:29:
> ../src/arrow/flight/protocol-internal.h:22:10: fatal error: 
> 'arrow/flight/Flight.grpc.pb.h' file not found
> #include "arrow/flight/Flight.grpc.pb.h"
> ^~~
> 1 error generated.
> [87/273] Building CXX object 
> src/arrow/python/CMakeFiles/arrow_python_objlib.dir/python_to_arrow.cc.o
> ninja: build stopped: subcommand failed.
> ninja 672,82s user 33,40s system 196% cpu 5:59,62 total{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4481) [Website] Instructions for publishing web site are missing a step

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4481:
--
External issue URL: https://github.com/apache/arrow/issues/21037

> [Website] Instructions for publishing web site are missing a step
> -
>
> Key: ARROW-4481
> URL: https://issues.apache.org/jira/browse/ARROW-4481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Andy Grove
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The instructions for publishing the web site say to run the 
> "scripts/sync_format_docs.sh" which copies the top level "format" directory 
> to the "_docs" directory under site.
> Existing files in "_docs" have references to markdown files in the 
> "_docs/format" directory. For example, the IPC.md contains:
> {code:java}
> {% include_relative format/IPC.md %}{code}
> However my top level format directory does not contain this IPC.md, so I get 
> errors when running jekyll and I have had to create some dummy markdown files 
> as a workaround.
> I investigated this a bit, and I think there is an undocumented prerequisite 
> step that would cause Sphinx to run and generate the docs?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4480) [Python] Drive letter removed when writing parquet file

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661502#comment-17661502
 ] 

Rok Mihevc commented on ARROW-4480:
---

This issue has been migrated to [issue 
#21036|https://github.com/apache/arrow/issues/21036] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Drive letter removed when writing parquet file 
> 
>
> Key: ARROW-4480
> URL: https://issues.apache.org/jira/browse/ARROW-4480
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Seb Fru
>Assignee: Antoine Pitrou
>Priority: Blocker
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Hi everyone,
>   
>  importing this from Github:
>   
>  I encountered a problem while working with pyarrow: I am working on Windows 
> 10. When I want to save a table using pq.write_table(tab, 
> r'E:\parquetfiles\file1.parquet'), I get the error "No such file or 
> directory".
>  After searching a bit, I found out that the drive letter is removed while 
> parsing the {{where}} string, but I could not find a way to solve my problem: 
> I can write the files on my C:\ drive without problems, but I am not able to 
> write a parquet file on any drive other than C:.
>  Am I doing something wrong, or is this just how it works? I would really 
> appreciate any help, because I just cannot fit my files on the C: drive.
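The drive-letter loss is reproducible with the standard library alone: when a Windows path is parsed as a URI, the drive letter is consumed as the URI scheme.

```python
from urllib.parse import urlparse

# Parsing a Windows path as a URI treats 'E:' as a scheme, not a drive,
# so the drive letter disappears from the resulting path.
result = urlparse(r'E:\parquetfiles\file1.parquet')
```

Any code path that round-trips file paths through URI parsing will therefore drop the drive letter before the file is opened.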



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4510) [Format] copy content from IPC.rst to new document.

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4510:
--
External issue URL: https://github.com/apache/arrow/issues/21061

> [Format] copy content from IPC.rst to new document.
> ---
>
> Key: ARROW-4510
> URL: https://issues.apache.org/jira/browse/ARROW-4510
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Format
>Reporter: Micah Kornfield
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.15.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4275) [C++] gandiva-decimal_single_test extremely slow

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4275:
--
External issue URL: https://github.com/apache/arrow/issues/15940

> [C++] gandiva-decimal_single_test extremely slow
> 
>
> Key: ARROW-4275
> URL: https://issues.apache.org/jira/browse/ARROW-4275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Gandiva, Continuous Integration
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{gandiva-decimal_single_test}} is extremely slow on CI builds with Valgrind:
> {code}
>  99/100 Test #128: gandiva-decimal_single_test ...   Passed  
> 397.11 sec
> 100/100 Test #130: gandiva-decimal_single_test_static    Passed  
> 338.97 sec
> {code}
> (full log: https://travis-ci.org/apache/arrow/jobs/480198116#L2707)
> Something should be done to make it faster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4526) [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4526:
--
External issue URL: https://github.com/apache/arrow/issues/21076

> [Java] Remove Netty references from ArrowBuf and move Allocator out of vector 
> package
> -
>
> Key: ARROW-4526
> URL: https://issues.apache.org/jira/browse/ARROW-4526
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Jacques Nadeau
>Assignee: Liya Fan
>Priority: Critical
> Fix For: 1.0.0
>
>
> Arrow currently has a hard dependency on Netty and exposes this in public 
> APIs. This shouldn't be the case. There could be many allocator 
> implementations, with Netty as one possible option. We should remove the hard 
> dependency between arrow-vector and Netty, instead creating a trivial 
> allocator. ArrowBuf should probably expose a {{<T> T unwrap(Class<T> clazz)}} 
> method instead, to make inner providers available without a hard 
> reference. This should also include drastically reducing the number of 
> methods on ArrowBuf, as right now it includes every method from ByteBuf but 
> many of those are not very useful or appropriate.
> This work should come after we do the simpler ARROW-3191
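The proposed unwrap shape can be sketched as follows (Python here for brevity; the actual proposal is a Java generic method): callers that know the concrete backing type ask for it explicitly, and nothing Netty-specific leaks into the public surface.

```python
class ArrowBuf:
    """Minimal sketch of a buffer that hides its backing implementation."""

    def __init__(self, inner):
        self._inner = inner  # e.g. a Netty ByteBuf, or any other backing

    def unwrap(self, cls):
        # Equivalent in spirit to Java's <T> T unwrap(Class<T> clazz):
        # expose the backing object only to callers that name its type.
        if isinstance(self._inner, cls):
            return self._inner
        raise TypeError(f"buffer is not backed by {cls.__name__}")
```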



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-4559) [Python] pyarrow can't read/write filenames with special characters

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4559:
--
External issue URL: https://github.com/apache/arrow/issues/21106

> [Python] pyarrow can't read/write filenames with special characters
> ---
>
> Key: ARROW-4559
> URL: https://issues.apache.org/jira/browse/ARROW-4559
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
> Environment: $ python3 --version
> Python 3.6.6
> $ pip3 freeze | grep -Ei 'pyarrow|pandas'
> pandas==0.24.1
> pyarrow==0.12.0
>Reporter: Jean-Christophe Petkovich
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When writing or reading files to or from paths that have special characters 
> in them, (e.g., "#"), pyarrow returns an error: 
> {code:python}
> OSError: Passed non-file path...
> {code}
> This is a consequence of the following line:
> https://github.com/apache/arrow/blob/master/python/pyarrow/filesystem.py#L416
> File-paths will be parsed as URIs, which will give strange results for 
> filepaths like: "bad # actor.parquet":
> ParseResult(scheme='', netloc='', path='/tmp/bad ', params='', query='', 
> fragment='actor.parquet')
> This is trivial to reproduce with the following code which uses the 
> `pd.to_parquet` and `pd.read_parquet` interfaces:
> {code:python}
> import pandas as pd
> x = pd.DataFrame({"a": [1,2,3]})
> x.to_parquet("bad # actor.parquet")
> pd.read_parquet("bad # actor.parquet")
> {code}
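The fragment-splitting behaviour described above comes straight from URI semantics and is reproducible with the standard library alone:

```python
from urllib.parse import urlparse

# '#' begins a URI fragment, so a filename containing it gets truncated
# when the file path is (mis)parsed as a URI.
result = urlparse('/tmp/bad # actor.parquet')
```

This is why treating local file paths as URIs in {{filesystem.py}} breaks on such filenames.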



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-3958) [Plasma] Reduce number of IPCs

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660981#comment-17660981
 ] 

Rok Mihevc commented on ARROW-3958:
---

This issue has been migrated to [issue 
#15897|https://github.com/apache/arrow/issues/15897] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Plasma] Reduce number of IPCs
> --
>
> Key: ARROW-3958
> URL: https://issues.apache.org/jira/browse/ARROW-3958
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.11.1
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently we ship file descriptors of objects from the store to the client 
> every time an object is created or gotten. There are relatively few distinct 
> file descriptors, so caching them can get rid of one IPC in the majority of 
> cases.
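A hypothetical sketch of the client-side caching this describes: the expensive fd transfer (one IPC round trip) happens only on a cache miss, keyed by the store's descriptor identity.

```python
class FdCache:
    """Hypothetical sketch of per-connection file-descriptor caching."""

    def __init__(self):
        self._fds = {}

    def get(self, store_fd_id, receive_fd_over_ipc):
        # receive_fd_over_ipc stands in for the costly IPC round trip;
        # it is invoked only the first time a given fd id is seen.
        if store_fd_id not in self._fds:
            self._fds[store_fd_id] = receive_fd_over_ipc()
        return self._fds[store_fd_id]
```

Since the store maps objects into a small number of memory-mapped files, most get/create calls hit the cache and skip the IPC entirely.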



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-2837) [C++] ArrayBuilder::null_bitmap returns PoolBuffer

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-2837:
--
External issue URL: https://github.com/apache/arrow/issues/19213

> [C++] ArrayBuilder::null_bitmap returns PoolBuffer
> --
>
> Key: ARROW-2837
> URL: https://issues.apache.org/jira/browse/ARROW-2837
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Dimitri Vorona
>Assignee: Dimitri Vorona
>Priority: Major
> Fix For: 0.10.0
>
>
> A simple buffer (like in case of ArrayBuilder::Data) seem to be enough to me, 
> and it doesn't break anything.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-3996) [C++] Insufficient description on build

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3996:
--
External issue URL: https://github.com/apache/arrow/issues/15900

> [C++] Insufficient description on build
> ---
>
> Key: ARROW-3996
> URL: https://issues.apache.org/jira/browse/ARROW-3996
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.1
> Environment: Ubuntu Linux(Include Ubuntu 18.04 LTS on Windows 10 WSL)
>Reporter: Kunihisa Abukawa
>Assignee: Kunihisa Abukawa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The C++ build documentation is missing some required libraries/components for 
> the Ubuntu Linux environment.
> Requirements:
> * g++
> * autoconf
> * jemalloc
> Accordingly, the following need to be added to the libraries/components 
> installed with apt-get install:
> * libboost-regex-dev
> * libjemalloc-dev
> * autotools-dev



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-2857) [Python] Expose integration test JSON read/write in Python API

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659882#comment-17659882
 ] 

Rok Mihevc commented on ARROW-2857:
---

This issue has been migrated to [issue 
#19230|https://github.com/apache/arrow/issues/19230] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Expose integration test JSON read/write in Python API
> --
>
> Key: ARROW-2857
> URL: https://issues.apache.org/jira/browse/ARROW-2857
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
>
> This should be clearly marked to not be used for persistence



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-2838) [Python] Speed up null testing with Pandas semantics

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-2838:
--
External issue URL: https://github.com/apache/arrow/issues/19214

> [Python] Speed up null testing with Pandas semantics
> 
>
> Key: ARROW-2838
> URL: https://issues.apache.org/jira/browse/ARROW-2838
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The {{PandasObjectIsNull}} helper function can be a significant contributor 
> when converting a Pandas dataframe to Arrow format (e.g. when writing a 
> dataframe to feather format). We can try to speed up the type checks in that 
> function.
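For context, the Pandas-style null semantics that {{PandasObjectIsNull}} checks boil down to roughly the following (simplified Python sketch; the real C++ helper also handles NaT and similar sentinels):

```python
import math

def is_null_like(value):
    # Pandas treats None and float NaN as null; most other values,
    # including 0 and empty strings, are not null.
    return value is None or (isinstance(value, float) and math.isnan(value))
```

Because this test runs once per cell during conversion, shaving even a few type checks off it matters at dataframe scale.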



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ARROW-2831) [Plasma] MemoryError in teardown

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-2831:
--
External issue URL: https://github.com/apache/arrow/issues/19208

> [Plasma] MemoryError in teardown
> 
>
> Key: ARROW-2831
> URL: https://issues.apache.org/jira/browse/ARROW-2831
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Uwe Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> There seems to be some flakiness in Plasma tests, e.g. see: 
> https://api.travis-ci.org/v3/job/402544643/log.txt
> {code}
>  ERRORS 
> 
> _ ERROR at teardown of TestPlasmaClient.test_subscribe 
> _
> self = 
> test_method =  >
> def teardown_method(self, test_method):
> try:
> # Check that the Plasma store is still alive.
> assert self.p.poll() is None
> # Ensure Valgrind and/or coverage have a clean exit
> self.p.send_signal(signal.SIGTERM)
> if sys.version_info >= (3, 3):
> self.p.wait(timeout=5)
> else:
> self.p.wait()
> >   assert self.p.returncode == 0
> E   assert 1 == 0
> E+  where 1 =  0x7f9a3dcd5850>.returncode
> E+where  
> = .p
> pyarrow/tests/test_plasma.py:132: AssertionError
>  Captured stderr setup 
> -
> ==20909== Memcheck, a memory error detector
> ==20909== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
> ==20909== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
> ==20909== Command: 
> /home/travis/build/apache/arrow/python/pyarrow/plasma_store -s 
> /tmp/test_plasma-Dzj8IQ/plasma.sock -m 1
> ==20909== 
> Allowing the Plasma store to use up to 0.1GB of memory.
> Connection to IPC socket failed for pathname 
> /tmp/test_plasma-Dzj8IQ/plasma.sock, retrying 50 more times
> Starting object store with directory /dev/shm and huge page support disabled
> Connection to IPC socket failed for pathname 
> /tmp/test_plasma-Dzj8IQ/plasma.sock, retrying 49 more times
> --- Captured stderr teardown 
> ---
> ==20909== Invalid free() / delete / delete[] / realloc()
> ==20909==at 0x4C2C83C: operator delete[](void*) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==20909==by 0x4A870A: std::default_delete<unsigned char []>::operator()(unsigned char*) const (unique_ptr.h:99)
> ==20909==by 0x4A0D6C: std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >::~unique_ptr() (unique_ptr.h:377)
> ==20909==by 0x4C3AE1: void std::_Destroy<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > >(std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*) (stl_construct.h:93)
> ==20909==by 0x4C2DD4: void std::_Destroy_aux<false>::__destroy<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>(std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*) (stl_construct.h:103)
> ==20909==by 0x4C1999: void std::_Destroy<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>(std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*) (stl_construct.h:126)
> ==20909==by 0x4BF460: void std::_Destroy<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > >(std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*, std::allocator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > >&) (stl_construct.h:151)
> ==20909==by 0x4B9D41: std::deque<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::allocator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > > >::_M_destroy_data_aux(std::_Deque_iterator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >&, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>, std::_Deque_iterator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >&, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>) (deque.tcc:806)
> ==20909==by 0x4B17DA: std::deque<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::allocator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > > >::_M_destroy_data(std::_Deque_iterator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >&, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>, std::_Deque_iterator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >&, std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> >*>, std::allocator<std::unique_ptr<unsigned char [], std::default_delete<unsigned char []> > > const&) (

[jira] [Updated] (ARROW-4008) [C++] Integration test executable failure

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4008:
--
External issue URL: https://github.com/apache/arrow/issues/20610

> [C++] Integration test executable failure
> -
>
> Key: ARROW-4008
> URL: https://issues.apache.org/jira/browse/ARROW-4008
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: ci-failure, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See consistent CI failures:
> {code}
> -- Creating binary inputs
> /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test 
> --integration 
> --arrow=/tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow
>  --json=/home/travis/build/apache/arrow/integration/data/struct_example.json 
> --mode=JSON_TO_ARROW
> -- Validating file
> /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test 
> --integration 
> --arrow=/tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow
>  --json=/home/travis/build/apache/arrow/integration/data/struct_example.json 
> --mode=VALIDATE
> -- Validating stream
> /home/travis/build/apache/arrow/cpp-build/debug/file-to-stream 
> /tmp/tmpx3363_ef/ffd61e898f66410093235f7b3edc0b8f_struct_example.json_to_arrow
>  > 
> /tmp/tmpx3363_ef/d9968e28006f42ee9df239f50c71231a_struct_example.arrow_to_stream
> cat 
> /tmp/tmpx3363_ef/d9968e28006f42ee9df239f50c71231a_struct_example.arrow_to_stream
>  | /home/travis/build/apache/arrow/cpp-build/debug/stream-to-file > 
> /tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow
> /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test 
> --integration 
> --arrow=/tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow
>  --json=/home/travis/build/apache/arrow/integration/data/struct_example.json 
> --mode=VALIDATE
> Command failed: 
> /home/travis/build/apache/arrow/cpp-build/debug/arrow-json-integration-test 
> --integration 
> --arrow=/tmp/tmpx3363_ef/eaf061ea34e64a08bf38800d4fe5a9be_struct_example.stream_to_arrow
>  --json=/home/travis/build/apache/arrow/integration/data/struct_example.json 
> --mode=VALIDATE
> With output:
> --
> Error message: Invalid: 
> /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-integration-test.cc:151
>  code: RecordBatchFileReader::Open(arrow_file.get(), _reader)
> /home/travis/build/apache/arrow/cpp/src/arrow/ipc/reader.cc:624 code: 
> ReadFooter()
> File is too small: 0
> {code}
> https://travis-ci.org/apache/arrow/jobs/467092615#L2567



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-4038) [Rust] Add array_ops methods for boolean AND, OR, NOT

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661061#comment-17661061
 ] 

Rok Mihevc commented on ARROW-4038:
---

This issue has been migrated to [issue 
#20638|https://github.com/apache/arrow/issues/20638] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Rust] Add array_ops methods for boolean AND, OR, NOT
> -
>
> Key: ARROW-4038
> URL: https://issues.apache.org/jira/browse/ARROW-4038
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have math and comparison operations and I would now like to add boolean 
> unary and binary operators such as AND, OR, NOT for use in predicates against 
> arrow data.
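A minimal sketch of what such boolean kernels could look like, using plain `Vec<Option<bool>>` to stand in for Arrow's null-aware boolean arrays. The names `and`/`not` and the "null if either input is null" rule are illustrative assumptions, not the actual arrow crate API:

```rust
// Element-wise boolean AND over nullable values: a slot is null (None)
// whenever either input slot is null, mirroring common SQL semantics.
fn and(l: &[Option<bool>], r: &[Option<bool>]) -> Vec<Option<bool>> {
    l.iter()
        .zip(r)
        .map(|(a, b)| match (a, b) {
            (Some(x), Some(y)) => Some(*x && *y),
            _ => None, // null propagates
        })
        .collect()
}

// Element-wise boolean NOT: nulls stay null.
fn not(v: &[Option<bool>]) -> Vec<Option<bool>> {
    v.iter().map(|a| a.map(|x| !x)).collect()
}

fn main() {
    let l = [Some(true), Some(true), None];
    let r = [Some(false), Some(true), Some(true)];
    assert_eq!(and(&l, &r), vec![Some(false), Some(true), None]);
    assert_eq!(not(&l), vec![Some(false), Some(false), None]);
}
```

A real kernel would operate on packed validity bitmaps rather than `Option` values, but the null-propagation logic is the same.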





[jira] [Commented] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660916#comment-17660916
 ] 

Rok Mihevc commented on ARROW-3892:
---

This issue has been migrated to [issue 
#20499|https://github.com/apache/arrow/issues/20499] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] Remove any dependency on compromised NPM flatmap-stream package
> 
>
> Key: ARROW-3892
> URL: https://issues.apache.org/jira/browse/ARROW-3892
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are erroring out as the result of 
> https://github.com/dominictarr/event-stream/issues/116
> {code}
>  npm ERR! code ENOVERSIONS
>  npm ERR! No valid versions available for flatmap-stream
> {code}





[jira] [Updated] (ARROW-4276) [Release] Remove needless Bintray authentication from binaries verify script

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4276:
--
External issue URL: https://github.com/apache/arrow/issues/20851

> [Release] Remove needless Bintray authentication from binaries verify script
> 
>
> Key: ARROW-4276
> URL: https://issues.apache.org/jira/browse/ARROW-4276
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-3948) [CI][GLib] Set timeout to Homebrew

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660972#comment-17660972
 ] 

Rok Mihevc commented on ARROW-3948:
---

This issue has been migrated to [issue 
#20555|https://github.com/apache/arrow/issues/20555] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [CI][GLib] Set timeout to Homebrew
> --
>
> Key: ARROW-3948
> URL: https://issues.apache.org/jira/browse/ARROW-3948
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Continuous Integration, GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We can't get the Homebrew log when Travis detects a hung Homebrew process and 
> kills the CI job.
> We need to detect the hung Homebrew process and kill it ourselves to get the 
> Homebrew log.





[jira] [Commented] (ARROW-2858) [Packaging] Add unit tests for crossbow

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17659883#comment-17659883
 ] 

Rok Mihevc commented on ARROW-2858:
---

This issue has been migrated to [issue 
#19231|https://github.com/apache/arrow/issues/19231] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Packaging] Add unit tests for crossbow
> ---
>
> Key: ARROW-2858
> URL: https://issues.apache.org/jira/browse/ARROW-2858
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Phillip Cloud
>Priority: Major
>
> As this code grows we should start adding unit tests to make sure we can make 
> changes safely.





[jira] [Updated] (ARROW-4048) [GLib] Return ChunkedArray instead of Array in gparquet_arrow_file_reader_read_column

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4048:
--
External issue URL: https://github.com/apache/arrow/issues/20647

> [GLib] Return ChunkedArray instead of Array in 
> gparquet_arrow_file_reader_read_column
> -
>
> Key: ARROW-4048
> URL: https://issues.apache.org/jira/browse/ARROW-4048
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Because FileReader::ReadColumn(int i, std::shared_ptr<Array>* out) is 
> deprecated since 0.12.
> [https://github.com/apache/arrow/pull/3171/files#diff-e6f49fbed784ef27e3ab0075e01a5871R132]





[jira] [Commented] (ARROW-4164) [Rust] ArrayBuilder should not have associated types

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661188#comment-17661188
 ] 

Rok Mihevc commented on ARROW-4164:
---

This issue has been migrated to [issue 
#20749|https://github.com/apache/arrow/issues/20749] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Rust] ArrayBuilder should not have associated types
> 
>
> Key: ARROW-4164
> URL: https://issues.apache.org/jira/browse/ARROW-4164
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andy Grove
>Priority: Minor
>
> When dealing with arrays at runtime it is possible to represent them with 
> ArrayRef, allowing for things like Vec<ArrayRef> and then downcasting arrays 
> as needed based on schema metadata.
> I am now trying to do the same thing with ArrayBuilder but because this trait 
> has an associated type, I cannot create a Vec<Box<ArrayBuilder>>. This makes 
> it difficult in some cases to dynamically build arrays.
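The constraint can be shown with a self-contained sketch (simplified toy traits, not the real arrow `ArrayBuilder`): a trait with an associated type cannot be collected into one `Vec` of trait objects, while an object-safe trait without one can.

```rust
// Toy stand-in for a builder trait WITH an associated type: builders whose
// `Output` types differ cannot share a single `Vec<Box<dyn TypedBuilder>>`.
#[allow(dead_code)]
trait TypedBuilder {
    type Output;
    fn finish(&self) -> Self::Output;
}

// Object-safe alternative: erase the output type behind a common interface.
trait AnyBuilder {
    fn len(&self) -> usize;
}

struct IntBuilder(Vec<i64>);
struct StrBuilder(Vec<String>);

impl AnyBuilder for IntBuilder {
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl AnyBuilder for StrBuilder {
    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    // Mixing builders of different element types in one collection only
    // works through the type-erased, trait-object-friendly interface.
    let builders: Vec<Box<dyn AnyBuilder>> = vec![
        Box::new(IntBuilder(vec![1, 2, 3])),
        Box::new(StrBuilder(vec!["a".into()])),
    ];
    let lens: Vec<usize> = builders.iter().map(|b| b.len()).collect();
    assert_eq!(lens, vec![3, 1]);
}
```

In the type-erased design, callers downcast (e.g. via `Any`) when they need the concrete builder, which mirrors how `ArrayRef` is used on the array side.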





[jira] [Commented] (ARROW-4580) [JS] Accept Iterables in IntVector/FloatVector from() signatures

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661603#comment-17661603
 ] 

Rok Mihevc commented on ARROW-4580:
---

This issue has been migrated to [issue 
#21124|https://github.com/apache/arrow/issues/21124] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [JS] Accept Iterables in IntVector/FloatVector from() signatures
> 
>
> Key: ARROW-4580
> URL: https://issues.apache.org/jira/browse/ARROW-4580
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Right now {{IntVector.from()}} and {{FloatVector.from()}} expect the data is 
> already in typed-array form. But if we know the desired Vector type before 
> hand (e.g. if {{Int32Vector.from()}} is called), we can accept any JS 
> iterable of the values.
> In order to do this, we should ensure {{Float16Vector.from()}} properly 
> clamps incoming f32/f64 values to u16s, in case the source is a vanilla 
> 64-bit JS float.





[jira] [Commented] (ARROW-3972) [C++] Update to LLVM and Clang bits to 7.0

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660995#comment-17660995
 ] 

Rok Mihevc commented on ARROW-3972:
---

This issue has been migrated to [issue 
#20575|https://github.com/apache/arrow/issues/20575] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Update to LLVM and Clang bits to 7.0
> --
>
> Key: ARROW-3972
> URL: https://issues.apache.org/jira/browse/ARROW-3972
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Gandiva
>Reporter: Uwe Korn
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> As {{llvmlite}}, the other package in the Python ecosystem moved to LLVM 7, 
> we should follow along to avoid problems when we use it in the same Python 
> environment as Gandiva.
> Reference: https://github.com/numba/llvmlite/pull/412





[jira] [Commented] (ARROW-3989) [Rust] CSV reader should handle case sensitivity for boolean values

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661012#comment-17661012
 ] 

Rok Mihevc commented on ARROW-3989:
---

This issue has been migrated to [issue 
#20592|https://github.com/apache/arrow/issues/20592] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Rust] CSV reader should handle case sensitivity for boolean values
> ---
>
> Key: ARROW-3989
> URL: https://issues.apache.org/jira/browse/ARROW-3989
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Excel saves booleans in CSV in upper case, Pandas uses Proper case.
> Our CSV reader doesn't recognise (True, False, TRUE, FALSE). I noticed this 
> when making boolean schema inference case insensitive.
>  
> I would propose that we convert Boolean strings to lower-case before casting 
> them to Rust's bool type. [~andygrove], what do you think?
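A hedged sketch of the kind of fix being discussed (not the actual arrow-rs reader code; the function name `parse_bool` is illustrative). Comparing case-insensitively avoids allocating a lower-cased copy of every token:

```rust
// Parse a CSV boolean token regardless of case: accepts "true", "True",
// "TRUE", etc. Returns None for anything that is not a boolean.
fn parse_bool(s: &str) -> Option<bool> {
    if s.eq_ignore_ascii_case("true") {
        Some(true)
    } else if s.eq_ignore_ascii_case("false") {
        Some(false)
    } else {
        None
    }
}

fn main() {
    // Excel writes TRUE/FALSE, Pandas writes True/False.
    assert_eq!(parse_bool("TRUE"), Some(true));
    assert_eq!(parse_bool("False"), Some(false));
    assert_eq!(parse_bool("yes"), None);
}
```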





[jira] [Commented] (ARROW-3964) [Go] More readable example for csv.Reader

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660987#comment-17660987
 ] 

Rok Mihevc commented on ARROW-3964:
---

This issue has been migrated to [issue 
#20567|https://github.com/apache/arrow/issues/20567] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Go] More readable example for csv.Reader
> -
>
> Key: ARROW-3964
> URL: https://issues.apache.org/jira/browse/ARROW-3964
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Masashi Shibata
>Assignee: Masashi Shibata
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The godoc example doesn't include the input file (testdata/simple.csv), so 
> it's hard to understand the output.
> [https://godoc.org/github.com/apache/arrow/go/arrow/csv]





[jira] [Commented] (ARROW-3864) [GLib] Add support for allow-float-truncate cast option

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17660887#comment-17660887
 ] 

Rok Mihevc commented on ARROW-3864:
---

This issue has been migrated to [issue 
#20429|https://github.com/apache/arrow/issues/20429] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [GLib] Add support for allow-float-truncate cast option
> ---
>
> Key: ARROW-3864
> URL: https://issues.apache.org/jira/browse/ARROW-3864
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Affects Versions: 0.11.1
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4540) [Rust] Add basic JSON reader

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4540:
--
External issue URL: https://github.com/apache/arrow/issues/21089

> [Rust] Add basic JSON reader
> 
>
> Key: ARROW-4540
> URL: https://issues.apache.org/jira/browse/ARROW-4540
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This is the first step in getting a JSON reader working in Rust





[jira] [Commented] (ARROW-4049) [C++] Arrow never use glog even though glog is linked.

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661072#comment-17661072
 ] 

Rok Mihevc commented on ARROW-4049:
---

This issue has been migrated to [issue 
#20648|https://github.com/apache/arrow/issues/20648] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] Arrow never use glog even though glog is linked.
> --
>
> Key: ARROW-4049
> URL: https://issues.apache.org/jira/browse/ARROW-4049
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following is a part of arrow/util/logging.cc.
> {code}
> #ifdef ARROW_USE_GLOG
> typedef google::LogMessage LoggingProvider;
> #else
> typedef CerrLog LoggingProvider;
> #endif
> {code}
> As you can see, when ARROW_USE_GLOG is defined, glog is intended to be used, 
> but in fact it is never defined, so glog is never used.





[jira] [Updated] (ARROW-4569) [Gandiva] validate that the precision/scale are within bounds

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4569:
--
External issue URL: https://github.com/apache/arrow/issues/21115

> [Gandiva] validate that the precision/scale are within bounds
> -
>
> Key: ARROW-4569
> URL: https://issues.apache.org/jira/browse/ARROW-4569
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>






[jira] [Updated] (ARROW-3904) [C++/Python] Validate scale and precision of decimal128 type

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3904:
--
External issue URL: https://github.com/apache/arrow/issues/20518

> [C++/Python] Validate scale and precision of decimal128 type
> 
>
> Key: ARROW-3904
> URL: https://issues.apache.org/jira/browse/ARROW-3904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> Do we have a specification for it? 
> Currently I can create `pa.decimal128(-2147483648, 2147483647)`, which seems 
> wrong.
> References:
> - 
> https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-2017
> - 
> https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1832
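The bounds check the report asks for can be sketched as follows. This is an illustrative validation in the spirit of the SQL references above (38 decimal digits fit in 128 bits), not the actual Arrow C++/Python code; the function name and exact error strings are assumptions:

```rust
// Validate decimal128(precision, scale): precision must be in 1..=38 and
// scale must be in 0..=precision, following SQL-style rules.
fn validate_decimal128(precision: i32, scale: i32) -> Result<(), String> {
    if !(1..=38).contains(&precision) {
        return Err(format!("precision {} out of range 1..=38", precision));
    }
    if scale < 0 || scale > precision {
        return Err(format!("scale {} out of range 0..={}", scale, precision));
    }
    Ok(())
}

fn main() {
    assert!(validate_decimal128(10, 2).is_ok());
    // The pathological values from the report are rejected.
    assert!(validate_decimal128(-2147483648, 2147483647).is_err());
}
```

Some systems also permit negative scale or scale > precision; restricting to `0 <= scale <= precision` is the conservative choice here.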





[jira] [Updated] (ARROW-3929) [Go] improve memory usage of CSV reader to improve runtime performances

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3929:
--
External issue URL: https://github.com/apache/arrow/issues/20539

> [Go] improve memory usage of CSV reader to improve runtime performances
> ---
>
> Key: ARROW-3929
> URL: https://issues.apache.org/jira/browse/ARROW-3929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3898) parquet-arrow example has compilation errors

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3898:
--
External issue URL: https://github.com/apache/arrow/issues/20511

> parquet-arrow example has compilation errors
> 
>
> Key: ARROW-3898
> URL: https://issues.apache.org/jira/browse/ARROW-3898
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Miao Wang
>Assignee: Miao Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When compiling the example, reader-writer.cc, it shows the following 
> compilation error:
>  *no member named 'cout' in namespace 'std'*
>   PARQUET_THROW_NOT_OK(arrow::PrettyPrint(*array, 4, &std::cout));
> in multiple places. 
>  





[jira] [Updated] (ARROW-3923) [Java] JDBC-to-Arrow Conversion: Unnecessary Calendar Requirement

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3923:
--
External issue URL: https://github.com/apache/arrow/issues/20534

> [Java] JDBC-to-Arrow Conversion: Unnecessary Calendar Requirement
> -
>
> Key: ARROW-3923
> URL: https://issues.apache.org/jira/browse/ARROW-3923
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> While I was going through the JDBC Adapter source code, I noticed a Calendar 
> was required to create the Arrow Schema (for any Timestamp fields), and also 
> needed for converting a JDBC ResultSet to an ArrowVector (for Date, Time, and 
> Timestamp fields).
> However, Arrow Timestamps do not require a time zone, and none of the JDBC 
> getters for Date, Time, and Timestamp require a Calendar.
> I am proposing a change to make the Schema creator and ResultSet converter 
> support null Calendars. If a Calendar is available, it will be used, and if 
> not, it will not be used.
> The existing SureFire plugin configuration uses a UTC calendar for the 
> database, which is the default Calendar in the existing code.  Likewise, no 
> changes to the unit tests are required to provide adequate coverage for the 
> change.





[jira] [Updated] (ARROW-3921) [CI][GLib] Log Homebrew output

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-3921:
--
External issue URL: https://github.com/apache/arrow/issues/20532

> [CI][GLib] Log Homebrew output
> --
>
> Key: ARROW-3921
> URL: https://issues.apache.org/jira/browse/ARROW-3921
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need more information to fix {{brew update}} problem.





[jira] [Updated] (ARROW-4512) [R] Stream reader/writer API that takes socket stream

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4512:
--
External issue URL: https://github.com/apache/arrow/issues/21063

> [R] Stream reader/writer API that takes socket stream
> -
>
> Key: ARROW-4512
> URL: https://issues.apache.org/jira/browse/ARROW-4512
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 0.12.0, 0.14.1, 1.0.0
>Reporter: Hyukjin Kwon
>Assignee: Dewey Dunnington
>Priority: Major
> Fix For: 8.0.0
>
>
> I have been working on Spark integration with Arrow.
> I realised that there is no way to use a socket as input for the Arrow stream 
> format. For instance, I want to do something like:
> {code}
> connStream <- socketConnection(port = , blocking = TRUE, open = "wb")
> rdf_slices <- # a list of data frames.
> stream_writer <- NULL
> tryCatch({
>   for (rdf_slice in rdf_slices) {
> batch <- record_batch(rdf_slice)
> if (is.null(stream_writer)) {
>   stream_writer <- RecordBatchStreamWriter(connStream, batch$schema)  # 
> Here, looks there's no way to use socket.
> }
> stream_writer$write_batch(batch)
>   }
> },
> finally = {
>   if (!is.null(stream_writer)) {
> stream_writer$close()
>   }
> })
> {code}
> Likewise, I cannot find a way to iterate the stream batch by batch
> {code}
> RecordBatchStreamReader(connStream)$batches()  # Here, looks there's no way 
> to use socket.
> {code}
> This looks easily possible in Python side but looks missing in R APIs.





[jira] [Updated] (ARROW-4297) [C++] Fix build for 32-bit MSYS2

2023-01-10 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4297:
--
External issue URL: https://github.com/apache/arrow/issues/20870

> [C++] Fix build for 32-bit MSYS2
> 
>
> Key: ARROW-4297
> URL: https://issues.apache.org/jira/browse/ARROW-4297
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Javier Luraschi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> see mailing list thread 
> https://lists.apache.org/thread.html/0fc5f7ffe4eb4e4aea931f0b26902d27305cf98829fb5c4a7375f2ab@%3Cdev.arrow.apache.org%3E





[jira] [Commented] (ARROW-4566) [C++][Flight] Add option to run arrow-flight-benchmark against a perf server running on a different host

2023-01-10 Thread Rok Mihevc (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17661588#comment-17661588
 ] 

Rok Mihevc commented on ARROW-4566:
---

This issue has been migrated to [issue 
#21112|https://github.com/apache/arrow/issues/21112] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++][Flight] Add option to run arrow-flight-benchmark against a perf server 
> running on a different host
> 
>
> Key: ARROW-4566
> URL: https://issues.apache.org/jira/browse/ARROW-4566
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the assumption is that both processes are running on localhost. 
> While also interesting (to see how fast things can go taking network IO out 
> of the equation) it is not very realistic. It would be good to both establish 
> a baseline network IO benchmark between two hosts and then see how close a 
> Flight stream can get to that





  1   2   3   4   5   6   7   8   9   10   >