[jira] [Updated] (ARROW-6592) [Java] Add support for skipping decoding of columns/field in Avro converter

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6592:
--
Labels: avro pull-request-available  (was: avro)

> [Java] Add support for skipping decoding of columns/field in Avro converter
> ---
>
> Key: ARROW-6592
> URL: https://issues.apache.org/jira/browse/ARROW-6592
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: avro, pull-request-available
>
> Users should be able to pass in a set of fields they wish to decode from Avro, 
> and the converter should avoid creating Vectors for the remaining fields in the 
> returned ArrowSchemaRoot.  This would ideally support nested columns, so if there was:
>  
> Struct A {
>     int B;
>     int C;
> } 
>  
> The user could choose to read only A.B, only A.C, or both.
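
A minimal sketch of the nested field selection described above, written in Python for brevity rather than against the Java converter. The schema is modelled as nested dicts, and `project_schema` with its dotted-path convention (`A.B`) is a hypothetical helper for illustration, not part of any Arrow API.

```python
def project_schema(schema, selected):
    """Return a pruned copy of `schema` keeping only the dotted paths in `selected`.

    `schema` maps field names to either a type name (leaf) or a nested dict (struct).
    """
    def keep(prefix, node):
        if prefix in selected:
            return node  # the whole field (leaf or struct) was requested
        if isinstance(node, dict):
            children = {}
            for name, child in node.items():
                path = f"{prefix}.{name}" if prefix else name
                kept = keep(path, child)
                if kept is not None:
                    children[name] = kept
            return children or None  # drop structs with no selected children
        return None  # unselected leaf: no vector would be created for it

    return keep("", schema) or {}


schema = {"A": {"B": "int", "C": "int"}, "D": "string"}
only_b = project_schema(schema, {"A.B"})
whole_struct = project_schema(schema, {"A"})
```

Selecting a struct by name keeps all of its children, while selecting only `A.B` drops both `A.C` and `D` from the result.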



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6601) [Java] Improve JDBC adapter performance & add benchmark

2019-09-23 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-6601.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5472
[https://github.com/apache/arrow/pull/5472]

> [Java] Improve JDBC adapter performance & add benchmark
> ---
>
> Key: ARROW-6601
> URL: https://issues.apache.org/jira/browse/ARROW-6601
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add a performance test as well to get a baseline number, to avoid performance 
> regression when we change related code.
>  





[jira] [Closed] (ARROW-5580) [C++][Gandiva] Correct definitions of timestamp functions in Gandiva

2019-09-23 Thread Prudhvi Porandla (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prudhvi Porandla closed ARROW-5580.
---
Resolution: Fixed

> [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
> 
>
> Key: ARROW-5580
> URL: https://issues.apache.org/jira/browse/ARROW-5580
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++ - Gandiva
>Reporter: Prudhvi Porandla
>Assignee: Prudhvi Porandla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Timestamp functions are unsupported in Gandiva due to definition mismatch.
> For example, Gandiva supports timestampAddMonth(timestamp, int32) but the 
> expected signature is  timestampAddMonth(int32, timestamp).
>  
>  
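
To illustrate why an argument-order mismatch makes a function effectively unsupported, here is a small Python sketch of a signature-keyed registry. The registry layout and the `lookup` helper are assumptions made for illustration and do not reflect how Gandiva stores its function signatures.

```python
# Hypothetical registry keyed by (function name, argument types in order).
REGISTRY = {
    ("timestampAddMonth", ("timestamp", "int32")): "timestamp",
}


def lookup(name, arg_types):
    """Return the registered return type, or None if no exact signature matches."""
    return REGISTRY.get((name, tuple(arg_types)))


registered = lookup("timestampAddMonth", ["timestamp", "int32"])
expected_by_caller = lookup("timestampAddMonth", ["int32", "timestamp"])
```

Because the lookup is exact, a caller using the `(int32, timestamp)` order finds nothing even though a function with the same name is registered.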





[jira] [Commented] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build

2019-09-23 Thread Suvayu Ali (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936383#comment-16936383
 ] 

Suvayu Ali commented on ARROW-4930:
---

Hi [~apitrou], I have had limited success so far.

[I was working off of master, {{git describe}} says: 
{{apache-arrow-0.14.0-584-g176adf5a0}}]

This is what I found:

1. {{setup.py}} assumes the library directory is {{$ARROW_HOME/lib}} when setting 
{{PKG_CONFIG_PATH}} in the environment (line 253). I believe this is a bit of a 
hack, which the author also mentions in ARROW-1090, the issue that tracked that 
change. The resolution should be somewhere in the CMake scripts.
2. I successfully detected {{libarrow}} with the attached patch 
[^FindArrow.cmake.patch].
3. However, I then failed to detect {{libparquet}}. On further investigation I 
found (AFAIU) that even though {{FindParquet.cmake}} sets {{ARROW_HOME}}, it is 
not used. However, it does use {{PARQUET_HOME}}. Since my CMake-fu is a bit 
weak, I worked up a similar patch [^FindParquet.cmake.patch] as before and set 
{{export PARQUET_HOME=$ARROW_HOME}} in the terminal. This allowed the 
compilation to succeed.

The compilation commands I used for C++ and Python are:
{code:bash}
$ cmake -G Ninja -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_ORC=ON \
  -DARROW_PARQUET=ON -DPYTHON_EXECUTABLE=/usr/bin/python3.7m \
  -DARROW_PYTHON=ON -DARROW_PLASMA=ON \
  -DARROW_BUILD_TESTS=ON -DLLVM_DIR=/usr/lib64/llvm7.0 ..
$ python3 setup.py build_ext --cmake-generator Ninja --inplace
{code}
I then tried to run the Python tests with {{pytest-3 pyarrow}}. The summary was:
{quote}5 failed, 1411 passed, 59 skipped, 4 xfailed, 29 warnings in 28.30 
seconds
{quote}
The failures all appear to be setup-related: not being able to 
import, not being able to start plasma, etc.

I'll investigate this further, but my take is the cmake scripts don't actually 
have _one way_ of detecting the libraries, making it very difficult to 
configure it properly from setup.py.
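
As a rough illustration of removing the fixed {{$ARROW_HOME/lib}} assumption, the Python sketch below probes several candidate library directories before building {{PKG_CONFIG_PATH}}. The helper names and the candidate list ({{lib}}, {{lib64}}) are assumptions for illustration. This is not what {{setup.py}} currently does, nor necessarily the right long-term fix.

```python
import os
import tempfile


def find_pkgconfig_dirs(arrow_home, libdir_names=("lib", "lib64")):
    """Return the pkgconfig directories that actually exist under arrow_home."""
    found = []
    for name in libdir_names:
        candidate = os.path.join(arrow_home, name, "pkgconfig")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def build_pkg_config_path(arrow_home, existing=""):
    """Prepend the detected directories to an existing PKG_CONFIG_PATH value."""
    parts = find_pkgconfig_dirs(arrow_home)
    if existing:
        parts.append(existing)
    return os.pathsep.join(parts)


# Demo against a throwaway install prefix that only has lib64/pkgconfig.
with tempfile.TemporaryDirectory() as home:
    expected_dir = os.path.join(home, "lib64", "pkgconfig")
    os.makedirs(expected_dir)
    demo_dirs = find_pkgconfig_dirs(home)
    demo_path = build_pkg_config_path(home, existing="/usr/lib/pkgconfig")
```

Probing like this only papers over the problem on the Python side. A single, consistent detection mechanism in the CMake scripts would still be preferable.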

> [Python] Remove LIBDIR assumptions in Python build
> --
>
> Key: ARROW-4930
> URL: https://issues.apache.org/jira/browse/ARROW-4930
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.12.1
>Reporter: Suvayu Ali
>Priority: Minor
>  Labels: setup.py
> Fix For: 2.0.0
>
> Attachments: FindArrow.cmake.patch, FindParquet.cmake.patch
>
>
> This is in reference to (4) in 
> [this|http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C0AF328A1-ED2A-457F-B72D-3B49C8614850%40xhochy.com%3E]
>  mailing list discussion.
> Certain sections of setup.py assume a specific location of the C++ libraries. 
> Removing this hard assumption will simplify PyArrow builds significantly. As 
> far as I could tell these assumptions are made in the 
> {{build_ext._run_cmake()}} method (wherever bundling of C++ libraries is 
> handled).
>  # The first occurrence is before invoking cmake (see line 237).
>  # The second occurrence is when the C++ libraries are moved from their build 
> directory to the Python tree (see line 347). The actual implementation is in 
> the function {{_move_shared_libs_unix(..)}} (see line 468).
> Hope this helps.





[jira] [Updated] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build

2019-09-23 Thread Suvayu Ali (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suvayu Ali updated ARROW-4930:
--
Attachment: FindParquet.cmake.patch
FindArrow.cmake.patch

> [Python] Remove LIBDIR assumptions in Python build
> --
>
> Key: ARROW-4930
> URL: https://issues.apache.org/jira/browse/ARROW-4930
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.12.1
>Reporter: Suvayu Ali
>Priority: Minor
>  Labels: setup.py
> Fix For: 2.0.0
>
> Attachments: FindArrow.cmake.patch, FindParquet.cmake.patch
>
>
> This is in reference to (4) in 
> [this|http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C0AF328A1-ED2A-457F-B72D-3B49C8614850%40xhochy.com%3E]
>  mailing list discussion.
> Certain sections of setup.py assume a specific location of the C++ libraries. 
> Removing this hard assumption will simplify PyArrow builds significantly. As 
> far as I could tell these assumptions are made in the 
> {{build_ext._run_cmake()}} method (wherever bundling of C++ libraries is 
> handled).
>  # The first occurrence is before invoking cmake (see line 237).
>  # The second occurrence is when the C++ libraries are moved from their build 
> directory to the Python tree (see line 347). The actual implementation is in 
> the function {{_move_shared_libs_unix(..)}} (see line 468).
> Hope this helps.





[jira] [Resolved] (ARROW-6668) [Rust] [DataFusion] Implement CAST expression

2019-09-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6668.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5477
[https://github.com/apache/arrow/pull/5477]

> [Rust] [DataFusion] Implement CAST expression
> -
>
> Key: ARROW-6668
> URL: https://issues.apache.org/jira/browse/ARROW-6668
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement CAST expression





[jira] [Resolved] (ARROW-6664) [C++] Add option to build without SSE4.2

2019-09-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-6664.
-
Resolution: Fixed

Issue resolved by pull request 5468
[https://github.com/apache/arrow/pull/5468]

> [C++] Add option to build without SSE4.2
> 
>
> Key: ARROW-6664
> URL: https://issues.apache.org/jira/browse/ARROW-6664
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Child task of ARROW-5381





[jira] [Assigned] (ARROW-6664) [C++] Add option to build without SSE4.2

2019-09-23 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-6664:
---

Assignee: Wes McKinney

> [C++] Add option to build without SSE4.2
> 
>
> Key: ARROW-6664
> URL: https://issues.apache.org/jira/browse/ARROW-6664
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Child task of ARROW-5381





[jira] [Assigned] (ARROW-6532) [R] Write parquet files with compression

2019-09-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6532:
--

Assignee: Romain François  (was: Neal Richardson)

> [R] Write parquet files with compression
> 
>
> Key: ARROW-6532
> URL: https://issues.apache.org/jira/browse/ARROW-6532
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Followup to ARROW-6360. See ARROW-6216 for the C++ side. `write_parquet()` 
> should be able to write compressed files, including with a specified 
> compression level.





[jira] [Assigned] (ARROW-6532) [R] Write parquet files with compression

2019-09-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-6532:
--

Assignee: Neal Richardson

> [R] Write parquet files with compression
> 
>
> Key: ARROW-6532
> URL: https://issues.apache.org/jira/browse/ARROW-6532
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Followup to ARROW-6360. See ARROW-6216 for the C++ side. `write_parquet()` 
> should be able to write compressed files, including with a specified 
> compression level.





[jira] [Resolved] (ARROW-3817) [R] $ method for RecordBatch

2019-09-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-3817.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5459
[https://github.com/apache/arrow/pull/5459]

> [R] $ method for RecordBatch
> 
>
> Key: ARROW-3817
> URL: https://issues.apache.org/jira/browse/ARROW-3817
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-6670) [CI][R] Fix fix for R nightly jobs

2019-09-23 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson resolved ARROW-6670.

Resolution: Fixed

Issue resolved by pull request 5479
[https://github.com/apache/arrow/pull/5479]

> [CI][R] Fix fix for R nightly jobs
> --
>
> Key: ARROW-6670
> URL: https://issues.apache.org/jira/browse/ARROW-6670
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-6665) [Rust] [DataFusion] Implement numeric literal expressions

2019-09-23 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-6665.
---
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5474
[https://github.com/apache/arrow/pull/5474]

> [Rust] [DataFusion] Implement numeric literal expressions
> -
>
> Key: ARROW-6665
> URL: https://issues.apache.org/jira/browse/ARROW-6665
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Implement numeric literal expressions





[jira] [Updated] (ARROW-6670) [CI][R] Fix fix for R nightly jobs

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6670:
--
Labels: pull-request-available  (was: )

> [CI][R] Fix fix for R nightly jobs
> --
>
> Key: ARROW-6670
> URL: https://issues.apache.org/jira/browse/ARROW-6670
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>






[jira] [Updated] (ARROW-6669) [Rust] [DataFusion] Implement physical expression for binary expressions

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6669:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Implement physical expression for binary expressions
> 
>
> Key: ARROW-6669
> URL: https://issues.apache.org/jira/browse/ARROW-6669
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>
> Implement comparison operators (<, <=, >, >=, =, !=) as well as binary 
> operators AND and OR.
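
A minimal Python sketch of what evaluating such binary expressions over columns could look like. It is not the DataFusion implementation: columns are plain Python lists, null handling is ignored, and `evaluate_binary` is a hypothetical name.

```python
import operator

# Comparison operators named in the issue, mapped to Python equivalents.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
}

# Boolean operators combine two boolean columns element by element.
BOOLEAN = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


def evaluate_binary(op, left, right):
    """Apply `op` element-wise to two equal-length columns, returning a boolean column."""
    fn = COMPARISONS.get(op) or BOOLEAN.get(op)
    if fn is None:
        raise ValueError(f"unsupported operator: {op}")
    if len(left) != len(right):
        raise ValueError("columns must have the same length")
    return [fn(l, r) for l, r in zip(left, right)]


a = [1, 5, 3]
b = [2, 5, 1]
less_than = evaluate_binary("<", a, b)
combined = evaluate_binary("OR", less_than, evaluate_binary("=", a, b))
```

Each comparison yields a boolean column, so the results can be fed straight into AND/OR, which is how compound predicates compose.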





[jira] [Created] (ARROW-6670) [CI][R] Fix fix for R nightly jobs

2019-09-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6670:
--

 Summary: [CI][R] Fix fix for R nightly jobs
 Key: ARROW-6670
 URL: https://issues.apache.org/jira/browse/ARROW-6670
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.15.0








[jira] [Assigned] (ARROW-6668) [Rust] [DataFusion] Implement CAST expression

2019-09-23 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-6668:
-

Assignee: Andy Grove

> [Rust] [DataFusion] Implement CAST expression
> -
>
> Key: ARROW-6668
> URL: https://issues.apache.org/jira/browse/ARROW-6668
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement CAST expression





[jira] [Assigned] (ARROW-6669) [Rust] [DataFusion] Implement physical expression for binary expressions

2019-09-23 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-6669:
-

Assignee: Andy Grove

> [Rust] [DataFusion] Implement physical expression for binary expressions
> 
>
> Key: ARROW-6669
> URL: https://issues.apache.org/jira/browse/ARROW-6669
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Implement comparison operators (<, <=, >, >=, =, !=) as well as binary 
> operators AND and OR.





[jira] [Created] (ARROW-6669) [Rust] [DataFusion] Implement physical expression for binary expressions

2019-09-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-6669:
-

 Summary: [Rust] [DataFusion] Implement physical expression for 
binary expressions
 Key: ARROW-6669
 URL: https://issues.apache.org/jira/browse/ARROW-6669
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove


Implement comparison operators (<, <=, >, >=, =, !=) as well as binary 
operators AND and OR.





[jira] [Updated] (ARROW-6667) [Python] Avoid Reference Cycles in pyarrow.parquet

2019-09-23 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6667:
--
Component/s: Python

> [Python] Avoid Reference Cycles in pyarrow.parquet
> --
>
> Key: ARROW-6667
> URL: https://issues.apache.org/jira/browse/ARROW-6667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Aaron Opfer
>Priority: Minor
>  Labels: pull-request-available
> Attachments: cycle1_build_nested_path.PNG, cycle2_open_dataset.PNG
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Reference cycles appear in two places inside pyarrow.parquet, which cause 
> these objects to have much longer lifetimes than necessary:
>  
> {{_build_nested_path}} has a reference cycle because the closure 
> refers to the parent cell, which in turn refers back to the closure 
> (objgraph shown in attachment).
> {{open_dataset_file}} is partially applied with self inside the {{ParquetFile}} class 
> (objgraph shown in attachment).
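
The first kind of cycle can be reproduced in a few lines of Python. This is a standalone sketch, not the pyarrow code: the function names are made up, and the behaviour shown relies on CPython's reference counting.

```python
import gc
import weakref


def build_self_referencing():
    # `recurse` calls itself by name, so it closes over the cell that holds it:
    # function -> closure cell -> function. Reference counting alone cannot free it.
    def recurse(n):
        return 0 if n == 0 else 1 + recurse(n - 1)

    return weakref.ref(recurse)


def build_flat():
    # No self-reference, so the function is freed as soon as build_flat returns.
    def helper(n):
        return n

    return weakref.ref(helper)


gc.disable()  # keep the cycle collector from running mid-demonstration
try:
    cyclic = build_self_referencing()
    flat = build_flat()
    cyclic_alive_before_collect = cyclic() is not None
    flat_alive = flat() is not None
    gc.collect()  # only the cycle collector can reclaim the self-referencing closure
    cyclic_alive_after_collect = cyclic() is not None
finally:
    gc.enable()
```

Anything the closure captures stays alive until the next collection pass, which is why breaking such cycles shortens object lifetimes.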





[jira] [Updated] (ARROW-6655) [Python] Filesystem bindings for S3

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6655:
--
Labels: pull-request-available  (was: )

> [Python] Filesystem bindings for S3
> ---
>
> Key: ARROW-6655
> URL: https://issues.apache.org/jira/browse/ARROW-6655
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Follow-up work of ARROW-5494: [Python] Create FileSystem bindings





[jira] [Updated] (ARROW-6668) [Rust] [DataFusion] Implement CAST expression

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6668:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Implement CAST expression
> -
>
> Key: ARROW-6668
> URL: https://issues.apache.org/jira/browse/ARROW-6668
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Implement CAST expression





[jira] [Created] (ARROW-6668) [Rust] [DataFusion] Implement CAST expression

2019-09-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-6668:
-

 Summary: [Rust] [DataFusion] Implement CAST expression
 Key: ARROW-6668
 URL: https://issues.apache.org/jira/browse/ARROW-6668
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Implement CAST expression





[jira] [Updated] (ARROW-6664) [C++] Add option to build without SSE4.2

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6664:
--
Labels: pull-request-available  (was: )

> [C++] Add option to build without SSE4.2
> 
>
> Key: ARROW-6664
> URL: https://issues.apache.org/jira/browse/ARROW-6664
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> Child task of ARROW-5381





[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512

2019-09-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936012#comment-16936012
 ] 

Antoine Pitrou commented on ARROW-6213:
---

Well, let's create that account :-) Does it have a stable IP address?

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  





[jira] [Updated] (ARROW-6667) [Python] Avoid Reference Cycles in pyarrow.parquet

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6667:
--
Labels: pull-request-available  (was: )

> [Python] Avoid Reference Cycles in pyarrow.parquet
> --
>
> Key: ARROW-6667
> URL: https://issues.apache.org/jira/browse/ARROW-6667
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Aaron Opfer
>Priority: Minor
>  Labels: pull-request-available
> Attachments: cycle1_build_nested_path.PNG, cycle2_open_dataset.PNG
>
>
> Reference cycles appear in two places inside pyarrow.parquet, which cause 
> these objects to have much longer lifetimes than necessary:
>  
> {{_build_nested_path}} has a reference cycle because the closure 
> refers to the parent cell, which in turn refers back to the closure 
> (objgraph shown in attachment).
> {{open_dataset_file}} is partially applied with self inside the {{ParquetFile}} class 
> (objgraph shown in attachment).





[jira] [Created] (ARROW-6667) [Python] Avoid Reference Cycles in pyarrow.parquet

2019-09-23 Thread Aaron Opfer (Jira)
Aaron Opfer created ARROW-6667:
--

 Summary: [Python] Avoid Reference Cycles in pyarrow.parquet
 Key: ARROW-6667
 URL: https://issues.apache.org/jira/browse/ARROW-6667
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Aaron Opfer
 Attachments: cycle1_build_nested_path.PNG, cycle2_open_dataset.PNG

Reference cycles appear in two places inside pyarrow.parquet, which cause these 
objects to have much longer lifetimes than necessary:

{{_build_nested_path}} has a reference cycle because the closure 
refers to the parent cell, which in turn refers back to the closure 
(objgraph shown in attachment).

{{open_dataset_file}} is partially applied with self inside the {{ParquetFile}} class 
(objgraph shown in attachment).





[jira] [Updated] (ARROW-6089) [Rust] [DataFusion] Implement parallel execution for selection

2019-09-23 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-6089:
--
Fix Version/s: (was: 0.15.0)
   1.0.0

> [Rust] [DataFusion] Implement parallel execution for selection
> --
>
> Key: ARROW-6089
> URL: https://issues.apache.org/jira/browse/ARROW-6089
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement physical plan for selection operator.





[jira] [Updated] (ARROW-6665) [Rust] [DataFusion] Implement numeric literal expressions

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6665:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Implement numeric literal expressions
> -
>
> Key: ARROW-6665
> URL: https://issues.apache.org/jira/browse/ARROW-6665
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Implement numeric literal expressions





[jira] [Created] (ARROW-6666) [Rust] [DataFusion] Implement string literal expression

2019-09-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-6666:
-

 Summary: [Rust] [DataFusion] Implement string literal expression
 Key: ARROW-6666
 URL: https://issues.apache.org/jira/browse/ARROW-6666
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Implement string literal expression





[jira] [Assigned] (ARROW-6665) [Rust] [DataFusion] Implement numeric literal expressions

2019-09-23 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-6665:
-

Assignee: Andy Grove

> [Rust] [DataFusion] Implement numeric literal expressions
> -
>
> Key: ARROW-6665
> URL: https://issues.apache.org/jira/browse/ARROW-6665
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> Implement numeric literal expressions





[jira] [Created] (ARROW-6665) [Rust] [DataFusion] Implement numeric literal expressions

2019-09-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-6665:
-

 Summary: [Rust] [DataFusion] Implement numeric literal expressions
 Key: ARROW-6665
 URL: https://issues.apache.org/jira/browse/ARROW-6665
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Implement numeric literal expressions





[jira] [Resolved] (ARROW-6605) [C++] Add recursion depth control to fs::Selector

2019-09-23 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved ARROW-6605.
---
Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5429
[https://github.com/apache/arrow/pull/5429]

> [C++] Add recursion depth control to fs::Selector
> -
>
> Key: ARROW-6605
> URL: https://issues.apache.org/jira/browse/ARROW-6605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is similar to the recursive options, but also controls the depth.
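
A small Python sketch of what depth-limited recursive selection means. It walks an in-memory tree of nested dicts instead of a real filesystem, and `list_paths`, `recursive`, and `max_depth` are hypothetical names for illustration, not the fs::Selector API.

```python
def list_paths(tree, recursive=False, max_depth=None, _prefix="", _depth=0):
    """List entries of `tree`, descending into directories only as far as allowed.

    Directories are dicts, files are None. `max_depth=None` means unlimited
    recursion; `max_depth=0` behaves like a non-recursive listing.
    """
    paths = []
    for name, child in sorted(tree.items()):
        path = f"{_prefix}/{name}" if _prefix else name
        paths.append(path)
        is_dir = isinstance(child, dict)
        can_descend = recursive and (max_depth is None or _depth < max_depth)
        if is_dir and can_descend:
            paths.extend(list_paths(child, recursive, max_depth, path, _depth + 1))
    return paths


tree = {"a": {"b": {"c.txt": None}, "x.txt": None}, "y.txt": None}
top_level = list_paths(tree)
one_level_down = list_paths(tree, recursive=True, max_depth=1)
everything = list_paths(tree, recursive=True)
```

With `max_depth=1` the walk enters `a` but not `a/b`, so `a/b/c.txt` is the only entry left out.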





[jira] [Assigned] (ARROW-6605) [C++] Add recursion depth control to fs::Selector

2019-09-23 Thread Francois Saint-Jacques (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-6605:
-

Assignee: Francois Saint-Jacques

> [C++] Add recursion depth control to fs::Selector
> -
>
> Key: ARROW-6605
> URL: https://issues.apache.org/jira/browse/ARROW-6605
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is similar to the recursive options, but also controls the depth.





[jira] [Updated] (ARROW-6352) [Java] Add implementation of DenseUnionVector.

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6352:
--
Labels: pull-request-available  (was: )

> [Java] Add implementation of DenseUnionVector.
> --
>
> Key: ARROW-6352
> URL: https://issues.apache.org/jira/browse/ARROW-6352
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Micah Kornfield
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>
> Today only sparse unions are supported.  We should have a dense union 
> vector implementation that conforms to the IPC protocol (the current sparse 
> union vector doesn't do this, and there are other JIRAs covering making it 
> compatible).





[jira] [Updated] (ARROW-6601) [Java] Improve JDBC adapter performance & add benchmark

2019-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6601:
--
Labels: pull-request-available  (was: )

> [Java] Improve JDBC adapter performance & add benchmark
> ---
>
> Key: ARROW-6601
> URL: https://issues.apache.org/jira/browse/ARROW-6601
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>  Labels: pull-request-available
>
> Add a performance test as well to get a baseline number, to avoid performance 
> regression when we change related code.
>  





[jira] [Commented] (ARROW-6601) [Java] Improve JDBC adapter performance & add benchmark

2019-09-23 Thread Ji Liu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935613#comment-16935613
 ] 

Ji Liu commented on ARROW-6601:
---

When working on the JDBC adapter benchmark, I found the JMH result was very poor 
(about 168 ns/op). I eventually found that when we initialize a 
VectorSchemaRoot, we call JdbcToArrowUtils#allocateVectors, which is time 
consuming and unnecessary since the consumers use the setSafe API. 
After removing this call, the JMH result is about 2000 ns/op (3 columns with 
valueCount = 3000).

I think this change should be merged into the 0.15 release.

> [Java] Improve JDBC adapter performance & add benchmark
> ---
>
> Key: ARROW-6601
> URL: https://issues.apache.org/jira/browse/ARROW-6601
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>
> Add a performance test as well to get a baseline number, to avoid performance 
> regression when we change related code.
>  





[jira] [Updated] (ARROW-6601) [Java] Improve JDBC adapter performance & add benchmark

2019-09-23 Thread Ji Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu updated ARROW-6601:
--
Summary: [Java] Improve JDBC adapter performance & add benchmark  (was: 
[Java] Add benchmark for JDBC adapter to avoid potential regression)

> [Java] Improve JDBC adapter performance & add benchmark
> ---
>
> Key: ARROW-6601
> URL: https://issues.apache.org/jira/browse/ARROW-6601
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Critical
>
> Add a performance test as well to get a baseline number, to avoid performance 
> regression when we change related code.
>  


