[jira] [Updated] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-22 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-3111:

Description: Currently tests use the logback logger which has a default 
level of DEBUG. We should provide a way to change this level so that tests can 
be run without seeing a ton of DEBUG logging messages, if needed.  (was: 
Currently tests use the logback logger which has a default level of DEBUG. We 
should provide a way to change this level so that tests can be run without 
seeing a ton of DEBUG messages, if needed.)

> [Java] Enable changing default logging level when running tests
> ---
>
> Key: ARROW-3111
> URL: https://issues.apache.org/jira/browse/ARROW-3111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently tests use the logback logger which has a default level of DEBUG. We 
> should provide a way to change this level so that tests can be run without 
> seeing a ton of DEBUG logging messages, if needed.
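One way to make the level overridable (a sketch, not the committed fix; the property name arrow.test.logging.level is hypothetical) is logback's variable substitution with a default value in logback-test.xml, so the level stays DEBUG unless a system property is passed:

```xml
<!-- logback-test.xml: a sketch, assuming a hypothetical property name.
     Run tests with -Darrow.test.logging.level=INFO to quiet DEBUG output. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- ${name:-default} is logback's substitution-with-default syntax. -->
  <root level="${arrow.test.logging.level:-DEBUG}">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```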



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-22 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated ARROW-3111:

Description: Currently tests use the logback logger which has a default 
level of DEBUG. We should provide a way to change this level so that tests can 
be run without seeing a ton of DEBUG messages, if needed.  (was: Currently 
tests use the logback logger which has a default level of DEBUG. We should 
provide a way to change this level so that CI can run a build without seeing 
DEBUG messages if needed.)

> [Java] Enable changing default logging level when running tests
> ---
>
> Key: ARROW-3111
> URL: https://issues.apache.org/jira/browse/ARROW-3111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently tests use the logback logger which has a default level of DEBUG. We 
> should provide a way to change this level so that tests can be run without 
> seeing a ton of DEBUG messages, if needed.





[jira] [Commented] (ARROW-3085) [Rust] Add an adapter for parquet.

2018-08-22 Thread Andy Grove (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589665#comment-16589665
 ] 

Andy Grove commented on ARROW-3085:
---

[~liurenjie1024] I have code in [https://github.com/datafusion-rs/datafusion] 
that loads parquet files into arrow memory. That may be useful at least as a 
reference. I'm also happy to donate any of this code to the arrow project (or 
to the parquet-rs project).

> [Rust] Add an adapter for parquet.
> --
>
> Key: ARROW-3085
> URL: https://issues.apache.org/jira/browse/ARROW-3085
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
> Fix For: 0.11.0
>
>






[jira] [Assigned] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Bartnik reassigned ARROW-1380:
-

Assignee: Lukasz Bartnik

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Assignee: Lukasz Bartnik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220





[jira] [Updated] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1380:
--
Labels: pull-request-available  (was: )

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220





[jira] [Updated] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3111:
--
Labels: pull-request-available  (was: )

> [Java] Enable changing default logging level when running tests
> ---
>
> Key: ARROW-3111
> URL: https://issues.apache.org/jira/browse/ARROW-3111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>
> Currently tests use the logback logger which has a default level of DEBUG. We 
> should provide a way to change this level so that CI can run a build without 
> seeing DEBUG messages if needed.





[jira] [Created] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-3111:
---

 Summary: [Java] Enable changing default logging level when running 
tests
 Key: ARROW-3111
 URL: https://issues.apache.org/jira/browse/ARROW-3111
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler


Currently tests use the logback logger which has a default level of DEBUG. We 
should provide a way to change this level so that CI can run a build without 
seeing DEBUG messages if needed.





[jira] [Closed] (ARROW-2092) [Python] Enhance benchmark suite

2018-08-22 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-2092.
-
Resolution: Not A Problem

I'm gonna close this in favour of more focussed issues when necessary.

> [Python] Enhance benchmark suite
> 
>
> Key: ARROW-2092
> URL: https://issues.apache.org/jira/browse/ARROW-2092
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> We need to test more operations in the ASV-based benchmarks suite.





[jira] [Resolved] (ARROW-2965) [Python] Possible uint64 overflow issues in python_to_arrow.cc

2018-08-22 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2965.
-
Resolution: Fixed

Issue resolved by pull request 2463
[https://github.com/apache/arrow/pull/2463]

> [Python] Possible uint64 overflow issues in python_to_arrow.cc
> --
>
> Key: ARROW-2965
> URL: https://issues.apache.org/jira/browse/ARROW-2965
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are some places, like the {{AppendScalar}} function, where UINT64 or 
> ULONGLONG is being cast to int64 without overflow checking
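The hazard can be illustrated outside C++. A minimal Python sketch of the guard such a cast needs (names are illustrative, not Arrow's API):

```python
INT64_MAX = 2**63 - 1  # largest value an int64 can hold

def checked_uint64_to_int64(value):
    """Reject uint64 values that do not fit in int64 instead of wrapping."""
    if value < 0 or value > INT64_MAX:
        raise OverflowError("uint64 value %d does not fit in int64" % value)
    return value

checked_uint64_to_int64(42)        # fine
# checked_uint64_to_int64(2**63)   # would raise OverflowError
```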





[jira] [Resolved] (ARROW-3110) [C++] Compilation warnings with gcc 7.3.0

2018-08-22 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3110.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2464
[https://github.com/apache/arrow/pull/2464]

> [C++] Compilation warnings with gcc 7.3.0
> -
>
> Key: ARROW-3110
> URL: https://issues.apache.org/jira/browse/ARROW-3110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is happening when building in release mode:
> {code}
> ../src/arrow/python/python_to_arrow.cc: In function 'arrow::Status 
> arrow::py::detail::BuilderAppend(arrow::BinaryBuilder*, PyObject*, bool*)':
> ../src/arrow/python/python_to_arrow.cc:388:56: warning: 'length' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>if (ARROW_PREDICT_FALSE(builder->value_data_length() + length > 
> kBinaryMemoryLimit)) {
> ^
> ../src/arrow/python/python_to_arrow.cc:385:11: note: 'length' was declared 
> here
>int32_t length;
>^~
> In file included from ../src/arrow/python/serialize.cc:32:0:
> ../src/arrow/builder.h: In member function 'arrow::Status 
> arrow::py::SequenceBuilder::Update(int64_t, int8_t*)':
> ../src/arrow/builder.h:413:5: warning: 'offset32' may be used uninitialized 
> in this function [-Wmaybe-uninitialized]
>  raw_data_[length_++] = val;
>  ^
> ../src/arrow/python/serialize.cc:90:13: note: 'offset32' was declared here
>  int32_t offset32;
>  ^~~~
> {code}





[jira] [Updated] (ARROW-3110) [C++] Compilation warnings with gcc 7.3.0

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3110:
--
Labels: pull-request-available  (was: )

> [C++] Compilation warnings with gcc 7.3.0
> -
>
> Key: ARROW-3110
> URL: https://issues.apache.org/jira/browse/ARROW-3110
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> This is happening when building in release mode:
> {code}
> ../src/arrow/python/python_to_arrow.cc: In function 'arrow::Status 
> arrow::py::detail::BuilderAppend(arrow::BinaryBuilder*, PyObject*, bool*)':
> ../src/arrow/python/python_to_arrow.cc:388:56: warning: 'length' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>if (ARROW_PREDICT_FALSE(builder->value_data_length() + length > 
> kBinaryMemoryLimit)) {
> ^
> ../src/arrow/python/python_to_arrow.cc:385:11: note: 'length' was declared 
> here
>int32_t length;
>^~
> In file included from ../src/arrow/python/serialize.cc:32:0:
> ../src/arrow/builder.h: In member function 'arrow::Status 
> arrow::py::SequenceBuilder::Update(int64_t, int8_t*)':
> ../src/arrow/builder.h:413:5: warning: 'offset32' may be used uninitialized 
> in this function [-Wmaybe-uninitialized]
>  raw_data_[length_++] = val;
>  ^
> ../src/arrow/python/serialize.cc:90:13: note: 'offset32' was declared here
>  int32_t offset32;
>  ^~~~
> {code}





[jira] [Created] (ARROW-3110) [C++] Compilation warnings with gcc 7.3.0

2018-08-22 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3110:
-

 Summary: [C++] Compilation warnings with gcc 7.3.0
 Key: ARROW-3110
 URL: https://issues.apache.org/jira/browse/ARROW-3110
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.10.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This is happening when building in release mode:
{code}
../src/arrow/python/python_to_arrow.cc: In function 'arrow::Status 
arrow::py::detail::BuilderAppend(arrow::BinaryBuilder*, PyObject*, bool*)':
../src/arrow/python/python_to_arrow.cc:388:56: warning: 'length' may be used 
uninitialized in this function [-Wmaybe-uninitialized]
   if (ARROW_PREDICT_FALSE(builder->value_data_length() + length > 
kBinaryMemoryLimit)) {
^
../src/arrow/python/python_to_arrow.cc:385:11: note: 'length' was declared here
   int32_t length;
   ^~
In file included from ../src/arrow/python/serialize.cc:32:0:
../src/arrow/builder.h: In member function 'arrow::Status 
arrow::py::SequenceBuilder::Update(int64_t, int8_t*)':
../src/arrow/builder.h:413:5: warning: 'offset32' may be used uninitialized in 
this function [-Wmaybe-uninitialized]
 raw_data_[length_++] = val;
 ^
../src/arrow/python/serialize.cc:90:13: note: 'offset32' was declared here
 int32_t offset32;
 ^~~~
{code}






[jira] [Updated] (ARROW-2965) [Python] Possible uint64 overflow issues in python_to_arrow.cc

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2965:
--
Labels: pull-request-available  (was: )

> [Python] Possible uint64 overflow issues in python_to_arrow.cc
> --
>
> Key: ARROW-2965
> URL: https://issues.apache.org/jira/browse/ARROW-2965
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> There are some places, like the {{AppendScalar}} function, where UINT64 or 
> ULONGLONG is being cast to int64 without overflow checking





[jira] [Updated] (ARROW-1661) [Python] Python 3.7 support

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1661:
---
Description: 
Things to do for Python 3.7 (mostly depends on downstream):
||Task || Done||
|Pandas release (0.23.2)|(/)|
|manylinux1 container with Python 3.7|(/)|
|conda-forge python update (optional, can also use Anaconda version): 
https://github.com/conda-forge/python-feedstock/issues/177|(/)|
|conda-forge dependencies are built for Python 3.7 (and/or upstream manylinux1 
wheels), see 
https://github.com/conda-forge/conda-forge-enhancement-proposals/pull/10 |(x)|

See discussion in [https://github.com/apache/arrow/issues/1125]

  was:
Things to do for Python 3.7 (mostly depends on downstream):
||Task || Done||
|Pandas release (0.23.2)|(/)|
|manylinux1 container with Python 3.7|(/)|
|conda-forge python update (optional, can also use Anaconda version): 
https://github.com/conda-forge/python-feedstock/issues/177|(/)|
|conda-forge dependencies are built for Python 3.7 (and/or upstream manylinux1 
wheels) |(x)|
|more to come…| |

See discussion in [https://github.com/apache/arrow/issues/1125]


> [Python] Python 3.7 support
> ---
>
> Key: ARROW-1661
> URL: https://issues.apache.org/jira/browse/ARROW-1661
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Things to do for Python 3.7 (mostly depends on downstream):
> ||Task || Done||
> |Pandas release (0.23.2)|(/)|
> |manylinux1 container with Python 3.7|(/)|
> |conda-forge python update (optional, can also use Anaconda version): 
> https://github.com/conda-forge/python-feedstock/issues/177|(/)|
> |conda-forge dependencies are built for Python 3.7 (and/or upstream 
> manylinux1 wheels), see 
> https://github.com/conda-forge/conda-forge-enhancement-proposals/pull/10 |(x)|
> See discussion in [https://github.com/apache/arrow/issues/1125]





[jira] [Assigned] (ARROW-1661) [Python] Python 3.7 support

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-1661:
--

Assignee: Uwe L. Korn

> [Python] Python 3.7 support
> ---
>
> Key: ARROW-1661
> URL: https://issues.apache.org/jira/browse/ARROW-1661
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Things to do for Python 3.7 (mostly depends on downstream):
> ||Task || Done||
> |Pandas release (0.23.2)|(/)|
> |manylinux1 container with Python 3.7|(/)|
> |conda-forge python update (optional, can also use Anaconda version): 
> https://github.com/conda-forge/python-feedstock/issues/177|(/)|
> |conda-forge dependencies are built for Python 3.7 (and/or upstream 
> manylinux1 wheels), see 
> https://github.com/conda-forge/conda-forge-enhancement-proposals/pull/10 |(x)|
> See discussion in [https://github.com/apache/arrow/issues/1125]





[jira] [Assigned] (ARROW-1558) [C++] Implement boolean selection kernels

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-1558:
--

Assignee: Uwe L. Korn

> [C++] Implement boolean selection kernels
> -
>
> Key: ARROW-1558
> URL: https://issues.apache.org/jira/browse/ARROW-1558
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> Select values where a boolean selection array is true. As a default, if any 
> values in the selection are null, then values in the output array should be 
> null. 
> The null behaviour does not need to be toggleable; if the user wants to select 
> nothing in the case of null, it is necessary to call 
> selection_array.fillna(false) first.
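The default null semantics described above can be sketched as a toy Python model (not the C++ kernel itself): a null in the selection mask yields a null output slot, and calling fillna(false) beforehand is what drops those slots instead.

```python
def boolean_select(values, mask):
    """Toy model of a boolean selection kernel; None stands for null."""
    out = []
    for v, m in zip(values, mask):
        if m is None:        # null in the mask propagates a null output
            out.append(None)
        elif m:              # true selects the value
            out.append(v)
        # false drops the slot entirely
    return out

boolean_select([1, 2, 3], [True, None, False])  # -> [1, None]
```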





[jira] [Updated] (ARROW-1558) [C++] Implement boolean selection kernels

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1558:
---
Description: 
Select values where a boolean selection array is true. As a default, if any 
values in the selection are null, then values in the output array should be 
null. 

The null behaviour does not need to be toggleable; if the user wants to select 
nothing in the case of null, it is necessary to call 
selection_array.fillna(false) first.

  was:Select values where a boolean selection array is true. As a default, if 
any values in the selection are null, then values in the output array should be 
null. The null behaviour should be toggleable as this may vary with the use case.


> [C++] Implement boolean selection kernels
> -
>
> Key: ARROW-1558
> URL: https://issues.apache.org/jira/browse/ARROW-1558
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> Select values where a boolean selection array is true. As a default, if any 
> values in the selection are null, then values in the output array should be 
> null. 
> The null behaviour does not need to be toggleable; if the user wants to select 
> nothing in the case of null, it is necessary to call 
> selection_array.fillna(false) first.





[jira] [Updated] (ARROW-1558) [C++] Implement boolean selection kernels

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1558:
---
Fix Version/s: 0.11.0

> [C++] Implement boolean selection kernels
> -
>
> Key: ARROW-1558
> URL: https://issues.apache.org/jira/browse/ARROW-1558
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> Select values where a boolean selection array is true. As a default, if any 
> values in the selection are null, then values in the output array should be 
> null. 
> The null behaviour does not need to be toggleable; if the user wants to select 
> nothing in the case of null, it is necessary to call 
> selection_array.fillna(false) first.





[jira] [Updated] (ARROW-1563) [C++] Implement logical unary and binary kernels for boolean arrays

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1563:
--
Labels: Analytics pull-request-available  (was: Analytics)

> [C++] Implement logical unary and binary kernels for boolean arrays
> ---
>
> Key: ARROW-1563
> URL: https://issues.apache.org/jira/browse/ARROW-1563
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 0.11.0
>
>
> And, or, not (negate), xor
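A toy Python model of one such kernel (the null handling shown, where any null input yields null, is an assumption for illustration, not necessarily the kernel's final semantics):

```python
def and_kernel(a, b):
    """Element-wise logical AND over nullable booleans; None stands for null."""
    return [None if x is None or y is None else (x and y)
            for x, y in zip(a, b)]

and_kernel([True, True, None], [True, False, True])  # -> [True, False, None]
```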





[jira] [Commented] (ARROW-1470) [C++] Add BufferAllocator abstract interface

2018-08-22 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588807#comment-16588807
 ] 

Wes McKinney commented on ARROW-1470:
-

I'd like to take a look at this for 0.11

> [C++] Add BufferAllocator abstract interface
> 
>
> Key: ARROW-1470
> URL: https://issues.apache.org/jira/browse/ARROW-1470
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> There are some situations ({{arrow::ipc::SerializeRecordBatch}}) where we pass 
> a {{MemoryPool*}} solely to call {{AllocateBuffer}} using it. This is not as 
> flexible as it could be, since there are situations where we may wish to 
> allocate from shared memory instead. 
> So instead:
> {code}
> Func(..., BufferAllocator* allocator, ...) {
>   ...
>   std::shared_ptr<Buffer> buffer;
>   RETURN_NOT_OK(allocator->Allocate(nbytes, &buffer));
>   ...
> }
> {code}
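The shape of that interface can be mirrored in Python as a quick analogy (class names here are illustrative, not the proposed C++ API):

```python
from abc import ABC, abstractmethod

class BufferAllocator(ABC):
    """Abstract allocation interface: callers stop assuming CPU heap memory."""
    @abstractmethod
    def allocate(self, nbytes):
        ...

class HeapAllocator(BufferAllocator):
    """One concrete strategy; a shared-memory allocator could be another."""
    def allocate(self, nbytes):
        return bytearray(nbytes)

buf = HeapAllocator().allocate(64)  # a 64-byte buffer from the heap
```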





[jira] [Commented] (ARROW-1470) [C++] Add BufferAllocator abstract interface

2018-08-22 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588805#comment-16588805
 ] 

Wes McKinney commented on ARROW-1470:
-

Yes, these APIs assume that we are allocating CPU memory:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.cc#L164

That detail is sort of "baked in". A BufferAllocator abstract interface would 
allow other kinds of buffers to be returned.

> [C++] Add BufferAllocator abstract interface
> 
>
> Key: ARROW-1470
> URL: https://issues.apache.org/jira/browse/ARROW-1470
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> There are some situations ({{arrow::ipc::SerializeRecordBatch}}) where we pass 
> a {{MemoryPool*}} solely to call {{AllocateBuffer}} using it. This is not as 
> flexible as it could be, since there are situations where we may wish to 
> allocate from shared memory instead. 
> So instead:
> {code}
> Func(..., BufferAllocator* allocator, ...) {
>   ...
>   std::shared_ptr<Buffer> buffer;
>   RETURN_NOT_OK(allocator->Allocate(nbytes, &buffer));
>   ...
> }
> {code}





[jira] [Updated] (ARROW-1470) [C++] Add BufferAllocator abstract interface

2018-08-22 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1470:

Fix Version/s: (was: 0.12.0)
   0.11.0

> [C++] Add BufferAllocator abstract interface
> 
>
> Key: ARROW-1470
> URL: https://issues.apache.org/jira/browse/ARROW-1470
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> There are some situations ({{arrow::ipc::SerializeRecordBatch}}) where we pass 
> a {{MemoryPool*}} solely to call {{AllocateBuffer}} using it. This is not as 
> flexible as it could be, since there are situations where we may wish to 
> allocate from shared memory instead. 
> So instead:
> {code}
> Func(..., BufferAllocator* allocator, ...) {
>   ...
>   std::shared_ptr<Buffer> buffer;
>   RETURN_NOT_OK(allocator->Allocate(nbytes, &buffer));
>   ...
> }
> {code}





[jira] [Updated] (ARROW-3109) [Python] Add Python 3.7 virtualenvs to manylinux1 container

2018-08-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3109:
--
Labels: pull-request-available  (was: )

> [Python] Add Python 3.7 virtualenvs to manylinux1 container
> ---
>
> Key: ARROW-3109
> URL: https://issues.apache.org/jira/browse/ARROW-3109
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[jira] [Created] (ARROW-3109) [Python] Add Python 3.7 virtualenvs to manylinux1 container

2018-08-22 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3109:
--

 Summary: [Python] Add Python 3.7 virtualenvs to manylinux1 
container
 Key: ARROW-3109
 URL: https://issues.apache.org/jira/browse/ARROW-3109
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.11.0








[jira] [Resolved] (ARROW-3099) [C++] Add benchmark for number parsing

2018-08-22 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-3099.
---
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2456
[https://github.com/apache/arrow/pull/2456]

> [C++] Add benchmark for number parsing
> --
>
> Key: ARROW-3099
> URL: https://issues.apache.org/jira/browse/ARROW-3099
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Number parsing will become important once we have a CSV reader (or possibly 
> other text-based formats). We should add benchmarks for the internal 
> conversion routines.
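In Python terms, the kind of micro-benchmark meant here looks like the sketch below (the routine timed is a stand-in, not Arrow's internal converter):

```python
import timeit

strings = [str(i) for i in range(1000)]  # short numeric strings to parse

def parse_all():
    return [int(s) for s in strings]

# Time 100 passes over the batch; a real harness would compare several
# conversion routines against each other on the same inputs.
elapsed = timeit.timeit(parse_all, number=100)
```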





[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:48 AM:


The first of these warnings could probably be addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}

With changes as in 
https://github.com/lbartnik/arrow/commit/089153d518c081d7b9c1b3fb839463bca9ac1a35
 I can reduce the warnings to the one below. Looking at the code, it's not clear 
whether CreateObject() should be paired with a delete operation or if there is 
an internal pool/tracking mechanism.

{code}
pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command:  
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all 
[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 8:45 AM:


The first of these warnings could probably be addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.

{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}

With changes as in I can reduce the warnings to the one below. Looking at the 
code, it's not clear whether CreateObject() should be paired with a delete 
operation or if there is an internal pool/tracking mechanism.

{code}
pyarrow/tests/test_plasma.py::TestPlasmaClient::test_put_and_get command:  
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all 
--leak-check-heuristics=stdstring --error-exitcode=1 
/io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== Memcheck, a memory error detector
==575== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==575== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==575== Command: /io/arrow/python/pyarrow/plasma_store_server -s 
/tmp/test_plasma-k6wtcvi4/plasma.sock -m 1
==575== 
Allowing the Plasma store to use up to 0.1GB of memory.
Starting object store with directory /dev/shm and huge page support disabled
PASSED==575== 
==575== HEAP SUMMARY:
==575== in use at exit: 552 bytes in 1 blocks
==575==   total heap usage: 178 allocs, 177 frees, 143,037 bytes allocated
==575== 
==575== 552 bytes in 1 blocks are still reachable in loss record 1 of 1
==575==at 0x4C2FB0F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==575==by 0x567F5F7: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==575==by 0x1BD47F: create_buffer(long) (malloc.cc:105)
==575==by 0x1BFF17: fake_mmap (malloc.cc:135)
==575==by 0x1C077B: sys_alloc (dlmalloc.c:4155)
==575==by 0x1C077B: dlmalloc (dlmalloc.c:4680)
==575==by 0x1C2850: internal_memalign.constprop.98 (dlmalloc.c:4917)
==575==by 0x19391A: plasma::PlasmaStore::CreateObject(plasma::UniqueID 
const&, long, long, int, plasma::Client*, plasma::PlasmaObject*) (store.cc:178)
==575==by 0x197337: plasma::PlasmaStore::ProcessMessage(plasma::Client*) 
(store.cc:740)
==575==by 0x195E02: 
plasma::PlasmaStore::ConnectClient(int)::{lambda(int)#1}::operator()(int) const 
(store.cc:544)
==575==by 0x19927B: std::_Function_handler::_M_invoke(std::_Any_data
 const&, int&&) (std_function.h:297)
==575==by 0x1B75FD: std::function::operator()(int) const 
(std_function.h:687)
==575==by 0x1B6F4E: plasma::EventLoop::FileEventCallback(aeEventLoop*, int, 
void*, int) (events.cc:28)
==575== 
==575== LEAK SUMMARY:
==575==definitely lost: 0 bytes in 0 blocks
==575==indirectly lost: 0 bytes in 0 blocks
==575==  possibly lost: 0 bytes in 0 blocks
==575==still reachable: 552 bytes in 1 blocks
==575== suppressed: 0 bytes in 0 blocks
==575== 
==575== For counts of detected and suppressed errors, rerun with: -v
==575== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
{code}


was (Author: lbartnik):
The first of these warnings could probably be addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 

[jira] [Created] (ARROW-3108) [C++] arrow::PrettyPrint for Table instances

2018-08-22 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3108:
--

 Summary: [C++] arrow::PrettyPrint for Table instances
 Key: ARROW-3108
 URL: https://issues.apache.org/jira/browse/ARROW-3108
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Affects Versions: 0.10.0
Reporter: Uwe L. Korn
 Fix For: 0.12.0


Extend the {{arrow::PrettyPrint}} functionality to also support 
{{arrow::Table}} instances in addition to {{RecordBatch}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3107) [C++] arrow::PrettyPrint for Column instances

2018-08-22 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-3107:
--

 Summary: [C++] arrow::PrettyPrint for Column instances
 Key: ARROW-3107
 URL: https://issues.apache.org/jira/browse/ARROW-3107
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Affects Versions: 0.10.0
Reporter: Uwe L. Korn
 Fix For: 0.12.0


Currently, we support {{arrow::ChunkedArray}} instances in {{PrettyPrint}}. We 
should also support columns. The main addition here is that the specified 
field will also be printed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2243) [C++] Enable IPO/LTO

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2243:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++] Enable IPO/LTO
> 
>
> Key: ARROW-2243
> URL: https://issues.apache.org/jira/browse/ARROW-2243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Minor
> Fix For: 0.13.0
>
>
> We should enable interprocedural/link-time optimization. CMake >= 3.9.4 
> supports a generic way of doing this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2885) [C++] Right-justify array values in PrettyPrint

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2885:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++] Right-justify array values in PrettyPrint
> ---
>
> Key: ARROW-2885
> URL: https://issues.apache.org/jira/browse/ARROW-2885
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.13.0
>
>
> Currently the output of {{PrettyPrint}} for an array looks as follows:
> {code}
> [
>   1,
>   NA
> ]
> {code}
> We should right-justify it for better readability:
> {code}
> [
>1,
>   NA
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2646) [Python] Pandas roundtrip for date objects

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2646:
--

Assignee: Uwe L. Korn

> [Python] Pandas roundtrip for date objects
> --
>
> Key: ARROW-2646
> URL: https://issues.apache.org/jira/browse/ARROW-2646
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
> Fix For: 0.11.0
>
>
> Arrow currently casts date objects to nanosecond precision datetime objects. 
> I'd like to have a way to preserve the type during a roundtrip:
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[D]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date object
> dtype: object
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_rountrip.dtypes
> datedatetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_rountrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_rountrip, df)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2610:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2610
> URL: https://issues.apache.org/jira/browse/ARROW-2610
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> The DictionaryType is a bit more complex as it also references the dictionary 
> values themselves. This also needs to be integrated into 
> {{pyarrow.Field.from_jvm}}, but the work to make DictionaryType work may 
> also depend on {{pyarrow.Array.from_jvm}} first supporting non-primitive 
> arrays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2606) [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2606:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm
> -
>
> Key: ARROW-2606
> URL: https://issues.apache.org/jira/browse/ARROW-2606
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to 
> find the correct code to construct Java decimals and fill them into a 
> {{DecimalVector}}. Afterwards, we should activate the decimal128 type on 
> {{test_jvm_array}} and ensure that we load them correctly from Java into 
> Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2610:
--

Assignee: Uwe L. Korn

> [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2610
> URL: https://issues.apache.org/jira/browse/ARROW-2610
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> The DictionaryType is a bit more complex as it also references the dictionary 
> values themselves. This also needs to be integrated into 
> {{pyarrow.Field.from_jvm}}, but the work to make DictionaryType work may 
> also depend on {{pyarrow.Array.from_jvm}} first supporting non-primitive 
> arrays.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2652:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++/Python] Document how to provide information on segfaults
> -
>
> Key: ARROW-2652
> URL: https://issues.apache.org/jira/browse/ARROW-2652
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> We often have users that report segmentation faults in {{pyarrow}}. This will 
> sadly keep reappearing, as we don't have the magical ability to write 
> 100%-bug-free code. Thus we should have a small section in our documentation 
> on how people can give us the relevant information in the case of a 
> segmentation fault. Preferably the documentation covers {{gdb}} and {{lldb}}; 
> they both have similar commands but differ in some minor flags.
> For an example of a comment I gave to a user in a ticket, see 
> https://github.com/apache/arrow/issues/2089#issuecomment-393477116



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2606) [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2606:
--

Assignee: Uwe L. Korn

> [Java/Python]  Add unit test for pyarrow.decimal128 in Array.from_jvm
> -
>
> Key: ARROW-2606
> URL: https://issues.apache.org/jira/browse/ARROW-2606
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to 
> find the correct code to construct Java decimals and fill them into a 
> {{DecimalVector}}. Afterwards, we should activate the decimal128 type on 
> {{test_jvm_array}} and ensure that we load them correctly from Java into 
> Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2651) [Python] Build & Test with PyPy

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2651:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Build & Test with PyPy
> ---
>
> Key: ARROW-2651
> URL: https://issues.apache.org/jira/browse/ARROW-2651
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: outline_for_beginners
> Fix For: 0.13.0
>
>
> At the moment, we only build with CPython in our CI matrix and only do 
> releases for it. As reported in 
> https://github.com/apache/arrow/issues/2089#issuecomment-393126040 not 
> everything is working yet. This may either be due to problems on our or 
> PyPy's side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2609:
--

Assignee: Uwe L. Korn

> [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2609
> URL: https://issues.apache.org/jira/browse/ARROW-2609
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
> types. Types like List, Struct or Union that have children in their 
> definition are not supported. We should add the needed recursion for these 
> types and enable the respective tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2609:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
> ---
>
> Key: ARROW-2609
> URL: https://issues.apache.org/jira/browse/ARROW-2609
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> The converter {{pyarrow.Field.from_jvm}} currently only works for primitive 
> types. Types like List, Struct or Union that have children in their 
> definition are not supported. We should add the needed recursion for these 
> types and enable the respective tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2607:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
> ---
>
> Key: ARROW-2607
> URL: https://issues.apache.org/jira/browse/ARROW-2607
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
> only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
> {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
> functions to be able to deal with string arrays. There is a currently failing 
> unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to 
> verify the implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2607:
--

Assignee: Uwe L. Korn

> [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
> ---
>
> Key: ARROW-2607
> URL: https://issues.apache.org/jira/browse/ARROW-2607
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently 
> only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses 
> {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two 
> functions to be able to deal with string arrays. There is a currently failing 
> unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to 
> verify the implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2605:
--

Assignee: Uwe L. Korn

> [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
> -
>
> Key: ARROW-2605
> URL: https://issues.apache.org/jira/browse/ARROW-2605
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are 
> missing the necessary methods to construct these arrays conveniently on the 
> Python side.
> Once there is a path to construct {{pyarrow.Array}} instances from a Python 
> list of {{datetime.time}} for the various time types, we should activate the 
> time types on {{test_jvm_array}} and ensure that we load them correctly from 
> Java into Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2555) [Python] Provide an option to convert on coerce_timestamps instead of error

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2555:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Provide an option to convert on coerce_timestamps instead of error
> ---
>
> Key: ARROW-2555
> URL: https://issues.apache.org/jira/browse/ARROW-2555
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> At the moment, we error out on {{coerce_timestamps='ms'}} in 
> {{pyarrow.parquet.write_table}} if the data contains a timestamp that would 
> lose information when converted to milliseconds. In a lot of cases the user 
> does not care about this granularity and would rather have the comfort of the 
> timestamps being stored in Parquet regardless. Thus we should provide an 
> option to ignore the error and do the lossy conversion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2605:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
> -
>
> Key: ARROW-2605
> URL: https://issues.apache.org/jira/browse/ARROW-2605
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are 
> missing the necessary methods to construct these arrays conveniently on the 
> Python side.
> Once there is a path to construct {{pyarrow.Array}} instances from a Python 
> list of {{datetime.time}} for the various time types, we should activate the 
> time types on {{test_jvm_array}} and ensure that we load them correctly from 
> Java into Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2535) [C++/Python] Provide pre-commit hooks that check flake8 et al.

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2535:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++/Python] Provide pre-commit hooks that check flake8 et al.
> --
>
> Key: ARROW-2535
> URL: https://issues.apache.org/jira/browse/ARROW-2535
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>
> We should provide pre-commit hooks that users can install (optionally) that 
> check e.g. flake8 and clang-format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2352.

Resolution: Won't Fix

> [C++/Python] Test OSX packaging in Travis matrix
> 
>
> Key: ARROW-2352
> URL: https://issues.apache.org/jira/browse/ARROW-2352
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.11.0
>
>
> At the moment, we only test the conda based build in Travis but we also ship 
> binary wheels after the release. The process of building them is currently 
> part of the {{arrow-dist}} repository and uses the {{multibuild}} scripts 
> that are used for many other Python packages that also have native code.
> The code should be ported to run as a real CI job, i.e. in addition to just 
> packaging the code, we will also need to run the unit tests. Furthermore, 
> once the job is running and green, we also need to look at the runtimes as we 
> already have a quite packed CI matrix and we expect that many steps of the 
> wheel build are just to setup the environment. We should be able to cache 
> them.
> Maybe we want to do this as a nightly cron. For a first draft, it will be ok 
> to add it to the full matrix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2352) [C++/Python] Test OSX packaging in Travis matrix

2018-08-22 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588514#comment-16588514
 ] 

Uwe L. Korn commented on ARROW-2352:


Won't fix as we have crossbow now.

> [C++/Python] Test OSX packaging in Travis matrix
> 
>
> Key: ARROW-2352
> URL: https://issues.apache.org/jira/browse/ARROW-2352
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: beginner
> Fix For: 0.11.0
>
>
> At the moment, we only test the conda based build in Travis but we also ship 
> binary wheels after the release. The process of building them is currently 
> part of the {{arrow-dist}} repository and uses the {{multibuild}} scripts 
> that are used for many other Python packages that also have native code.
> The code should be ported to run as a real CI job, i.e. in addition to just 
> packaging the code, we will also need to run the unit tests. Furthermore, 
> once the job is running and green, we also need to look at the runtimes as we 
> already have a quite packed CI matrix and we expect that many steps of the 
> wheel build are just to setup the environment. We should be able to cache 
> them.
> Maybe we want to do this as a nightly cron. For a first draft, it will be ok 
> to add it to the full matrix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1570) [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1570:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Define API for creating a kernel instance from function of scalar input 
> and output with a particular signature
> 
>
> Key: ARROW-1570
> URL: https://issues.apache.org/jira/browse/ARROW-1570
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.12.0
>
>
> This could include an {{std::function}} instance (but these cannot be inlined 
> by the C++ compiler), but should also permit use with inline-able functions 
> or functors



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1572) [C++] Implement "value counts" kernels for tabulating value frequencies

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1572:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Implement "value counts" kernels for tabulating value frequencies
> ---
>
> Key: ARROW-1572
> URL: https://issues.apache.org/jira/browse/ARROW-1572
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is related to "match", "isin", and "unique" since hashing is generally 
> required



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1636) Integration tests for null type

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1636:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> Integration tests for null type
> ---
>
> Key: ARROW-1636
> URL: https://issues.apache.org/jira/browse/ARROW-1636
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Java
>Reporter: Wes McKinney
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.13.0
>
>
> This was not implemented on the C++ side, and came up in ARROW-1584. 
> Realistically arrays may be of null type, and we should be able to message 
> these correctly



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1628) [Python] Incorrect serialization of numpy datetimes.

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1628:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Python] Incorrect serialization of numpy datetimes.
> 
>
> Key: ARROW-1628
> URL: https://issues.apache.org/jira/browse/ARROW-1628
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Robert Nishihara
>Priority: Major
> Fix For: 0.12.0
>
>
> See https://github.com/ray-project/ray/issues/1041.
> The issue can be reproduced as follows.
> {code}
> import datetime
> import pyarrow as pa
> import numpy as np
> t = np.datetime64(datetime.datetime.now())
> print(type(t), t)  # <class 'numpy.datetime64'> 2017-09-30T09:50:46.089952
> t_new = pa.deserialize(pa.serialize(t).to_buffer())
> print(type(t_new), t_new)  #  0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1587) [Format] Add metadata for user-defined logical types

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1587:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Format] Add metadata for user-defined logical types
> 
>
> Key: ARROW-1587
> URL: https://issues.apache.org/jira/browse/ARROW-1587
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> While we have the {{custom_metadata}} field at the Field level, it may be 
> useful to have proper user-defined type metadata in the `Type` union, which 
> would allow us to declare a physical representation for the type (e.g. 
> "latitude/longitude is represented by a struct whose children consist of two 
> doubles") while still distinguishing it from the other, non-user-defined types.
> This is more flexible than {{custom_metadata}} because we can leverage 
> existing structure in the Flatbuffers for describing the user type:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L285



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1

2018-08-22 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588313#comment-16588313
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/22/18 7:52 AM:


The first of these warnings could probably be addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I see that SIGTERM comes from Python: "Ensure Valgrind and/or coverage have a 
clean exit". Does it make sense to set an exit flag in the signal handler and 
then let the event loop exit on its own in the main call stack?


was (Author: lbartnik):
The first of these warnings could probably be addressed by not calling exit(0) 
from the signal handler. My impression is that after a signal is caught and 
exit() is called, main() never returns, and thus destructors for its local 
objects are not called. Below is the valgrind warning in question.
{code:java}
==1990== 33 bytes in 1 blocks are still reachable in loss record 1 of 2
==1990== at 0x4C3017F: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1990== by 0x513088C: std::string::_Rep::_S_create(unsigned long, unsigned 
long, std::allocator const&) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5130C55: std::string::_M_mutate(unsigned long, unsigned long, 
unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x5131321: std::string::_M_replace_safe(unsigned long, unsigned 
long, char const*, unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1990== by 0x198A23: main (store.cc:937)
{code}
 

I tried simply commenting exit() out but that leads to other errors and I 
assume is not intended. I don't see much other signal handling in plasma and my 
current guess is that it is ae that gets interrupted and then drops the event 
loop.

Why is there even a SIGTERM in the first place? Where does it come from?

I'd be grateful for comments and/or pointers to relevant areas in the code.

> [C++] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
> 
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log, valgrind.supp_
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1572) [C++] Implement "value counts" kernels for tabulating value frequencies

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-1572:
--

Assignee: Uwe L. Korn

> [C++] Implement "value counts" kernels for tabulating value frequencies
> ---
>
> Key: ARROW-1572
> URL: https://issues.apache.org/jira/browse/ARROW-1572
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics, pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is related to "match", "isin", and "unique" since hashing is generally 
> required



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1638) [Java] IPC roundtrip for null type

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1638:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Java] IPC roundtrip for null type
> --
>
> Key: ARROW-1638
> URL: https://issues.apache.org/jira/browse/ARROW-1638
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Siddharth Teotia
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1599) [Python] Unable to read Parquet files with list inside struct

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1599:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Unable to read Parquet files with list inside struct
> -
>
> Key: ARROW-1599
> URL: https://issues.apache.org/jira/browse/ARROW-1599
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.0
> Environment: Ubuntu
>Reporter: Jovann Kung
>Assignee: Joshua Storck
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Is PyArrow currently unable to read Parquet files with a vector as a 
> column? For example, the schema of such a file is below:
> {code}
> mbc: FLOAT
> deltae: FLOAT
> labels: FLOAT
> features.type: INT32 INT_8
> features.size: INT32
> features.indices.list.element: INT32
> features.values.list.element: DOUBLE
> {code}
> Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() 
> yields the following error: ArrowNotImplementedError: Currently only nesting 
> with Lists is supported.
> From the error I assume that this may be implemented in a future release?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1644) [Python] Read and write nested Parquet data with a mix of struct and list nesting levels

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1644:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Read and write nested Parquet data with a mix of struct and list 
> nesting levels
> 
>
> Key: ARROW-1644
> URL: https://issues.apache.org/jira/browse/ARROW-1644
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: DB Tsai
>Assignee: Joshua Storck
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0
>
>
> We have many nested parquet files generated from Apache Spark for ranking 
> problems, and we would like to load them in python for other programs to 
> consume. 
> The schema looks like 
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- show_title_id: integer (nullable = true)
>  |||-- duration: double (nullable = true)
> {code}
> And when I tried to load it with nightly build pyarrow on Oct 4, 2017, I got 
> the following error.
> {code:python}
> Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57) 
> [GCC 7.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> table2 = pq.read_table('part-0')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", 
> line 823, in read_table
> use_pandas_metadata=use_pandas_metadata)
>   File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", 
> line 119, in read
> nthreads=nthreads)
>   File "_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: lists with structs are not supported.
> {code}
> I somehow get the impression that after 
> https://issues.apache.org/jira/browse/PARQUET-911 is merged, we should be 
> able to load the nested parquet in pyarrow. 
> Any insight about this? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1688) [Java] Fail build on checkstyle warnings

2018-08-22 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588498#comment-16588498
 ] 

Uwe L. Korn commented on ARROW-1688:


Any chance to get this into 0.11? [~siddteotia] [~bryanc]?

> [Java] Fail build on checkstyle warnings
> 
>
> Key: ARROW-1688
> URL: https://issues.apache.org/jira/browse/ARROW-1688
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.11.0
>
>
> see discussion in ARROW-1474



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1639) [Python] More efficient serialization for RangeIndex in serialize_pandas

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1639:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] More efficient serialization for RangeIndex in serialize_pandas
> 
>
> Key: ARROW-1639
> URL: https://issues.apache.org/jira/browse/ARROW-1639
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1682) [Python] Add documentation / example for reading a directory of Parquet files on S3

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1682:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Python] Add documentation / example for reading a directory of Parquet files 
> on S3
> ---
>
> Key: ARROW-1682
> URL: https://issues.apache.org/jira/browse/ARROW-1682
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Opened based on comment 
> https://github.com/apache/arrow/pull/916#issuecomment-337563492



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1692) [Python, Java] UnionArray round trip not working

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1692:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Python, Java] UnionArray round trip not working
> 
>
> Key: ARROW-1692
> URL: https://issues.apache.org/jira/browse/ARROW-1692
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.12.0
>
> Attachments: union_array.arrow
>
>
> I'm currently working on making pyarrow.serialization data available from the 
> Java side, one problem I was running into is that it seems the Java 
> implementation cannot read UnionArrays generated from C++. To make this 
> easily reproducible I created a clean Python implementation for creating 
> UnionArrays: https://github.com/apache/arrow/pull/1216
> The data is generated with the following script:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 1, 1, 0], type='int8')
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> result = pa.UnionArray.from_arrays([binary, int64], types, value_offsets)
> batch = pa.RecordBatch.from_arrays([result], ["test"])
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> sink.close()
> b = sink.get_result()
> with open("union_array.arrow", "wb") as f:
> f.write(b)
> # Sanity check: Read the batch in again
> with open("union_array.arrow", "rb") as f:
> b = f.read()
> reader = pa.RecordBatchStreamReader(pa.BufferReader(b))
> batch = reader.read_next_batch()
> print("union array is", batch.column(0))
> {code}
> I attached the file generated by that script. Then when I run the following 
> code in Java:
> {code}
> RootAllocator allocator = new RootAllocator(10);
> ByteArrayInputStream in = new 
> ByteArrayInputStream(Files.readAllBytes(Paths.get("union_array.arrow")));
> ArrowStreamReader reader = new ArrowStreamReader(in, allocator);
> reader.loadNextBatch()
> {code}
> I get the following error:
> {code}
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field test: Union(Sparse, [22, 5])<0: Binary, 1: Int(64, true)>. error 
> message: can not truncate buffer to a larger size 7: 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#7:1)
> {code}
> It seems like Java is not picking up that the UnionArray is Dense instead of 
> Sparse. After changing the default in 
> java/vector/src/main/codegen/templates/UnionVector.java from Sparse to Dense, 
> I get this:
> {code}
> jshell> reader.getVectorSchemaRoot().getSchema()
> $9 ==> Schema [0])<: Int(64, true)>
> {code}
> but then reading doesn't work:
> {code}
> jshell> reader.loadNextBatch()
> |  java.lang.IllegalArgumentException thrown: Could not load buffers for 
> field list: Union(Dense, [1])<: Struct Int(64, true). error message: can not truncate buffer to a larger size 1: > 0
> |at VectorLoader.loadBuffers (VectorLoader.java:83)
> |at VectorLoader.load (VectorLoader.java:62)
> |at ArrowReader$1.visit (ArrowReader.java:125)
> |at ArrowReader$1.visit (ArrowReader.java:111)
> |at ArrowRecordBatch.accepts (ArrowRecordBatch.java:128)
> |at ArrowReader.loadNextBatch (ArrowReader.java:137)
> |at (#8:1)
> {code}
> Any help with this is appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1560) [C++] Kernel implementations for "match" function

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1560:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Kernel implementations for "match" function
> -
>
> Key: ARROW-1560
> URL: https://issues.apache.org/jira/browse/ARROW-1560
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.12.0
>
>
> Match computes a position index array from an array of values into a set of 
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}
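The semantics above can be sketched in plain Python (a minimal illustration of the described behavior, not the eventual C++ kernel API; the handling of values absent from the category set is an assumption, since the ticket does not specify it):

```python
def match(values, categories):
    """Return each value's position in `categories`; nulls (None) pass through.

    Values absent from `categories` also map to None -- an assumption,
    as the ticket does not specify this case.
    """
    index = {c: i for i, c in enumerate(categories)}
    return [None if v is None else index.get(v) for v in values]

print(match(['a', 'b', 'a', None, 'b', 'a', 'b'], ['b', 'a']))
# [1, 0, 1, None, 0, 1, 0]
```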



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1561) [C++] Kernel implementations for "isin" (set containment)

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-1561:
--

Assignee: Uwe L. Korn

> [C++] Kernel implementations for "isin" (set containment)
> -
>
> Key: ARROW-1561
> URL: https://issues.apache.org/jira/browse/ARROW-1561
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> isin determines whether each element in the left array is contained in the 
> values in the right array. This function must handle the case where the right 
> array has nulls (so that null in the left array will return true)
> {code}
> isin(['a', 'b', null], ['a', 'c'])
> returns [true, false, null]
> isin(['a', 'b', null], ['a', 'c', null])
> returns [true, false, true]
> {code}
> May need an option to return false for null instead of null
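The null-handling rules above can be sketched in plain Python (a minimal model of the described semantics, not the C++ kernel itself):

```python
def isin(values, categories):
    """Null-aware set containment: null on the left is True only when the
    right side also contains null; otherwise it stays null (None)."""
    cats = set(categories)
    return [
        (True if None in cats else None) if v is None else (v in cats)
        for v in values
    ]

print(isin(['a', 'b', None], ['a', 'c']))        # [True, False, None]
print(isin(['a', 'b', None], ['a', 'c', None]))  # [True, False, True]
```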



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1470) [C++] Add BufferAllocator abstract interface

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1470:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Add BufferAllocator abstract interface
> 
>
> Key: ARROW-1470
> URL: https://issues.apache.org/jira/browse/ARROW-1470
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> There are some situations ({{arrow::ipc::SerializeRecordBatch}} where we pass 
> a {{MemoryPool*}} solely to call {{AllocateBuffer}} using it. This is not as 
> flexible as it could be, since there are situation where we may wish to 
> allocate from shared memory instead. 
> So instead:
> {code}
> Func(..., BufferAllocator* allocator, ...) {
>   ...
>   std::shared_ptr<Buffer> buffer;
>   RETURN_NOT_OK(allocator->Allocate(nbytes, &buffer));
>   ...
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1509) [Python] Write serialized object as a stream of encapsulated IPC messages

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1509:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Write serialized object as a stream of encapsulated IPC messages
> -
>
> Key: ARROW-1509
> URL: https://issues.apache.org/jira/browse/ARROW-1509
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> The structure of the stream in {{arrow::py::WriteSerializedObject}} is 
> generated on an ad hoc basis -- the components of the stream would be easier 
> to manipulate if this were internally a generic stream of IPC messages. For 
> example, one would be able to examine only the union that represents the 
> structure of the serialized payload and leave the tensors undisturbed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1325) [R] Bootstrap R bindings subproject

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1325:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [R] Bootstrap R bindings subproject
> ---
>
> Key: ARROW-1325
> URL: https://issues.apache.org/jira/browse/ARROW-1325
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Clark Fitzgerald
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The R language was designed to perform "Columnar in memory analytics". R / 
> Arrow bindings would be useful for:
> * better compatibility between R and other languages / big data systems
> * chunk-based data parallelism
> * portable and efficient IO via Parquet
> R has a C++ interface so the natural way to write these bindings is to 
> leverage Arrow's C++ library as much as possible.
> Feather provides a starting point: 
> [https://github.com/wesm/feather/tree/master/R].
> This can serve as an umbrella JIRA for work on R related tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1176) [C++] Replace WrappedBinary with Tensorflow's StringPiece

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1176:
---
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Replace WrappedBinary with Tensorflow's StringPiece
> -
>
> Key: ARROW-1176
> URL: https://issues.apache.org/jira/browse/ARROW-1176
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.12.0
>
>
> Instead of using the very simple {{WrappedBinary}} class, we may want to use 
> Tensorflow's {{StringPiece}} to handle binary cells: 
> https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/core/stringpiece.h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1171) [C++] Segmentation faults on Fedora 24 with pyarrow-manylinux1 and self-compiled turbodbc

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-1171:
---
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++] Segmentation faults on Fedora 24 with pyarrow-manylinux1 and 
> self-compiled turbodbc
> -
>
> Key: ARROW-1171
> URL: https://issues.apache.org/jira/browse/ARROW-1171
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.4.1
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Original issue: https://github.com/blue-yonder/turbodbc/issues/102
> When using the {{pyarrow}} {{manylinux1}} Wheels to build Turbodbc on Fedora 
> 24, the {{turbodbc_arrow}} unittests segfault. The main environment attribute 
> here is that the compiler version used for building Turbodbc is newer than 
> the one used for Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-300) [Format] Add buffer compression option to IPC file format

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-300:
--
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Format] Add buffer compression option to IPC file format
> -
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> When data is to be sent over the wire, it may be useful to compress the data 
> buffers themselves as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably only two compressors worth 
> supporting out of the box would be zlib (higher compression ratios) and lz4 
> (better performance).
> What does everyone think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-352) [Format] Interval(DAY_TIME) has no unit

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-352:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Format] Interval(DAY_TIME) has no unit
> ---
>
> Key: ARROW-352
> URL: https://issues.apache.org/jira/browse/ARROW-352
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Wes McKinney
>Priority: Major
>  Labels: columnar-format-1.0
> Fix For: 0.12.0
>
>
> Interval(DAY_TIME) assumes milliseconds.
> We should have a time unit like Timestamp has.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-488) [Python] Implement conversion between integer coded as floating points with NaN to an Arrow integer type

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-488:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Python] Implement conversion between integer coded as floating points with 
> NaN to an Arrow integer type
> 
>
> Key: ARROW-488
> URL: https://issues.apache.org/jira/browse/ARROW-488
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.12.0
>
>
> For example: if pandas has cast integer data to float, this would enable 
> the integer data to be recovered (so long as the values fall in the ~2^53 
> floating point range for exact integer representation)
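A minimal NumPy sketch of the recovery described above (assuming NaN is the only null sentinel and all non-null values are exactly representable integers; the helper name is hypothetical, not part of pyarrow):

```python
import numpy as np

def recover_ints(arr):
    """Split a float array into int64 values plus a validity mask,
    treating NaN as null. The caller must ensure |v| < 2**53 so the
    float -> int round trip is exact."""
    valid = ~np.isnan(arr)
    if not np.all(arr[valid] == np.floor(arr[valid])):
        raise ValueError("non-integral value present")
    values = np.where(valid, arr, 0).astype(np.int64)  # nulls become 0 behind the mask
    return values, valid

values, valid = recover_ints(np.array([1.0, np.nan, 3.0]))
print(values.tolist(), valid.tolist())  # [1, 0, 3] [True, False, True]
```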



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-473:
--
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++/Python] Add public API for retrieving block locations for a particular 
> HDFS file
> -
>
> Key: ARROW-473
> URL: https://issues.apache.org/jira/browse/ARROW-473
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> This is necessary for applications looking to schedule data-local work. 
> libhdfs does not have APIs to request the block locations directly, so we 
> need to see if the {{hdfsGetHosts}} function will do what we need. For 
> libhdfs3 there is a public API function 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-412) [Format] Handling of buffer padding in the IPC metadata

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-412:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Format] Handling of buffer padding in the IPC metadata
> ---
>
> Key: ARROW-412
> URL: https://issues.apache.org/jira/browse/ARROW-412
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> See discussion in ARROW-399. Do we include padding bytes in the metadata or 
> set the actual used bytes? In the latter case, the padding would be a part of 
> the format (any buffers continue to be expected to be 64-byte padded, to 
> permit AVX512 instructions)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-973) [Website] Add FAQ page about project

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-973:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Website] Add FAQ page about project
> 
>
> Key: ARROW-973
> URL: https://issues.apache.org/jira/browse/ARROW-973
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> As some suggested initial topics for the FAQ:
> * How Apache Arrow is related to Apache Parquet (the difference between a 
> "storage format" and an "in-memory format" causes confusion)
> * How is Arrow similar to / different from Flatbuffers and Cap'n Proto
> * How Arrow uses Flatbuffers (I have had people incorrectly state to me 
> things like "Arrow is just Flatbuffers under the hood")
> Any other ideas?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-501) [C++] Implement concurrent / buffering InputStream for streaming data use cases

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-501:
--
Fix Version/s: (was: 0.11.0)
   0.13.0

> [C++] Implement concurrent / buffering InputStream for streaming data use 
> cases
> ---
>
> Key: ARROW-501
> URL: https://issues.apache.org/jira/browse/ARROW-501
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Related to ARROW-500, when processing an input data stream, we may wish to 
> continue buffering input (up to a maximum buffer size) in between 
> synchronous Read calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-842) [Python] Handle more kinds of null sentinel objects from pandas 0.x

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-842:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Python] Handle more kinds of null sentinel objects from pandas 0.x
> ---
>
> Key: ARROW-842
> URL: https://issues.apache.org/jira/browse/ARROW-842
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow-on work to ARROW-707. See 
> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/lib.pyx#L193 
> and discussion in https://github.com/apache/arrow/pull/554



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-549) [C++] Add function to concatenate like-typed arrays

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-549:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Add function to concatenate like-typed arrays
> ---
>
> Key: ARROW-549
> URL: https://issues.apache.org/jira/browse/ARROW-549
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Panchen Xue
>Priority: Major
>  Labels: Analytics
> Fix For: 0.12.0
>
>
> A la 
> {{Status arrow::Concatenate(const std::vector<std::shared_ptr<Array>>& 
> arrays, MemoryPool* pool, std::shared_ptr<Array>* out)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-554) [C++] Implement functions to conform unequal dictionaries amongst multiple Arrow arrays

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-554:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [C++] Implement functions to conform unequal dictionaries amongst multiple 
> Arrow arrays
> ---
>
> Key: ARROW-554
> URL: https://issues.apache.org/jira/browse/ARROW-554
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.12.0
>
>
> We may wish to either
> * Conform the dictionary indices to reference a common dictionary
> * Concatenate indices into a new array with a common dictionary
> This is related to in-memory dictionary encoding, as you start with a 
> partially-built dictionary and then add entries as you observe new ones in 
> other dictionaries, all the while "rebasing" indices to consistently 
> reference the same dictionary at the end.
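The rebasing step described above, growing a common dictionary as new entries are observed and remapping each array's indices onto it, can be sketched like this (stdlib Python, hypothetical names, not Arrow's implementation):

```python
def conform_dictionaries(encoded_arrays):
    """Sketch: rebase several dictionary-encoded arrays onto one
    shared dictionary. Each input is (dictionary, indices); entries
    are added to the common dictionary as new values are observed,
    and every index is remapped to point into it."""
    common = []    # the merged dictionary, in first-seen order
    position = {}  # value -> slot in `common`
    rebased = []
    for dictionary, indices in encoded_arrays:
        # Map each local dictionary slot to a common-dictionary slot.
        local_to_common = []
        for value in dictionary:
            if value not in position:
                position[value] = len(common)
                common.append(value)
            local_to_common.append(position[value])
        rebased.append([local_to_common[i] for i in indices])
    return common, rebased

common, rebased = conform_dictionaries([
    (["a", "b"], [0, 1, 0]),
    (["b", "c"], [0, 1]),
])
print(common)   # ['a', 'b', 'c']
print(rebased)  # [[0, 1, 0], [1, 2]]
```

This covers the first option (conforming indices to a common dictionary); concatenating the rebased index lists afterwards gives the second.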





[jira] [Updated] (ARROW-974) [Website] Add Use Cases section to the website

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-974:
--
Fix Version/s: (was: 0.11.0)
   0.12.0

> [Website] Add Use Cases section to the website
> --
>
> Key: ARROW-974
> URL: https://issues.apache.org/jira/browse/ARROW-974
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> This will contain a list of "canonical use cases" for Arrow:
> * In-memory data structure for vectorized analytics / SIMD, or creating a 
> column-oriented analytic database system
> * Reading and writing columnar storage formats like Apache Parquet
> * Faster alternative to Thrift, Protobuf, or Avro in RPC
> * Shared memory IPC (zero-copy in-situ analytics)
> Any other ideas?





[jira] [Assigned] (ARROW-47) [C++] Consider adding a scalar type object model

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-47:


Assignee: Uwe L. Korn

> [C++] Consider adding a scalar type object model
> 
>
> Key: ARROW-47
> URL: https://issues.apache.org/jira/browse/ARROW-47
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: Analytics
> Fix For: 0.11.0
>
>
> Just did this on the Python side. In later analytics routines, passing in 
> scalar values (example: Array + Scalar) requires some kind of container. Some 
> systems, like the R language, solve this problem with length-1 arrays, but we 
> should do some analysis of use cases and figure out what will work best for 
> Arrow.
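The kind of container under discussion, a value tagged with its logical type so that operations like Array + Scalar have something to dispatch on, might look like this minimal sketch (all names hypothetical; not Arrow's actual classes):

```python
class Scalar:
    """Sketch: a typed scalar container for use in Array + Scalar
    style operations. Hypothetical, not Arrow's object model."""

    def __init__(self, value, type_name):
        self.value = value
        self.type = type_name

class Int64Array:
    def __init__(self, values):
        self.values = list(values)

    def add(self, other):
        # Broadcast the scalar across the array; the R-style
        # alternative would be a length-1 array instead.
        if isinstance(other, Scalar):
            return Int64Array(v + other.value for v in self.values)
        raise TypeError("expected a Scalar")

arr = Int64Array([1, 2, 3]).add(Scalar(10, "int64"))
print(arr.values)  # [11, 12, 13]
```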





[jira] [Updated] (ARROW-45) [Python] Add unnest/flatten function for List types

2018-08-22 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-45:
-
Fix Version/s: (was: 0.11.0)
   0.13.0

> [Python] Add unnest/flatten function for List types
> ---
>
> Key: ARROW-45
> URL: https://issues.apache.org/jira/browse/ARROW-45
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>



