[jira] [Commented] (ARROW-3772) [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow DictionaryArray

2019-02-27 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780207#comment-16780207
 ] 

Hatem Helal commented on ARROW-3772:


I'd like to take a stab at this after ARROW-3769

> [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow 
> DictionaryArray
> -
>
> Key: ARROW-3772
> URL: https://issues.apache.org/jira/browse/ARROW-3772
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Stav Nir
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Dictionary data is very common in Parquet. In the current implementation, 
> parquet-cpp always decodes dictionary-encoded data before creating a plain 
> Arrow array. This process is wasteful, since we could use Arrow's 
> DictionaryArray directly and achieve several benefits:
>  # Smaller memory footprint - both during decoding and in the 
> resulting Arrow table - especially when the dictionary values are large
>  # Better decoding performance - mostly a consequence of the first bullet - 
> fewer memory fetches and fewer allocations.
> I think these benefits could yield significant runtime improvements.
> My direction for the implementation is to read the indices (through the 
> DictionaryDecoder, after the RLE decoding) and the values separately into two 
> arrays and create a DictionaryArray from them.
> There are some questions to discuss:
>  # Should this be the default behavior for dictionary-encoded data?
>  # Should it be controlled by a parameter in the API?
>  # What should the policy be in case some of the chunks are dictionary 
> encoded and some are not?
> I started implementing this but would like to hear your opinions.
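The proposed representation can be sketched in Python. This is illustrative only; parquet-cpp's actual decoder is C++ and its APIs differ. The sketch contrasts materializing the dictionary into a plain array with keeping the indices and values as two separate arrays, as the DictionaryArray approach would.

```python
# Illustrative Python model of the two representations; not parquet-cpp code.

def materialize(dictionary, indices):
    """Current behavior: expand indices into a plain array of values."""
    return [dictionary[i] for i in indices]

def as_dictionary_array(dictionary, indices):
    """Proposed behavior: keep the two arrays and wrap them directly."""
    return dictionary, indices

dictionary = ["a" * 100, "b" * 100]   # two large dictionary values
indices = [0, 1, 0, 0, 1] * 1000      # 5000 small integer indices

plain = materialize(dictionary, indices)
values, idx = as_dictionary_array(dictionary, indices)

# Both forms decode to the same logical column...
assert [values[i] for i in idx] == plain
# ...but the dictionary form stores 2 strings instead of 5000.
assert len(plain) == 5000 and len(values) == 2
```

The memory argument in the description falls out directly: the dictionary form stores each distinct value once, plus small integer indices.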



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3772) [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow DictionaryArray

2019-02-27 Thread Hatem Helal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hatem Helal reassigned ARROW-3772:
--

Assignee: Hatem Helal

> [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow 
> DictionaryArray
> -
>
> Key: ARROW-3772
> URL: https://issues.apache.org/jira/browse/ARROW-3772
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Stav Nir
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>





[jira] [Created] (ARROW-4711) [Plasma] enhance plasma client interfaces to work with multiple objects

2019-02-27 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-4711:


 Summary: [Plasma]  enhance plasma client interfaces to work with 
multiple objects
 Key: ARROW-4711
 URL: https://issues.apache.org/jira/browse/ARROW-4711
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Plasma
Reporter: Zhijun Fu


Right now the Delete() interface in the plasma client supports deleting multiple 
objects in a single shot, so that we can save IPCs (inter-process 
communication) between plasma clients and the plasma store. This reduces latency 
for plasma clients and also improves the actual throughput of the plasma store.

I made a simple prototype that changes the Release() function as well: when 
batching the release of 10 objects into a single IPC, it takes only about 1/6 of 
the time compared with using 10 separate IPCs. Profiling also shows that 
processing IPCs currently takes a lot of CPU cycles in the plasma store, since 
UNIX domain socket processing has to go through the kernel, so batching multiple 
IPCs into a single one should greatly improve plasma store performance as well.

 

This change mostly applies to Release(), Seal(), and Create() interfaces.
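The reported 1/6 figure is consistent with a simple cost model in which every IPC round trip carries a fixed overhead. The constants below are assumptions for illustration, not Plasma measurements:

```python
# Toy cost model for batched vs. unbatched IPCs; the constants are
# assumed for illustration, not measured from Plasma.

PER_IPC_OVERHEAD_US = 50   # assumed fixed cost of one IPC round trip
PER_OBJECT_COST_US = 2     # assumed marginal cost per object id

def cost_unbatched(n_objects):
    """One IPC per object: pay the fixed overhead n times."""
    return n_objects * (PER_IPC_OVERHEAD_US + PER_OBJECT_COST_US)

def cost_batched(n_objects):
    """One IPC for all objects: pay the fixed overhead once."""
    return PER_IPC_OVERHEAD_US + n_objects * PER_OBJECT_COST_US

# Releasing 10 objects: 520us unbatched vs. 70us batched -- roughly the
# 6x improvement the prototype observed.
assert cost_unbatched(10) == 520
assert cost_batched(10) == 70
```

Under any such model, batching amortizes the fixed kernel round-trip cost, which is why the gain grows with the batch size.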





[jira] [Resolved] (ARROW-3121) [C++] Mean kernel aggregate

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3121.
-
Resolution: Fixed

Issue resolved by pull request 3708
[https://github.com/apache/arrow/pull/3708]

> [C++]  Mean kernel aggregate
> 
>
> Key: ARROW-3121
> URL: https://issues.apache.org/jira/browse/ARROW-3121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: analytics, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-2627) [Python] Add option (or some equivalent) to toggle memory mapping functionality when using parquet.ParquetFile or other read entry points

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2627.
-
Resolution: Fixed

Issue resolved by pull request 3639
[https://github.com/apache/arrow/pull/3639]

> [Python] Add option (or some equivalent) to toggle memory mapping 
> functionality when using parquet.ParquetFile or other read entry points
> -
>
> Key: ARROW-2627
> URL: https://issues.apache.org/jira/browse/ARROW-2627
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See the issue described in https://github.com/apache/arrow/issues/1946. When 
> passing a filename to {{parquet.ParquetFile}}, one cannot control what kind 
> of file reader is created internally (OSFile or MemoryMappedFile).





[jira] [Updated] (ARROW-4710) [C++][R] New linting script skip files with "cpp" extension

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4710:
--
Labels: pull-request-available  (was: )

> [C++][R] New linting script skip files with "cpp" extension
> ---
>
> Key: ARROW-4710
> URL: https://issues.apache.org/jira/browse/ARROW-4710
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> As a result, the {{r/lint.sh}} script is not checking the cpp files





[jira] [Updated] (ARROW-4699) [C++] json parser should not rely on null terminated buffers

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4699:
--
Labels: pull-request-available  (was: )

> [C++] json parser should not rely on null terminated buffers
> 
>
> Key: ARROW-4699
> URL: https://issues.apache.org/jira/browse/ARROW-4699
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Null-terminated buffers are not always trivial to guarantee, for example when 
> parsing mmapped files.
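The hazard and the fix can be contrasted in Python (illustrative only; the real parser is C++): sentinel-based scanning assumes a trailing NUL byte, while length-bounded scanning never reads past the known buffer size.

```python
# Sentinel-based vs. length-bounded buffer scanning (illustrative sketch).

def count_commas_sentinel(buf: bytes) -> int:
    """Walks until a NUL byte -- raises IndexError if the buffer has none."""
    n = i = 0
    while buf[i] != 0:
        if buf[i] == ord(","):
            n += 1
        i += 1
    return n

def count_commas_bounded(buf: bytes, length: int) -> int:
    """Iteration is bounded by the explicit length, so no sentinel is needed."""
    return sum(1 for i in range(length) if buf[i] == ord(","))

data = b"1,2,3"   # no trailing NUL, like a page of an mmapped file
assert count_commas_bounded(data, len(data)) == 2
assert count_commas_sentinel(data + b"\x00") == 2
```

An mmapped file ends exactly at the last mapped byte, so only the length-bounded style is safe there.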





[jira] [Assigned] (ARROW-4710) [C++][R] New linting script skip files with "cpp" extension

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4710:
---

Assignee: Wes McKinney

> [C++][R] New linting script skip files with "cpp" extension
> ---
>
> Key: ARROW-4710
> URL: https://issues.apache.org/jira/browse/ARROW-4710
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> As a result, the {{r/lint.sh}} script is not checking the cpp files





[jira] [Created] (ARROW-4710) [C++][R] New linting script skip files with "cpp" extension

2019-02-27 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4710:
---

 Summary: [C++][R] New linting script skip files with "cpp" 
extension
 Key: ARROW-4710
 URL: https://issues.apache.org/jira/browse/ARROW-4710
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Wes McKinney
 Fix For: 0.13.0


As a result, the {{r/lint.sh}} script is not checking the cpp files





[jira] [Created] (ARROW-4709) [C++] Optimize for ordered JSON fields

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4709:


 Summary: [C++] Optimize for ordered JSON fields
 Key: ARROW-4709
 URL: https://issues.apache.org/jira/browse/ARROW-4709
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman
 Fix For: 0.13.0


Fields appear consistently ordered in most JSON data in the wild, but the JSON 
parser currently looks up fields in a hash table. This ordering can probably be 
exploited to yield better performance when looking up field indices.
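One way to exploit the ordering, sketched in Python (an assumed design, not the C++ parser's actual data structures): remember where the previous record found each key and try that slot first, falling back to the hash table only when fields arrive out of order.

```python
# Field lookup with an ordering hint; hypothetical names for illustration.

class FieldIndex:
    def __init__(self, names):
        self.names = list(names)
        self.by_name = {n: i for i, n in enumerate(names)}
        self.hint = 0            # slot where the next key is expected

    def lookup(self, name):
        i = self.hint
        if i < len(self.names) and self.names[i] == name:
            self.hint = i + 1    # fast path: fields arrived in order
            return i
        i = self.by_name[name]   # slow path: hash lookup
        self.hint = i + 1
        return i

fi = FieldIndex(["id", "name", "score"])
# In-order keys hit the fast path on every lookup:
assert [fi.lookup(k) for k in ["id", "name", "score"]] == [0, 1, 2]
fi.hint = 0
# An out-of-order key still resolves via the hash table:
assert fi.lookup("score") == 2
```

When fields are consistently ordered, the hash table is consulted only once per column, not once per cell.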





[jira] [Created] (ARROW-4708) [C++] Add multithreaded JSON reader

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4708:


 Summary: [C++] Add multithreaded JSON reader 
 Key: ARROW-4708
 URL: https://issues.apache.org/jira/browse/ARROW-4708
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


The JSON reader currently parses only from a single, contiguous buffer and only 
in a single thread. It would be much more useful if it supported multithreaded 
parsing from a stream, as the CSV reader does.
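The approach the CSV reader uses can be sketched in Python (illustrative; not the Arrow C++ implementation): split the input at record boundaries, parse the chunks in a thread pool, and stitch the results back together in order.

```python
# Chunked, parallel parsing of newline-delimited JSON (illustrative sketch).
from concurrent.futures import ThreadPoolExecutor
import json

def split_chunks(data: bytes, target: int):
    """Split near `target` bytes, extending to the next newline so no
    JSON record straddles a chunk boundary."""
    chunks, start = [], 0
    while start < len(data):
        end = start + target
        if end >= len(data):
            end = len(data)
        else:
            nl = data.find(b"\n", end)
            end = len(data) if nl == -1 else nl + 1
        chunks.append(data[start:end])
        start = end
    return chunks

def parse_chunk(chunk: bytes):
    return [json.loads(line) for line in chunk.splitlines() if line]

data = b'{"a": 1}\n{"a": 2}\n{"a": 3}\n{"a": 4}\n'
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(parse_chunk, split_chunks(data, 10)))

# pool.map preserves chunk order, so the stitched result is in order.
records = [r for part in parts for r in part]
assert [r["a"] for r in records] == [1, 2, 3, 4]
```

The boundary-finding step is the subtle part; the rest is ordinary fan-out/fan-in over a thread pool.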





[jira] [Created] (ARROW-4706) [C++] shared conversion framework for JSON/CSV parsers

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4706:


 Summary: [C++] shared conversion framework for JSON/CSV parsers
 Key: ARROW-4706
 URL: https://issues.apache.org/jira/browse/ARROW-4706
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


CSV and JSON both convert strings to values in an Array, but there is little code 
sharing beyond {{arrow::util::StringConverter}}.

It would be advantageous if a single interface could be shared between CSV and 
JSON to do the heavy lifting of conversion consistently. This would simplify the 
addition of new parsers as well as allow all parsers to immediately take 
advantage of a new conversion strategy.
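The shape of such a shared layer can be sketched in Python. The names here are hypothetical, not Arrow's API: both front ends hand off strings to one converter registry, so a new conversion strategy registered once benefits every parser.

```python
# Hypothetical shared conversion registry (illustrative sketch).

CONVERTERS = {
    "int64": int,
    "float64": float,
    "bool": lambda s: {"true": True, "false": False}[s.lower()],
}

def convert_column(strings, type_name):
    """Shared heavy lifting: one conversion path for every parser."""
    convert = CONVERTERS[type_name]
    return [convert(s) for s in strings]

# A CSV parser and a JSON parser would both call the same entry point:
assert convert_column(["1", "2"], "int64") == [1, 2]
assert convert_column(["0.5"], "float64") == [0.5]
assert convert_column(["TRUE", "false"], "bool") == [True, False]
```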






[jira] [Resolved] (ARROW-4560) [R] array() needs to take single input, not ...

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4560.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3635
[https://github.com/apache/arrow/pull/3635]

> [R] array() needs to take single input, not ...
> ---
>
> Key: ARROW-4560
> URL: https://issues.apache.org/jira/browse/ARROW-4560
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The `array()` factory takes `...`, and this makes things harder than they need 
> to be, because we then have two competing views of type: 
>  - `vctrs::vec_c(.ptype=)`, which uses R's own typing system
>  - the Arrow type 
>  
> So `array()` should really take a single argument for `data`, which may be the 
> result of a `vctrs::vec_c(...)` call if we do want R type promotion. 





[jira] [Created] (ARROW-4707) [C++] move BitsetStack to bit-util.h

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4707:


 Summary: [C++] move BitsetStack to bit-util.h
 Key: ARROW-4707
 URL: https://issues.apache.org/jira/browse/ARROW-4707
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman
 Fix For: 0.13.0


BitsetStack was written for use in the JSON parser, but it's useful enough that 
it should be made available in bit-util.h.
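A BitsetStack is a stack of booleans packed into a bitset, handy for tracking nesting state while parsing. A minimal Python model of the idea (illustrative; the C++ version in bit-util.h has a different representation and API):

```python
# Minimal model of a stack of booleans packed into an integer bitset.

class BitsetStack:
    def __init__(self):
        self.bits = 0
        self.size = 0

    def push(self, value: bool):
        if value:
            self.bits |= 1 << self.size
        self.size += 1

    def top(self) -> bool:
        return bool(self.bits >> (self.size - 1) & 1)

    def pop(self) -> bool:
        value = self.top()
        self.size -= 1
        self.bits &= (1 << self.size) - 1   # clear the popped bit
        return value

s = BitsetStack()
s.push(True); s.push(False); s.push(True)
assert s.pop() is True and s.pop() is False and s.pop() is True
```

Packing into a bitset keeps the whole stack in one machine word for typical nesting depths, which is why it is cheap enough for a hot parsing loop.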





[jira] [Created] (ARROW-4705) [Rust] CSV reader should show line number and error message when failing to parse a line

2019-02-27 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4705:
-

 Summary: [Rust] CSV reader should show line number and error 
message when failing to parse a line
 Key: ARROW-4705
 URL: https://issues.apache.org/jira/browse/ARROW-4705
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.12.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.13.0


We currently throw away the original error and do not report the line number, 
making failures very difficult to debug.
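The requested behavior can be sketched in Python (the actual change belongs in the Rust CSV reader): attach the line number and keep the original error message instead of discarding them.

```python
# Parse errors re-raised with line context (illustrative sketch).

def parse_csv_ints(lines):
    """Parse comma-separated integers, reporting the failing line number."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append([int(field) for field in line.split(",")])
        except ValueError as exc:
            # Re-raise with context instead of throwing the cause away.
            raise ValueError(f"line {lineno}: {exc}") from exc
    return rows

assert parse_csv_ints(["1,2", "3,4"]) == [[1, 2], [3, 4]]
try:
    parse_csv_ints(["1,2", "3,oops"])
except ValueError as exc:
    assert str(exc).startswith("line 2:")
```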





[jira] [Updated] (ARROW-4700) [C++] Add DecimalType support to JSON parser

2019-02-27 Thread Benjamin Kietzman (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kietzman updated ARROW-4700:
-
Component/s: C++

> [C++] Add DecimalType support to JSON parser
> 
>
> Key: ARROW-4700
> URL: https://issues.apache.org/jira/browse/ARROW-4700
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>
> This should be simple to add, since decimals can already be parsed from 
> strings. Decimals will be represented in JSON as strings. If a decimal with 
> different precision or scale is parsed from the string, should there be any 
> acceptable conversions? (for example, if the column is of type {{decimal(5, 
> 2)}} should {{"12.3"}} be an error or equivalent to {{"012.30"}}?)
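The conversion question can be sketched with Python's stdlib decimal module. Only the scale (2) is modeled here; the precision (5) of decimal(5, 2) is not enforced in this sketch. Rescaling "12.3" to scale 2 is exact, so treating it as equivalent to "012.30" loses nothing, whereas "12.345" cannot be rescaled without rounding.

```python
# Rescaling parsed decimals to a fixed scale (illustrative sketch;
# models only the scale of decimal(5, 2), not its precision).
from decimal import Decimal, Context, Inexact

def parse_scale_2(s: str) -> Decimal:
    """Rescale a parsed decimal to scale 2, raising if digits would be lost."""
    ctx = Context(traps=[Inexact])
    return ctx.quantize(Decimal(s), Decimal("0.01"))

# "12.3" rescales exactly, so it is equivalent to "012.30"...
assert parse_scale_2("12.3") == parse_scale_2("012.30") == Decimal("12.30")

# ...whereas "12.345" cannot be represented at scale 2 without rounding.
raised = False
try:
    parse_scale_2("12.345")
except Inexact:
    raised = True
assert raised
```

"Exact rescaling is accepted, inexact rescaling is an error" is one defensible answer to the question in the description.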





[jira] [Created] (ARROW-4704) [CI][GLib] Plasma test is flaky

2019-02-27 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4704:
---

 Summary: [CI][GLib] Plasma test is flaky
 Key: ARROW-4704
 URL: https://issues.apache.org/jira/browse/ARROW-4704
 Project: Apache Arrow
  Issue Type: Test
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://travis-ci.org/apache/arrow/jobs/496581538

{noformat}
I0221 17:46:34.225402 20533 store.cc:1093] Allowing the Plasma store to use up 
to 0.00104858GB of memory.
I0221 17:46:34.227128 20533 store.cc:1120] Starting object store with directory 
/dev/shm and huge page support disabled
I0221 17:46:34.229485 20533 store.cc:693] Disconnecting client on fd 7
No output has been received in the last 10m0s, this potentially indicates a 
stalled build or something wrong with the build itself.
{noformat}





[jira] [Updated] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4696:
--
Labels: pull-request-available  (was: )

> Verify release script is over optimist with CUDA detection
> --
>
> Key: ARROW-4696
> URL: https://issues.apache.org/jira/browse/ARROW-4696
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Priority: Minor
>  Labels: pull-request-available
>
> I have an Nvidia GPU without CUDA installed; every time I run the verification 
> scripts, they bork in the middle because ARROW_HAVE_CUDA evaluates to yes, 
> since `nvidia-smi --list-gpus` returns true. This can waste a long run if I 
> forget about it.
> Would it be better to check for `CUDA_HOME`?





[jira] [Closed] (ARROW-4703) [C++] Upgrade dependency versions

2019-02-27 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou closed ARROW-4703.
-
Resolution: Duplicate

It looks like I opened a duplicate issue while JIRA was being flaky.

> [C++] Upgrade dependency versions
> -
>
> Key: ARROW-4703
> URL: https://issues.apache.org/jira/browse/ARROW-4703
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> At some point we should probably update the versions of the third-party 
> libraries we depend on. There might be useful bug or security fixes there, or 
> performance improvements.





[jira] [Created] (ARROW-4702) [C++] Upgrade dependency versions

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4702:
-

 Summary: [C++] Upgrade dependency versions
 Key: ARROW-4702
 URL: https://issues.apache.org/jira/browse/ARROW-4702
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 0.13.0


At some point we should probably update the versions of the third-party 
libraries we depend on. There might be useful bug or security fixes there, or 
performance improvements.





[jira] [Created] (ARROW-4703) [C++] Upgrade dependency versions

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4703:
-

 Summary: [C++] Upgrade dependency versions
 Key: ARROW-4703
 URL: https://issues.apache.org/jira/browse/ARROW-4703
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 0.13.0


At some point we should probably update the versions of the third-party 
libraries we depend on. There might be useful bug or security fixes there, or 
performance improvements.





[jira] [Updated] (ARROW-4039) [Python] Update link to 'development.rst' page from Python README.md

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4039:

Summary: [Python] Update link to 'development.rst' page from Python 
README.md  (was: Update link to 'development.rst' page from Python README.md)

> [Python] Update link to 'development.rst' page from Python README.md
> 
>
> Key: ARROW-4039
> URL: https://issues.apache.org/jira/browse/ARROW-4039
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation, Python
>Reporter: Tanya Schlusser
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the Sphinx docs were restructured, the link in the 
> [README|https://github.com/apache/arrow/blob/master/python/README.md]  
> changed from
> [https://github.com/apache/arrow/blob/master/python/doc/source/development.rst]
> to
> [https://github.com/apache/arrow/blob/master/docs/source/python/development.rst]





[jira] [Commented] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779718#comment-16779718
 ] 

Kouhei Sutou commented on ARROW-4696:
-

Is {{CUDA_HOME}} an environment variable?
If so, it's not suitable, because CUDA can be used without defining the 
{{CUDA_HOME}} environment variable.

{{nvidia-smi --list-gpus 2>&1 > /dev/null && nvcc --version 2>&1 > /dev/null}} 
would be better.
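The suggested two-part check can be sketched in Python (illustrative only; the verification script itself is a shell script): require both a listed GPU via `nvidia-smi` and the CUDA toolkit via `nvcc` before enabling CUDA verification.

```python
# Two-part CUDA availability check (illustrative sketch of the idea).
import shutil
import subprocess

def have_cuda() -> bool:
    """True only when both nvidia-smi and nvcc are present and a GPU is listed."""
    if shutil.which("nvidia-smi") is None or shutil.which("nvcc") is None:
        return False
    listed = subprocess.run(["nvidia-smi", "--list-gpus"],
                            capture_output=True)
    return listed.returncode == 0

print("ARROW_HAVE_CUDA=" + ("yes" if have_cuda() else "no"))
```

Requiring `nvcc` as well as `nvidia-smi` is exactly what distinguishes "GPU present" from "CUDA toolkit installed", which is the failure mode described above.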

> Verify release script is over optimist with CUDA detection
> --
>
> Key: ARROW-4696
> URL: https://issues.apache.org/jira/browse/ARROW-4696
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Priority: Minor
>
> I have an Nvidia GPU without CUDA installed; every time I run the verification 
> scripts, they bork in the middle because ARROW_HAVE_CUDA evaluates to yes, 
> since `nvidia-smi --list-gpus` returns true. This can waste a long run if I 
> forget about it.
> Would it be better to check for `CUDA_HOME`?





[jira] [Commented] (ARROW-4007) [Java][Plasma] Plasma JNI tests failing

2019-02-27 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779721#comment-16779721
 ] 

Francois Saint-Jacques commented on ARROW-4007:
---

My impression is that this is a duplicate of ARROW-4236 and can be closed.

> [Java][Plasma] Plasma JNI tests failing
> ---
>
> Key: ARROW-4007
> URL: https://issues.apache.org/jira/browse/ARROW-4007
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Priority: Critical
> Fix For: 0.13.0
>
>
> see https://travis-ci.org/apache/arrow/jobs/466819720
> {code}
> [INFO] Total time: 10.633 s
> [INFO] Finished at: 2018-12-12T03:56:33Z
> [INFO] Final Memory: 39M/426M
> [INFO] 
> 
>   linux-vdso.so.1 =>  (0x7ffcff172000)
>   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f99ecd9e000)
>   libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f99ecb85000)
>   libboost_system.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 (0x7f99ec981000)
>   libboost_filesystem.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.54.0 (0x7f99ec76b000)
>   libboost_regex.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x7f99ec464000)
>   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
> (0x7f99ec246000)
>   libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
> (0x7f99ebf3)
>   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f99ebc2a000)
>   libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
> (0x7f99eba12000)
>   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f99eb649000)
>   libicuuc.so.52 => /usr/lib/x86_64-linux-gnu/libicuuc.so.52 
> (0x7f99eb2d)
>   libicui18n.so.52 => /usr/lib/x86_64-linux-gnu/libicui18n.so.52 
> (0x7f99eaec9000)
>   /lib64/ld-linux-x86-64.so.2 (0x7f99ecfa6000)
>   libicudata.so.52 => /usr/lib/x86_64-linux-gnu/libicudata.so.52 
> (0x7f99e965c000)
>   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f99e9458000)
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:985: Allowing the 
> Plasma store to use up to 0.01GB of memory.
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:1015: Starting object 
> store with directory /dev/shm and huge page support disabled
> Start process 317574433 OK, cmd = 
> [/home/travis/build/apache/arrow/cpp-install/bin/plasma_store_server  -s  
> /tmp/store89237  -m  1000]
> Start object store success
> Start test.
> Plasma java client put test success.
> Plasma java client get single object test success.
> Plasma java client get multi-object test success.
> ObjectId [B@34c45dca error at PlasmaClient put
> java.lang.Exception: An object with this ID already exists in the plasma 
> store.
>   at org.apache.arrow.plasma.PlasmaClientJNI.create(Native Method)
>   at org.apache.arrow.plasma.PlasmaClient.put(PlasmaClient.java:51)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.doTest(PlasmaClientTest.java:145)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.main(PlasmaClientTest.java:220)
> Plasma java client put same object twice exception test success.
> Plasma java client hash test success.
> Plasma java client contains test success.
> Plasma java client metadata get test success.
> Plasma java client delete test success.
> Kill plasma store process forcely
> All test success.
> ~/build/apache/arrow
> {code}
> I didn't see any related code changes





[jira] [Commented] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779719#comment-16779719
 ] 

Francois Saint-Jacques commented on ARROW-4696:
---

Seems like the `nvcc` check is a good middle ground.

> Verify release script is over optimist with CUDA detection
> --
>
> Key: ARROW-4696
> URL: https://issues.apache.org/jira/browse/ARROW-4696
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Priority: Minor
>
> I have an Nvidia GPU without CUDA installed; every time I run the verification 
> scripts, they bork in the middle because ARROW_HAVE_CUDA evaluates to yes, 
> since `nvidia-smi --list-gpus` returns true. This can waste a long run if I 
> forget about it.
> Would it be better to check for `CUDA_HOME`?





[jira] [Updated] (ARROW-4460) [Website] Write blog post to announce DataFusion donation

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4460:

Component/s: Website

> [Website] Write blog post to announce DataFusion donation
> -
>
> Key: ARROW-4460
> URL: https://issues.apache.org/jira/browse/ARROW-4460
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-4324) [Python] Array dtype inference incorrect when created from list of mixed numpy scalars

2019-02-27 Thread Keith Kraus (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779717#comment-16779717
 ] 

Keith Kraus commented on ARROW-4324:


[~wesmckinn] I would love to, but unfortunately won't have any bandwidth for at 
least a month or so. I'd support [~xhochy]'s proposal of throwing an exception 
on mixed dtypes.

> [Python] Array dtype inference incorrect when created from list of mixed 
> numpy scalars
> --
>
> Key: ARROW-4324
> URL: https://issues.apache.org/jira/browse/ARROW-4324
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Keith Kraus
>Priority: Minor
> Fix For: 0.13.0
>
>
> Minimal reproducer:
> {code:python}
> import pyarrow as pa
> import numpy as np
> test_list = [np.dtype('int32').type(10), np.dtype('float32').type(0.5)]
> test_array = pa.array(test_list)
> # Expected
> # test_array
> # 
> # [
> #   10,
> #   0.5
> # ]
> # Got
> # test_array
> # 
> # [
> #   10,
> #   0
> # ]
> {code}





[jira] [Updated] (ARROW-4546) [C++] LICENSE.txt should be updated.

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4546:

Summary: [C++] LICENSE.txt should be updated.  (was: LICENSE.txt should be 
updated.)

> [C++] LICENSE.txt should be updated.
> 
>
> Key: ARROW-4546
> URL: https://issues.apache.org/jira/browse/ARROW-4546
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Renat Valiullin
>Assignee: Francois Saint-Jacques
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> parquet-cpp/blob/master/LICENSE.txt is not mentioned there.





[jira] [Resolved] (ARROW-4657) [Release] gbenchmark should not be needed for verification

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4657.
-
Resolution: Fixed

Issue resolved by pull request 3772
[https://github.com/apache/arrow/pull/3772]

> [Release] gbenchmark should not be needed for verification
> --
>
> Key: ARROW-4657
> URL: https://issues.apache.org/jira/browse/ARROW-4657
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Affects Versions: 0.12.0, 0.12.1
>Reporter: Uwe L. Korn
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{gbenchmark}} is built during verification, and thus we require a minimum 
> CMake version of 3.6. I would have guessed that we should not require it, as 
> we do not need to build the benchmarks during verification. A recent fix from 
> [~wesmckinn] may have addressed this, but we should verify before doing the 
> next release.





[jira] [Updated] (ARROW-4403) [Rust] CI fails due to formatting errors

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4403:

Component/s: Rust

> [Rust] CI fails due to formatting errors
> 
>
> Key: ARROW-4403
> URL: https://issues.apache.org/jira/browse/ARROW-4403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [https://travis-ci.org/apache/arrow/jobs/485310770]
>  
> Diff in /home/travis/build/apache/arrow/rust/arrow/src/csv/reader.rs at line 
> 545:
>  Field::new("lng", DataType::Float64, false),
>  ]);
>  
> - let file_with_headers = 
> File::open("test/data/uk_cities_with_headers.csv").unwrap();
> + let file_with_headers =
> + File::open("test/data/uk_cities_with_headers.csv").unwrap();
>  let file_without_headers = File::open("test/data/uk_cities.csv").unwrap();
>  let both_files = file_with_headers
>  .chain(Cursor::new("\n".to_string()))





[jira] [Updated] (ARROW-4414) [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4414:

Component/s: C++

> [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds 
> for older distros
> --
>
> Key: ARROW-4414
> URL: https://issues.apache.org/jira/browse/ARROW-4414
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The COMMAND_EXPAND_LISTS option of add_custom_command is too new for Ubuntu 
> Xenial and Debian Stretch. It has been available since CMake 3.8: 
> https://cmake.org/cmake/help/v3.8/command/add_custom_command.html
> We need to stop using it in cpp/src/gandiva/precompiled/CMakeLists.txt.
> We should also pin CMake to version 3.5 in Travis builds (Xenial ships CMake 
> 3.5).





[jira] [Updated] (ARROW-4444) [Testing] Add DataFusion test files to arrow-testing repo

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-:

Component/s: Rust - DataFusion

> [Testing] Add DataFusion test files to arrow-testing repo
> -
>
> Key: ARROW-
> URL: https://issues.apache.org/jira/browse/ARROW-
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Adding DataFusion test CSV and Parquet files to arrow-testing so that all 
> implementations can use them if desired





[jira] [Updated] (ARROW-4434) [Python] Cannot create empty StructArray via pa.StructArray.from_arrays

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4434:

Component/s: Python

> [Python] Cannot create empty StructArray via pa.StructArray.from_arrays
> ---
>
> Key: ARROW-4434
> URL: https://issues.apache.org/jira/browse/ARROW-4434
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:python}
> In [5]: pa.StructArray.from_arrays([], names=[])
> ---
> ValueError                                Traceback (most recent call last)
>  in 
> > 1 pa.StructArray.from_arrays([], names=[])
> ~/Workspace/arrow/python/pyarrow/array.pxi in 
> pyarrow.lib.StructArray.from_arrays()
>1326 num_arrays = len(arrays)
>1327 if num_arrays == 0:
> -> 1328 raise ValueError("arrays list is empty")
>1329
>1330 length = len(arrays[0])
> ValueError: arrays list is empty
> {code}
> however
> {code:python}
> pa.array([], type=pa.struct([]))
> {code}
> works





[jira] [Updated] (ARROW-4457) [Python] Cannot create Decimal128 array using integers

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4457:

Component/s: Python

> [Python] Cannot create Decimal128 array using integers
> --
>
> Key: ARROW-4457
> URL: https://issues.apache.org/jira/browse/ARROW-4457
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.0, 0.11.1, 0.12.0
> Environment: Python: 2.7.15 and 3.7.2
> pyarrow: tested on 0.11.0, 0.11.1, 0.12.0
>Reporter: Diego Argueta
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There appears to have been a regression introduced in 0.11.0 such that we can 
> no longer create a {{Decimal128}} array using integers.
> To reproduce:
> {code:python}
> import pyarrow
> column = pyarrow.decimal128(16, 4)
> array = pyarrow.array([1], column)
> {code}
> Expected result: Behavior same as 0.10.0 and earlier; a {{Decimal128}} array 
> would be created with no problems.
> Actual result: an exception is thrown.
> {code}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "pyarrow/array.pxi", line 175, in pyarrow.lib.array
> return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
> check_status(ConvertPySequence(sequence, mask, options, ))
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> raise ArrowInvalid(message)
> ArrowInvalid: Could not convert 1 with type int: converting to Decimal128
> Could not convert 1 with type int: converting to Decimal128
> {code}
> The crash doesn't occur if we use a {{decimal.Decimal}} object instead.
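The conversion the converter needs to perform can be sketched with the stdlib decimal module alone. This is a hedged illustration of the expected behavior, not pyarrow's implementation; the function name and the precision/scale defaults mirroring {{decimal128(16, 4)}} are made up here:

```python
from decimal import Decimal

def int_to_decimal(value, precision=16, scale=4):
    # Illustrative helper (not pyarrow API): widen an int to the column's
    # decimal(precision, scale), the step the fixed converter would take.
    d = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # Reject values that overflow the declared precision.
    if len(d.as_tuple().digits) > precision:
        raise ValueError(f"{value} does not fit decimal({precision}, {scale})")
    return d

int_to_decimal(1)  # Decimal('1.0000'), same value Decimal('1') would produce
```

The int path only needs this widening step; the decimal.Decimal path already carries an exact scale.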





[jira] [Updated] (ARROW-4459) [Testing] Add git submodule for arrow-testing data files

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4459:

Component/s: Developer Tools

> [Testing] Add git submodule for arrow-testing data files
> 
>
> Key: ARROW-4459
> URL: https://issues.apache.org/jira/browse/ARROW-4459
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We now have some test data files in arrow-testing that can be shared across 
> Arrow implementations. We need to add this repo as a submodule in the main 
> Arrow repo.





[jira] [Updated] (ARROW-4546) LICENSE.txt should be updated.

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4546:

Component/s: C++

> LICENSE.txt should be updated.
> --
>
> Key: ARROW-4546
> URL: https://issues.apache.org/jira/browse/ARROW-4546
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Renat Valiullin
>Assignee: Francois Saint-Jacques
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> parquet-cpp/blob/master/LICENSE.txt is not mentioned there.





[jira] [Updated] (ARROW-4604) [Rust] [DataFusion] Add benchmarks for SQL query execution

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4604:

Component/s: Rust - DataFusion

> [Rust] [DataFusion] Add benchmarks for SQL query execution
> --
>
> Key: ARROW-4604
> URL: https://issues.apache.org/jira/browse/ARROW-4604
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.13.0
>
>
> Add benchmarks for various types of SQL query so we can catch performance 
> regressions easily.





[jira] [Updated] (ARROW-4475) [Python] Serializing objects that contain themselves

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4475:

Component/s: Python

> [Python] Serializing objects that contain themselves
> 
>
> Key: ARROW-4475
> URL: https://issues.apache.org/jira/browse/ARROW-4475
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a regression from [https://github.com/apache/arrow/pull/3423]
> The following segfaults:
> {code:java}
> import pyarrow as pa
> lst = []
> lst.append(lst)
> pa.serialize(lst){code}
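Handling this generally requires the serializer to track which objects are on the current recursion path. A minimal stdlib sketch of that cycle check (illustrative only; the actual fix lives in Arrow's C++ serialization code):

```python
def contains_cycle(obj, _seen=None):
    # Walk containers, tracking object ids on the current path;
    # revisiting an id means the object (transitively) contains itself.
    if _seen is None:
        _seen = set()
    if isinstance(obj, (list, tuple, dict, set)):
        if id(obj) in _seen:
            return True
        _seen.add(id(obj))
        items = obj.values() if isinstance(obj, dict) else obj
        found = any(contains_cycle(x, _seen) for x in items)
        _seen.discard(id(obj))   # ids may recur across sibling branches
        return found
    return False

lst = []
lst.append(lst)
contains_cycle(lst)  # True — the case that segfaulted above
```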





[jira] [Updated] (ARROW-4618) [Docker] Makefile to build dependent docker images

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4618:

Component/s: Developer Tools

> [Docker] Makefile to build dependent docker images
> --
>
> Key: ARROW-4618
> URL: https://issues.apache.org/jira/browse/ARROW-4618
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Docker compose cannot be used to build image hierarchies:
> - https://github.com/docker/compose/issues/6093
> - https://github.com/docker/compose/issues/6264#issuecomment-429268195





[jira] [Updated] (ARROW-4639) [CI] Crossbow build failing for Gandiva jars

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4639:

Component/s: C++ - Gandiva

> [CI] Crossbow build failing for Gandiva jars
> 
>
> Key: ARROW-4639
> URL: https://issues.apache.org/jira/browse/ARROW-4639
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> All tests are failing. Seems to be related to gflags.
>  
> [https://travis-ci.org/pravindra/arrow-build/jobs/495977029]
>  
> 1: Test timeout computed to be: 1000
> 1: Running arrow-allocator-test, redirecting output into 
> /Users/travis/build/pravindra/arrow-build/arrow/cpp/build/build/test-logs/arrow-allocator-test.txt
>  (attempt 1/1)
> 1: dyld: Library not loaded: @rpath/libgflags.2.2.dylib
> 1: Referenced from: 
> /Users/travis/build/pravindra/arrow-build/arrow/cpp/build/release/libarrow.13.dylib
> 1: Reason: image not found
> 1: 
> /Users/travis/build/pravindra/arrow-build/arrow/cpp/build-support/run-test.sh:
>  line 97: 8124 Abort trap: 6 $TEST_EXECUTABLE "$@" 2>&1
> 1: 8125 Done | $ROOT/build-support/asan_symbolize.py
> 1: 8126 Done | c++filt
> 1: 8127 Done | $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
> 1: 8128 Done | $pipe_cmd 2>&1
> 1: 8129 Done | tee $LOGFILE





[jira] [Resolved] (ARROW-4689) [Go] add support for WASM

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4689.
-
Resolution: Fixed

Issue resolved by pull request 3707
[https://github.com/apache/arrow/pull/3707]

> [Go] add support for WASM
> -
>
> Key: ARROW-4689
> URL: https://issues.apache.org/jira/browse/ARROW-4689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4689) [Go] add support for WASM

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4689:
--
Labels: pull-request-available  (was: )

> [Go] add support for WASM
> -
>
> Key: ARROW-4689
> URL: https://issues.apache.org/jira/browse/ARROW-4689
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-4473) [Website] Add instructions to do a test-deploy of Arrow website and fix bugs

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4473:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Website] Add instructions to do a test-deploy of Arrow website and fix bugs
> 
>
> Key: ARROW-4473
> URL: https://issues.apache.org/jira/browse/ARROW-4473
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This will help with testing and proofing the website.
> I have noticed that there are bugs in the website when the baseurl is not the 
> domain root foo.bar.baz: e.g., if you deploy at foo.bar.baz/test-site, many 
> images and links are broken.





[jira] [Created] (ARROW-4701) [C++] Add JSON chunker benchmarks

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4701:


 Summary: [C++] Add JSON chunker benchmarks
 Key: ARROW-4701
 URL: https://issues.apache.org/jira/browse/ARROW-4701
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman
 Fix For: 0.13.0


The JSON chunker is not currently benchmarked or tested, but it is a necessary 
component of a multithreaded reader.
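For reference, the chunker's job is to find record boundaries so that blocks of the input can be parsed in parallel. A minimal sketch of that boundary search, assuming newline-delimited JSON (the real chunker must also handle newlines inside string values and pretty-printed records):

```python
def chunk_boundary(buf: bytes, target: int) -> int:
    # Return the offset just past the last complete record in
    # buf[:target + 1], assuming one JSON record per line.
    cut = buf.rfind(b"\n", 0, target + 1)
    return cut + 1 if cut != -1 else 0

data = b'{"a":1}\n{"b":2}\n{"c":3}\n'
chunk_boundary(data, 10)  # 8: split after the first record
```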





[jira] [Resolved] (ARROW-3361) [R] Run cpp/build-support/cpplint.py on C++ source files

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3361.
-
Resolution: Fixed

Issue resolved by pull request 3773
[https://github.com/apache/arrow/pull/3773]

> [R] Run cpp/build-support/cpplint.py on C++ source files
> 
>
> Key: ARROW-3361
> URL: https://issues.apache.org/jira/browse/ARROW-3361
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will help with additional code cleanliness





[jira] [Created] (ARROW-4700) [C++] Add DecimalType support to JSON parser

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4700:


 Summary: [C++] Add DecimalType support to JSON parser
 Key: ARROW-4700
 URL: https://issues.apache.org/jira/browse/ARROW-4700
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


This should be simple to add, since decimals can already be parsed from 
strings. Decimals will be represented in JSON as strings. If a decimal with 
different precision or scale is parsed from the string, should there be any 
acceptable conversions? (for example, if the column is of type {{decimal(5, 
2)}} should {{"12.3"}} be an error or equivalent to {{"012.30"}}?)
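One possible rescaling policy can be sketched with the stdlib decimal module. This is a hedged illustration, not the parser's actual code: the function name and the reject-on-digit-loss policy are assumptions.

```python
from decimal import Decimal

def parse_json_decimal(text, precision, scale):
    # Parse a JSON string into a decimal and rescale to the column type,
    # rejecting values that would lose digits or overflow the precision.
    d = Decimal(text)
    rescaled = d.quantize(Decimal(1).scaleb(-scale))
    if rescaled != d:
        raise ValueError(f"{text!r} loses digits at scale {scale}")
    if len(rescaled.as_tuple().digits) > precision:
        raise ValueError(f"{text!r} overflows decimal({precision}, {scale})")
    return rescaled

parse_json_decimal("12.3", 5, 2)  # Decimal('12.30'), equivalent to "012.30"
```

Under this policy "12.3" and "012.30" parse to the same decimal(5, 2) value, while "12.345" is an error.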





[jira] [Updated] (ARROW-4649) [C++/CI/R] Add nightly job that builds `brew install apache-arrow --HEAD`

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4649:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++/CI/R] Add nightly job that builds `brew install apache-arrow --HEAD`
> -
>
> Key: ARROW-4649
> URL: https://issues.apache.org/jira/browse/ARROW-4649
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, R
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> Now that we have an Arrow homebrew formula again, which we may want to offer 
> as a simple setup for R Arrow users, we should add a nightly crossbow task 
> that checks whether it still builds fine.
> To implement this, one should write a new travis.yml like 
> [https://github.com/apache/arrow/blob/master/dev/tasks/python-wheels/travis.osx.yml]
>  that calls {{brew install apache-arrow --HEAD}}. This task should then be 
> added to https://github.com/apache/arrow/blob/master/dev/tasks/tests.yml so 
> that it is executed as part of the nightly chain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4698) [C++] Let StringBuilder be constructible with a pre allocated buffer for character data

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4698:


 Summary: [C++] Let StringBuilder be constructible with a pre 
allocated buffer for character data
 Key: ARROW-4698
 URL: https://issues.apache.org/jira/browse/ARROW-4698
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


This is useful for example when an existing buffer can be immediately reused. 
This is currently used for [storage of strings in json 
parsing](https://github.com/apache/arrow/blob/master/cpp/src/arrow/json/parser.cc#L60),
 so it'd be straightforward to refactor into a constructor of StringBuilder.
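The idea can be sketched in Python (the class shape is illustrative, not the C++ StringBuilder API): the builder adopts a caller-supplied buffer instead of allocating one, and appended character data lands directly in it.

```python
class StringBuilder:
    def __init__(self, buf=None):
        # Adopt a pre-allocated bytearray if given; otherwise allocate.
        self.buf = buf if buf is not None else bytearray()
        self.offsets = [len(self.buf)]  # appends start after existing bytes

    def append(self, s: str):
        self.buf.extend(s.encode("utf-8"))
        self.offsets.append(len(self.buf))

    def finish(self):
        # Reconstruct the strings from the shared character buffer.
        return [self.buf[self.offsets[i]:self.offsets[i + 1]].decode("utf-8")
                for i in range(len(self.offsets) - 1)]

scratch = bytearray()        # a buffer the caller can reuse across parses
b = StringBuilder(scratch)
b.append("hello"); b.append("json")
b.finish()                   # ['hello', 'json']
```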





[jira] [Updated] (ARROW-4504) [C++] Reduce the number of unit test executables

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4504:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Reduce the number of unit test executables
> 
>
> Key: ARROW-4504
> URL: https://issues.apache.org/jira/browse/ARROW-4504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Link times are a significant drag in MSVC builds. They don't affect Linux 
> nearly as much when building with Ninja. I suggest we combine some of the 
> fast-running tests within logical units to see if we can cut down from 106 
> test executables to 70 or so
> {code}
> 100% tests passed, 0 tests failed out of 107
> Label Time Summary:
> arrow-tests   =  21.19 sec*proc (48 tests)
> arrow_python-tests=   0.26 sec*proc (1 test)
> example   =   0.05 sec*proc (1 test)
> gandiva-tests =  11.65 sec*proc (39 tests)
> parquet-tests =  35.81 sec*proc (18 tests)
> unittest  =  68.92 sec*proc (106 tests)
> {code}





[jira] [Closed] (ARROW-4382) [C++] Improve new cpplint output readability

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-4382.
---
Resolution: Not A Problem

> [C++] Improve new cpplint output readability
> 
>
> Key: ARROW-4382
> URL: https://issues.apache.org/jira/browse/ARROW-4382
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> The output is being printed as a __repr__, so the line breaks don't render 
> properly.
> {code}
> cd /home/wesm/code/arrow/cpp/preflight-build && 
> /home/wesm/miniconda/envs/arrow-3.7/bin/python 
> /home/wesm/code/arrow/cpp/build-support/run_cpplint.py --cpplint_binary 
> /home/wesm/code/arrow/cpp/build-support/cpplint.py --exclude_globs 
> /home/wesm/code/arrow/cpp/build-support/lint_exclusions.txt --source_dir 
> /home/wesm/code/arrow/cpp/src --quiet
> b'/home/wesm/code/arrow/cpp/src/parquet/schema.cc:68:  Add #include 
>  for copy  [build/include_what_you_use] 
> [4]\n/home/wesm/code/arrow/cpp/src/parquet/schema.cc:71:  Add #include 
>  for move  [build/include_what_you_use] 
> [4]\n/home/wesm/code/arrow/cpp/src/parquet/file_writer.cc:225:  Add #include 
>  for vector<>  [build/include_what_you_use] 
> [4]\n/home/wesm/code/arrow/cpp/src/parquet/file_writer.cc:390:  Add #include 
>  for move  [build/include_what_you_use] [4]\n'
> b'/home/wesm/code/arrow/cpp/src/parquet/arrow/record_reader.cc:897:  Add 
> #include  for move  [build/include_what_you_use] 
> [4]\n/home/wesm/code/arrow/cpp/src/parquet/arrow/writer.cc:1117:  Add 
> #include  for move  [build/include_what_you_use] 
> [4]\n/home/wesm/code/arrow/cpp/src/parquet/arrow/reader.cc:1552:  Add 
> #include  for move  [build/include_what_you_use] [4]\n'
> /home/wesm/code/arrow/cpp/src/parquet/file_writer.cc: had cpplint issues
> /home/wesm/code/arrow/cpp/src/parquet/schema.cc: had cpplint issues
> /home/wesm/code/arrow/cpp/src/parquet/arrow/record_reader.cc: had cpplint 
> issues
> /home/wesm/code/arrow/cpp/src/parquet/arrow/writer.cc: had cpplint issues
> /home/wesm/code/arrow/cpp/src/parquet/arrow/reader.cc: had cpplint issues
> ninja: build stopped: subcommand failed.
> {code}
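For reference, the b'...' noise comes from printing the raw bytes a subprocess returns. A sketch of the kind of fix involved — decode to text before printing (the helper name is made up):

```python
import subprocess
import sys

def lint_output(cmd):
    # Decode the subprocess output to text so "\n" becomes a real line
    # break instead of being shown inside a bytes repr.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.splitlines()

# Demo with a trivial command standing in for cpplint:
lines = lint_output([sys.executable, "-c", r"print('issue 1\nissue 2')"])
# lines == ['issue 1', 'issue 2']
```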





[jira] [Created] (ARROW-4699) [C++] json parser should not rely on null terminated buffers

2019-02-27 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-4699:


 Summary: [C++] json parser should not rely on null terminated 
buffers
 Key: ARROW-4699
 URL: https://issues.apache.org/jira/browse/ARROW-4699
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman
 Fix For: 0.13.0


Null-terminated buffers are not always trivial to guarantee, for example when 
parsing mmapped files.
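The alternative is a length-bounded scan: the parser carries an explicit end offset and never reads past it, where a NUL-terminated scan would read beyond the end of an mmapped slice. A minimal sketch (illustrative only, not Arrow's actual parser code):

```python
def next_token(buf: bytes, pos: int, end: int):
    # Scan for the next whitespace-delimited token without ever reading
    # past `end`; a NUL-terminated scan would keep reading until b"\0".
    while pos < end and buf[pos] in b" \t\r\n":
        pos += 1
    start = pos
    while pos < end and buf[pos] not in b" \t\r\n":
        pos += 1
    return buf[start:pos], pos

data = b'{"a": 1}'            # no trailing NUL, as with an mmapped slice
tok, pos = next_token(data, 0, len(data))  # tok == b'{"a":'
```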





[jira] [Assigned] (ARROW-4327) [Python] Add requirements-build.txt file to simplify setting up Python build environment

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4327:
---

Assignee: Ryan White  (was: Krisztian Szucs)

> [Python] Add requirements-build.txt file to simplify setting up Python build 
> environment
> 
>
> Key: ARROW-4327
> URL: https://issues.apache.org/jira/browse/ARROW-4327
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
> Environment: CentOS7 or Fedora29
>Reporter: Ryan White
>Assignee: Ryan White
>Priority: Minor
>  Labels: build, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Trying to build pyarrow on CentOS or Fedora fails to load libarrow.so. The 
> build does not use conda; rather, it is similar to the OSX build instructions. 
>  A dockerfile is available here:
> https://github.com/ryanmackenziewhite/dockers/blob/master/centos7-py36-arrowbuild/Dockerfile
> {code:java}
> // ImportError while loading conftest 
> '/work/repos/arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/__init__.py:54: in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> E ImportError: libarrow.so.12: cannot open shared object file: No such file 
> or directory
> {code}
>  





[jira] [Commented] (ARROW-4428) [R] Feature flags for R build

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779679#comment-16779679
 ] 

Wes McKinney commented on ARROW-4428:
-

[~romainfrancois] [~javierluraschi] any thoughts about this? The window for 
0.13 is closing

> [R] Feature flags for R build
> -
>
> Key: ARROW-4428
> URL: https://issues.apache.org/jira/browse/ARROW-4428
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are a number of optional components in the Arrow C++ library. In Python 
> we have feature flags to turn on and off parts of the bindings based on what 
> C++ libraries have been built. There is also some logic to try to detect what 
> has been built and enable those features.
> We need to have the same thing in R. Some components, like Plasma, are not 
> available for Windows and so necessarily these will have to be flagged off. 





[jira] [Updated] (ARROW-4343) [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to docker-compose setup

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4343:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to 
> docker-compose setup
> -
>
> Key: ARROW-4343
> URL: https://issues.apache.org/jira/browse/ARROW-4343
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Until we formally stop supporting Trusty, it would be useful to be able to 
> verify in Docker that builds work there. I still have an Ubuntu 14.04 machine 
> that I use (and I've been filing bugs that I find on it), but I'm not sure 
> for how much longer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4333) [C++] Sketch out design for kernels and "query" execution in compute layer

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4333:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Sketch out design for kernels and "query" execution in compute layer
> --
>
> Key: ARROW-4333
> URL: https://issues.apache.org/jira/browse/ARROW-4333
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Micah Kornfield
>Priority: Major
>  Labels: analytics
> Fix For: 0.14.0
>
>
> It would be good to formalize the design of kernels and the controlling query 
> execution layer (e.g. volcano batch model?) to understand the following:
> Contracts for kernels:
>  * Thread safety of kernels?
>  * When should kernels allocate memory vs. expect preallocated memory? How to 
> communicate the requirements for a kernel's memory allocation?
>  * How to communicate whether a kernel's execution is parallelizable 
> across a ChunkedArray? How to determine if the order of execution across a 
> ChunkedArray is important?
>  * How to communicate when it is safe to re-use the same buffers as input 
> and output to the same kernel?
> What does the threading model look like for the higher level of control?  
> Where should synchronization happen?





[jira] [Resolved] (ARROW-4327) [Python] Add requirements-build.txt file to simplify setting up Python build environment

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4327.
-
Resolution: Fixed

Resolved by 
https://github.com/apache/arrow/commit/08d6307671878b35ffc99eb9f6398b3b2aee2f15

> [Python] Add requirements-build.txt file to simplify setting up Python build 
> environment
> 
>
> Key: ARROW-4327
> URL: https://issues.apache.org/jira/browse/ARROW-4327
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
> Environment: CentOS7 or Fedora29
>Reporter: Ryan White
>Assignee: Ryan White
>Priority: Minor
>  Labels: build, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Trying to build pyarrow on CentOS or Fedora fails to load libarrow.so. The 
> build does not use conda; rather, it is similar to the OSX build instructions. 
>  A dockerfile is available here:
> https://github.com/ryanmackenziewhite/dockers/blob/master/centos7-py36-arrowbuild/Dockerfile
> {code:java}
> // ImportError while loading conftest 
> '/work/repos/arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/__init__.py:54: in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> E ImportError: libarrow.so.12: cannot open shared object file: No such file 
> or directory
> {code}
>  





[jira] [Updated] (ARROW-4327) [Python] Add requirements-build.txt file to simplify setting up Python build environment

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4327:

Summary: [Python] Add requirements-build.txt file to simplify setting up 
Python build environment  (was: [Python] pyarrow fails to load libarrow.so in 
Fedora / CentOS Docker build)

> [Python] Add requirements-build.txt file to simplify setting up Python build 
> environment
> 
>
> Key: ARROW-4327
> URL: https://issues.apache.org/jira/browse/ARROW-4327
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
> Environment: CentOS7 or Fedora29
>Reporter: Ryan White
>Assignee: Krisztian Szucs
>Priority: Minor
>  Labels: build, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Trying to build pyarrow on CentOS or Fedora fails to load libarrow.so. The 
> build does not use conda; rather, it is similar to the OSX build instructions. 
>  A dockerfile is available here:
> https://github.com/ryanmackenziewhite/dockers/blob/master/centos7-py36-arrowbuild/Dockerfile
> {code:java}
> // ImportError while loading conftest 
> '/work/repos/arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/__init__.py:54: in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> E ImportError: libarrow.so.12: cannot open shared object file: No such file 
> or directory
> {code}
>  





[jira] [Updated] (ARROW-4208) [CI/Python] Have automatized tests for S3

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4208:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [CI/Python] Have automatized tests for S3
> -
>
> Key: ARROW-4208
> URL: https://issues.apache.org/jira/browse/ARROW-4208
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: s3
> Fix For: 0.14.0
>
>
> Currently we don't run the S3 integration tests regularly. 
> Possible solutions:
> - mock it within python/pytest
> - simply run the s3 tests with an S3 credential provided
> - create a hdfs-integration like docker-compose setup and run an S3 mock 
> server (e.g.: https://github.com/adobe/S3Mock, 
> https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, 
> https://github.com/jserver/mock-s3)
> For more see discussion https://github.com/apache/arrow/pull/3286





[jira] [Commented] (ARROW-4007) [Java][Plasma] Plasma JNI tests failing

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779657#comment-16779657
 ] 

Wes McKinney commented on ARROW-4007:
-

?

> [Java][Plasma] Plasma JNI tests failing
> ---
>
> Key: ARROW-4007
> URL: https://issues.apache.org/jira/browse/ARROW-4007
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Priority: Critical
> Fix For: 0.13.0
>
>
> see https://travis-ci.org/apache/arrow/jobs/466819720
> {code}
> [INFO] Total time: 10.633 s
> [INFO] Finished at: 2018-12-12T03:56:33Z
> [INFO] Final Memory: 39M/426M
> [INFO] 
> 
>   linux-vdso.so.1 =>  (0x7ffcff172000)
>   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f99ecd9e000)
>   libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f99ecb85000)
>   libboost_system.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 (0x7f99ec981000)
>   libboost_filesystem.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.54.0 (0x7f99ec76b000)
>   libboost_regex.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x7f99ec464000)
>   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
> (0x7f99ec246000)
>   libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
> (0x7f99ebf3)
>   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f99ebc2a000)
>   libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
> (0x7f99eba12000)
>   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f99eb649000)
>   libicuuc.so.52 => /usr/lib/x86_64-linux-gnu/libicuuc.so.52 
> (0x7f99eb2d)
>   libicui18n.so.52 => /usr/lib/x86_64-linux-gnu/libicui18n.so.52 
> (0x7f99eaec9000)
>   /lib64/ld-linux-x86-64.so.2 (0x7f99ecfa6000)
>   libicudata.so.52 => /usr/lib/x86_64-linux-gnu/libicudata.so.52 
> (0x7f99e965c000)
>   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f99e9458000)
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:985: Allowing the 
> Plasma store to use up to 0.01GB of memory.
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:1015: Starting object 
> store with directory /dev/shm and huge page support disabled
> Start process 317574433 OK, cmd = 
> [/home/travis/build/apache/arrow/cpp-install/bin/plasma_store_server  -s  
> /tmp/store89237  -m  1000]
> Start object store success
> Start test.
> Plasma java client put test success.
> Plasma java client get single object test success.
> Plasma java client get multi-object test success.
> ObjectId [B@34c45dca error at PlasmaClient put
> java.lang.Exception: An object with this ID already exists in the plasma 
> store.
>   at org.apache.arrow.plasma.PlasmaClientJNI.create(Native Method)
>   at org.apache.arrow.plasma.PlasmaClient.put(PlasmaClient.java:51)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.doTest(PlasmaClientTest.java:145)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.main(PlasmaClientTest.java:220)
> Plasma java client put same object twice exception test success.
> Plasma java client hash test success.
> Plasma java client contains test success.
> Plasma java client metadata get test success.
> Plasma java client delete test success.
> Kill plasma store process forcely
> All test success.
> ~/build/apache/arrow
> {code}
> I didn't see any related code changes





[jira] [Commented] (ARROW-4324) [Python] Array dtype inference incorrect when created from list of mixed numpy scalars

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779666#comment-16779666
 ] 

Wes McKinney commented on ARROW-4324:
-

[~keith.j.kraus] are you able to submit a PR for this?

> [Python] Array dtype inference incorrect when created from list of mixed 
> numpy scalars
> --
>
> Key: ARROW-4324
> URL: https://issues.apache.org/jira/browse/ARROW-4324
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Keith Kraus
>Priority: Minor
> Fix For: 0.13.0
>
>
> Minimal reproducer:
> {code:python}
> import pyarrow as pa
> import numpy as np
> test_list = [np.dtype('int32').type(10), np.dtype('float32').type(0.5)]
> test_array = pa.array(test_list)
> # Expected
> # test_array
> # 
> # [
> #   10,
> #   0.5
> # ]
> # Got
> # test_array
> # 
> # [
> #   10,
> #   0
> # ]
> {code}
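For reference, NumPy's own promotion rules for the two scalar types in the reproducer point at the expected inference — a hedged illustration of the expectation, not pyarrow code:

```python
import numpy as np

# The reproducer mixes an int32 scalar and a float32 scalar. NumPy's own
# promotion rules give float64 for that combination, so the expected
# pyarrow inference is a floating-point type (10, 0.5), not int (10, 0).
promoted = np.result_type(np.int32, np.float32)
assert promoted == np.float64
```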





[jira] [Updated] (ARROW-4309) [Release] gen_apidocs docker-compose task is out of date

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4309:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Release] gen_apidocs docker-compose task is out of date
> 
>
> Key: ARROW-4309
> URL: https://issues.apache.org/jira/browse/ARROW-4309
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools, Documentation
>Reporter: Wes McKinney
>Priority: Major
>  Labels: docker
> Fix For: 0.14.0
>
>
> This needs to be updated to build with CUDA support (which in turn will 
> require the host machine to have nvidia-docker), among other things





[jira] [Commented] (ARROW-4007) [Java][Plasma] Plasma JNI tests failing

2019-02-27 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779665#comment-16779665
 ] 

Francois Saint-Jacques commented on ARROW-4007:
---

See 
https://github.com/apache/arrow/pull/3306/files#diff-a41857fc507aab72ff56d242b2cc9c61R148

> [Java][Plasma] Plasma JNI tests failing
> ---
>
> Key: ARROW-4007
> URL: https://issues.apache.org/jira/browse/ARROW-4007
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Priority: Critical
> Fix For: 0.13.0
>
>
> see https://travis-ci.org/apache/arrow/jobs/466819720
> {code}
> [INFO] Total time: 10.633 s
> [INFO] Finished at: 2018-12-12T03:56:33Z
> [INFO] Final Memory: 39M/426M
> [INFO] 
> 
>   linux-vdso.so.1 =>  (0x7ffcff172000)
>   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f99ecd9e000)
>   libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f99ecb85000)
>   libboost_system.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0 (0x7f99ec981000)
>   libboost_filesystem.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.54.0 (0x7f99ec76b000)
>   libboost_regex.so.1.54.0 => 
> /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.54.0 (0x7f99ec464000)
>   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
> (0x7f99ec246000)
>   libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 
> (0x7f99ebf3)
>   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f99ebc2a000)
>   libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
> (0x7f99eba12000)
>   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f99eb649000)
>   libicuuc.so.52 => /usr/lib/x86_64-linux-gnu/libicuuc.so.52 
> (0x7f99eb2d)
>   libicui18n.so.52 => /usr/lib/x86_64-linux-gnu/libicui18n.so.52 
> (0x7f99eaec9000)
>   /lib64/ld-linux-x86-64.so.2 (0x7f99ecfa6000)
>   libicudata.so.52 => /usr/lib/x86_64-linux-gnu/libicudata.so.52 
> (0x7f99e965c000)
>   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f99e9458000)
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:985: Allowing the 
> Plasma store to use up to 0.01GB of memory.
> /home/travis/build/apache/arrow/cpp/src/plasma/store.cc:1015: Starting object 
> store with directory /dev/shm and huge page support disabled
> Start process 317574433 OK, cmd = 
> [/home/travis/build/apache/arrow/cpp-install/bin/plasma_store_server  -s  
> /tmp/store89237  -m  1000]
> Start object store success
> Start test.
> Plasma java client put test success.
> Plasma java client get single object test success.
> Plasma java client get multi-object test success.
> ObjectId [B@34c45dca error at PlasmaClient put
> java.lang.Exception: An object with this ID already exists in the plasma 
> store.
>   at org.apache.arrow.plasma.PlasmaClientJNI.create(Native Method)
>   at org.apache.arrow.plasma.PlasmaClient.put(PlasmaClient.java:51)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.doTest(PlasmaClientTest.java:145)
>   at 
> org.apache.arrow.plasma.PlasmaClientTest.main(PlasmaClientTest.java:220)
> Plasma java client put same object twice exception test success.
> Plasma java client hash test success.
> Plasma java client contains test success.
> Plasma java client metadata get test success.
> Plasma java client delete test success.
> Kill plasma store process forcely
> All test success.
> ~/build/apache/arrow
> {code}
> I didn't see any related code changes





[jira] [Commented] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779647#comment-16779647
 ] 

Wes McKinney commented on ARROW-3710:
-

Where do we stand on this?

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755





[jira] [Updated] (ARROW-3435) [C++] Add option to use dynamic linking with re2

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3435:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Add option to use dynamic linking with re2
> 
>
> Key: ARROW-3435
> URL: https://issues.apache.org/jira/browse/ARROW-3435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Initial support for re2 uses static linking -- some applications may wish to 
> use dynamic linking





[jira] [Updated] (ARROW-4119) [C++] Clean up cast implementation from null to other types

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4119:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Clean up cast implementation from null to other types
> ---
>
> Key: ARROW-4119
> URL: https://issues.apache.org/jira/browse/ARROW-4119
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> We are currently generating a bunch of empty kernels when there is no need. 
> It would be better to define a clean {{NullCastKernel}} that ensures the 
> appropriate buffers have been constructed for the output all-null array.





[jira] [Updated] (ARROW-4111) [Python] Create time types from Python sequences of integers

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4111:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Create time types from Python sequences of integers
> 
>
> Key: ARROW-4111
> URL: https://issues.apache.org/jira/browse/ARROW-4111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This works for dates, but not times:
> {code}
> > traceback 
> > >
> def test_to_pandas_deduplicate_date_time():
> nunique = 100
> repeats = 10
> 
> unique_values = list(range(nunique))
> 
> cases = [
> # array type, to_pandas options
> ('date32', {'date_as_object': True}),
> ('date64', {'date_as_object': True}),
> ('time32[ms]', {}),
> ('time64[us]', {})
> ]
> 
> for array_type, pandas_options in cases:
> >   arr = pa.array(unique_values * repeats, type=array_type)
> pyarrow/tests/test_convert_pandas.py:2392: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/array.pxi:175: in pyarrow.lib.array
> return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
> pyarrow/array.pxi:36: in pyarrow.lib._sequence_to_array
> check_status(ConvertPySequence(sequence, mask, options, ))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> >   raise ArrowInvalid(message)
> E   pyarrow.lib.ArrowInvalid: ../src/arrow/python/python_to_arrow.cc:1012 : 
> ../src/arrow/python/iterators.h:70 : Could not convert 0 with type int: 
> converting to time32
> {code}
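The conversion the sequence converter needs is well defined: a time32[ms] value is a count of milliseconds since midnight. A stdlib sketch of that interpretation (hypothetical helper, not pyarrow's actual implementation):

```python
from datetime import time

def ms_to_time(ms):
    """Interpret an integer as a time32[ms] value, i.e. milliseconds since
    midnight, and build the equivalent datetime.time."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond remainder
    return time(hours, minutes, seconds, millis * 1000)
```

For example, `ms_to_time(3600000)` gives `time(1, 0)`, which is what converting the integer `3600000` to `time32[ms]` should represent.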





[jira] [Commented] (ARROW-3973) [Gandiva][Java] Move the benchmark tests out of unit test scope.

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779656#comment-16779656
 ] 

Wes McKinney commented on ARROW-3973:
-

Where do things stand on this?

> [Gandiva][Java] Move the benchmark tests out of unit test scope.
> 
>
> Key: ARROW-3973
> URL: https://issues.apache.org/jira/browse/ARROW-3973
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-3947) [Python] query distinct values of a given partition from a ParquetDataset

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3947:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] query distinct values of a given partition from a ParquetDataset
> -
>
> Key: ARROW-3947
> URL: https://issues.apache.org/jira/browse/ARROW-3947
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Affects Versions: 0.10.0
> Environment: MacOSX, Python 3.6, 
>Reporter: Ji Xu
>Priority: Minor
> Fix For: 0.14.0
>
>
> Right now the values of a given partition from a `ParquetDataset` are buried 
> inside `ParquetDataset.pieces`, which makes it a bit inconvenient for the user 
> to dig out this information. A helper function/method to perform this task in 
> the `ParquetDataset` class would be very helpful for users.
> A pure personal opinion on the name of this method: 
> `ParquetDataset.select_distinct()` with partition_name as the positional arg, 
> to resemble SQL `SELECT DISTINCT column FROM table`.
> I'm not sure how to contribute here on Jira, so I created this [GitHub Gist 
> |https://gist.github.com/xujiboy/c3fcc47f720ed9adf2260c5d0ba8aed2] as a 
> possible solution for this problem.
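The proposed helper can be sketched over hive-style piece paths (`key=value` segments); this is an illustration of the idea, not the gist's actual code:

```python
def select_distinct(piece_paths, partition_name):
    """Collect the distinct values of one partition key from hive-style
    piece paths (e.g. '.../year=2018/month=1/part0.parquet').

    Sketch of the proposed ParquetDataset.select_distinct, operating on
    plain path strings instead of ParquetDataset.pieces objects."""
    values = set()
    prefix = partition_name + "="
    for path in piece_paths:
        for segment in path.split("/"):
            if segment.startswith(prefix):
                values.add(segment[len(prefix):])
    return sorted(values)
```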





[jira] [Commented] (ARROW-3770) [C++] Validate or add option to validate arrow::Table schema in parquet::arrow::FileWriter::WriteTable

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779651#comment-16779651
 ] 

Wes McKinney commented on ARROW-3770:
-

This should have the same fix applied as 
https://github.com/apache/arrow/commit/4a084b79f9ab5c1f73658a5e5ff3581f5b875c42

> [C++] Validate or add option to validate arrow::Table schema in 
> parquet::arrow::FileWriter::WriteTable
> --
>
> Key: ARROW-3770
> URL: https://issues.apache.org/jira/browse/ARROW-3770
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Failing to validate will cause a segfault when the passed table does not 
> match the schema used to instantiate the writer. See ARROW-2926 





[jira] [Updated] (ARROW-3496) [Java] Add microbenchmark code to Java

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3496:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Java] Add microbenchmark code to Java
> --
>
> Key: ARROW-3496
> URL: https://issues.apache.org/jira/browse/ARROW-3496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.14.0
>
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding them to the codebase.





[jira] [Commented] (ARROW-3446) [R] integer type promotion

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779643#comment-16779643
 ] 

Wes McKinney commented on ARROW-3446:
-

Can we close the loop on this, possibly documenting the issue?

> [R] integer type promotion
> --
>
> Key: ARROW-3446
> URL: https://issues.apache.org/jira/browse/ARROW-3446
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Priority: Major
> Fix For: 0.13.0
>
>
> uint8, int16, uint16 -> int32
> uint32 -> numeric
> int64, uint64 -> ?





[jira] [Assigned] (ARROW-4697) [C++] Add URI parsing facility

2019-02-27 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-4697:
-

Assignee: Antoine Pitrou

> [C++] Add URI parsing facility
> --
>
> Key: ARROW-4697
> URL: https://issues.apache.org/jira/browse/ARROW-4697
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> This is a prerequisite for ARROW-4651.





[jira] [Created] (ARROW-4697) [C++] Add URI parsing facility

2019-02-27 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4697:
-

 Summary: [C++] Add URI parsing facility
 Key: ARROW-4697
 URL: https://issues.apache.org/jira/browse/ARROW-4697
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


This is a prerequisite for ARROW-4651.
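For illustration, the components such a facility would need to expose (scheme, authority, path) are what Python's stdlib parser already produces — shown here only to make the requirement concrete, not as the C++ API:

```python
from urllib.parse import urlparse

# A caller (e.g. a future filesystem layer) dispatches on the scheme and
# uses the authority/path to locate the resource.
uri = urlparse("hdfs://namenode:8020/data/file.parquet")
# uri.scheme == "hdfs"
# uri.netloc == "namenode:8020"
# uri.path == "/data/file.parquet"
```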





[jira] [Updated] (ARROW-4657) [Release] gbenchmark should not be needed for verification

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4657:
--
Labels: pull-request-available  (was: )

> [Release] gbenchmark should not be needed for verification
> --
>
> Key: ARROW-4657
> URL: https://issues.apache.org/jira/browse/ARROW-4657
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Packaging
>Affects Versions: 0.12.0, 0.12.1
>Reporter: Uwe L. Korn
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> {{gbenchmark}} is built during verification, and thus we require a minimum 
> CMake version of 3.6. We should not require it, as we do not need to build 
> the benchmarks during verification. A recent fix from [~wesmckinn] may have 
> addressed this, but we should verify before doing the next release.





[jira] [Resolved] (ARROW-4672) [C++] clang-7 matrix entry is build using gcc

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4672.
-
Resolution: Fixed

Issue resolved by pull request 3769
[https://github.com/apache/arrow/pull/3769]

> [C++] clang-7 matrix entry is build using gcc
> -
>
> Key: ARROW-4672
> URL: https://issues.apache.org/jira/browse/ARROW-4672
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Uwe L. Korn
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available, travis-ci
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Travis sets the following environment variables:
> {code}
> $ export CC="clang-7"
> $ export CXX="clang++-7"
> $ export TRAVIS_COMPILER=g++
> $ export CXX=g++
> $ export CXX_FOR_BUILD=g++
> $ export CC=gcc
> $ export CC_FOR_BUILD=gcc
> $ export PATH=/usr/lib/ccache:$PATH
> {code}





[jira] [Updated] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3408:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Add option to CSV reader to dictionary encode individual columns or all 
> string / binary columns
> -
>
> Key: ARROW-3408
> URL: https://issues.apache.org/jira/browse/ARROW-3408
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> For many datasets, dictionary encoding everything can result in drastically 
> lower memory usage and subsequently better performance for analytics.
> One difficulty of dictionary encoding in multithreaded conversions is that 
> ideally you end up with one dictionary at the end. So you have two options:
> * Implement a concurrent hashing scheme -- for low cardinality dictionaries, 
> the overhead associated with mutex contention will not be meaningful, for 
> high cardinality it can be more of a problem
> * Hash each chunk separately, then normalize at the end
> My guess is that a crude concurrent hash table with a mutex to protect 
> mutations and resizes is going to outperform the latter
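The second option — hash each chunk separately, then normalize at the end — can be sketched in Python (illustrative logic only; the real implementation would operate on Arrow arrays in C++):

```python
def merge_chunk_dictionaries(chunks):
    """Each chunk is a (dictionary_values, indices) pair built
    independently by one thread. Normalize to a single shared dictionary
    by remapping every chunk's indices into it."""
    merged = []      # unified dictionary, insertion-ordered
    positions = {}   # value -> index in the unified dictionary
    remapped = []
    for values, indices in chunks:
        # Translate this chunk's local dictionary slots to global slots
        mapping = []
        for v in values:
            if v not in positions:
                positions[v] = len(merged)
                merged.append(v)
            mapping.append(positions[v])
        remapped.append([mapping[i] for i in indices])
    return merged, remapped
```

This is the normalization pass whose cost the comment above weighs against a mutex-protected concurrent hash table.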





[jira] [Updated] (ARROW-3361) [R] Run cpp/build-support/cpplint.py on C++ source files

2019-02-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3361:
--
Labels: pull-request-available  (was: )

> [R] Run cpp/build-support/cpplint.py on C++ source files
> 
>
> Key: ARROW-3361
> URL: https://issues.apache.org/jira/browse/ARROW-3361
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> This will help with additional code cleanliness





[jira] [Assigned] (ARROW-3361) [R] Run cpp/build-support/cpplint.py on C++ source files

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3361:
---

Assignee: Wes McKinney

> [R] Run cpp/build-support/cpplint.py on C++ source files
> 
>
> Key: ARROW-3361
> URL: https://issues.apache.org/jira/browse/ARROW-3361
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This will help with additional code cleanliness





[jira] [Commented] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779617#comment-16779617
 ] 

Wes McKinney commented on ARROW-4696:
-

I think it would be better to be explicit about running with CUDA enabled

> Verify release script is over optimist with CUDA detection
> --
>
> Key: ARROW-4696
> URL: https://issues.apache.org/jira/browse/ARROW-4696
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Francois Saint-Jacques
>Priority: Minor
>
> I have an Nvidia GPU without CUDA; every time I run the verification scripts, 
> they fail partway through because ARROW_HAVE_CUDA is evaluated to yes, since 
> `nvidia-smi --list-gpus` returns true. This wastes a long run if I forget 
> about it.
> Would it be better to check for `CUDA_HOME`?
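The explicit opt-in suggested above could look like this sketch (the environment variable names are hypothetical, not the verification script's actual settings):

```python
import os

def cuda_enabled(env=os.environ):
    """Run CUDA verification steps only on explicit opt-in: the user must
    set TEST_CUDA=1 and point CUDA_HOME at a toolkit install, instead of
    inferring CUDA availability from `nvidia-smi` output."""
    return env.get("TEST_CUDA") == "1" and bool(env.get("CUDA_HOME"))
```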





[jira] [Updated] (ARROW-4661) [C++] Consolidate random string generators for use in benchmarks and unittests

2019-02-27 Thread Hatem Helal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hatem Helal updated ARROW-4661:
---
Description: 
This was discussed in here:

[https://github.com/apache/arrow/pull/3721]

For testing/benchmarking dictionary encoding, it's useful to control the number 
of repeated values, and it would also be good to optionally include null values. 
The ability to provide a custom alphabet would be handy for generating strings 
with unicode characters.

 

Also note that a simple PRNG should be used as the group has observed 
performance trouble with Mersenne Twister.

  was:
This was discussed in here:

[https://github.com/apache/arrow/pull/3721]

For testing/benchmarking dictionary encoding its useful to control the number 
of repeated values and it would also be good to optionally include null values. 
 The ability to provide a custom alphabet would be handy for generating strings 
with unicode characters.


> [C++] Consolidate random string generators for use in benchmarks and unittests
> --
>
> Key: ARROW-4661
> URL: https://issues.apache.org/jira/browse/ARROW-4661
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Hatem Helal
>Assignee: Hatem Helal
>Priority: Minor
> Fix For: 0.14.0
>
>
> This was discussed in here:
> [https://github.com/apache/arrow/pull/3721]
> For testing/benchmarking dictionary encoding, it's useful to control the number 
> of repeated values, and it would also be good to optionally include null 
> values. The ability to provide a custom alphabet would be handy for 
> generating strings with unicode characters.
>  
> Also note that a simple PRNG should be used as the group has observed 
> performance trouble with Mersenne Twister.
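A minimal sketch of such a generator — custom alphabet, optional nulls, controllable length, and a simple LCG in place of Mersenne Twister; all names here are illustrative, not Arrow's testing API:

```python
def random_strings(n, alphabet, max_len, null_percent, seed=42):
    """Generate n random strings over a caller-supplied alphabet, with
    roughly null_percent of entries as None, using a simple linear
    congruential generator (cheap to seed and advance)."""
    state = seed

    def lcg():
        nonlocal state
        state = (state * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        return state >> 33  # discard the low-quality low bits

    out = []
    for _ in range(n):
        if lcg() % 100 < null_percent:
            out.append(None)
            continue
        length = lcg() % (max_len + 1)
        out.append("".join(alphabet[lcg() % len(alphabet)]
                           for _ in range(length)))
    return out
```

Being seeded, the output is reproducible across runs, which matters for benchmarks that must compare like with like.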





[jira] [Commented] (ARROW-2237) [Python] [Plasma] Huge pages test failure

2019-02-27 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779615#comment-16779615
 ] 

Antoine Pitrou commented on ARROW-2237:
---

I don't know, it's marked xfail :)

> [Python] [Plasma] Huge pages test failure
> -
>
> Key: ARROW-2237
> URL: https://issues.apache.org/jira/browse/ARROW-2237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> This is a new failure here (Ubuntu 16.04, x86-64):
> {code}
> _ test_use_huge_pages 
> _
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, 
> in test_use_huge_pages
> create_object(plasma_client, 1)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in 
> create_object
> seal=seal)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in 
> create_object_with_id
> memory_buffer = client.create(object_id, data_size, metadata)
>   File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 
> code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
> /home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, 
> , buffer)
> Encountered unexpected EOF
>  Captured stderr call 
> -
> Allowing the Plasma store to use up to 0.1GB of memory.
> Starting object store with directory /mnt/hugepages and huge page support 
> enabled
> create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
> {code}





[jira] [Created] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4696:
-

 Summary: Verify release script is over optimist with CUDA detection
 Key: ARROW-4696
 URL: https://issues.apache.org/jira/browse/ARROW-4696
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Francois Saint-Jacques


I have an Nvidia GPU without CUDA; every time I run the verification scripts, 
they fail partway through because ARROW_HAVE_CUDA is evaluated to yes, since 
`nvidia-smi --list-gpus` returns true. This wastes a long run if I forget 
about it.

Would it be better to check for `CUDA_HOME`?





[jira] [Resolved] (ARROW-2392) [Python] pyarrow RecordBatchStreamWriter allows writing batches with different schemas

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2392.
-
Resolution: Fixed

Issue resolved by pull request 3762
[https://github.com/apache/arrow/pull/3762]

> [Python] pyarrow RecordBatchStreamWriter allows writing batches with 
> different schemas
> --
>
> Key: ARROW-2392
> URL: https://issues.apache.org/jira/browse/ARROW-2392
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Ernesto Ocampo
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A RecordBatchStreamWriter initialised with a given schema will still allow 
> writing RecordBatches that have different schemas. Example:
>  
> {code:java}
> schema = pa.schema([pa.field('some_field', pa.int64())])
> stream = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(stream, schema)
> data = [pa.array([1.234])]
> batch = pa.RecordBatch.from_arrays(data, ['some_field'])  
> # batch does not conform to schema
> assert batch.schema != schema
> writer.write_batch(batch)  # no exception raised
> writer.close()
> {code}
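The fix amounts to validating each batch's schema against the writer's before serializing it — sketched here with a plain comparison (pyarrow itself would compare with `Schema.equals`):

```python
def check_schema(batch_schema, writer_schema):
    """Guard the stream writer: refuse batches whose schema differs from
    the schema the writer was initialised with, instead of silently
    writing a corrupt stream. Sketch only, not pyarrow's code."""
    if batch_schema != writer_schema:
        raise ValueError("batch schema does not match writer schema")
```

With such a guard inside `write_batch`, the example above would raise instead of succeeding.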





[jira] [Commented] (ARROW-2237) [Python] [Plasma] Huge pages test failure

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779606#comment-16779606
 ] 

Wes McKinney commented on ARROW-2237:
-

Is this still occurring ever?

> [Python] [Plasma] Huge pages test failure
> -
>
> Key: ARROW-2237
> URL: https://issues.apache.org/jira/browse/ARROW-2237
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> This is a new failure here (Ubuntu 16.04, x86-64):
> {code}
> _ test_use_huge_pages 
> _
> Traceback (most recent call last):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 779, 
> in test_use_huge_pages
> create_object(plasma_client, 1)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 80, in 
> create_object
> seal=seal)
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 69, in 
> create_object_with_id
> memory_buffer = client.create(object_id, data_size, metadata)
>   File "plasma.pyx", line 302, in pyarrow.plasma.PlasmaClient.create
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: /home/antoine/arrow/cpp/src/plasma/client.cc:192 
> code: PlasmaReceive(store_conn_, MessageType_PlasmaCreateReply, )
> /home/antoine/arrow/cpp/src/plasma/protocol.cc:46 code: ReadMessage(sock, 
> , buffer)
> Encountered unexpected EOF
>  Captured stderr call 
> -
> Allowing the Plasma store to use up to 0.1GB of memory.
> Starting object store with directory /mnt/hugepages and huge page support 
> enabled
> create_buffer failed to open file /mnt/hugepages/plasmapSNc0X
> {code}





[jira] [Updated] (ARROW-1894) [Python] Treat CPython memoryview or buffer objects equivalently to pyarrow.Buffer in pyarrow.serialize

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1894:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Treat CPython memoryview or buffer objects equivalently to 
> pyarrow.Buffer in pyarrow.serialize
> ---
>
> Key: ARROW-1894
> URL: https://issues.apache.org/jira/browse/ARROW-1894
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 0.14.0
>
>
> These should be treated as Buffer-like on serialize. We should consider how 
> to "box" the buffers as the appropriate kind of object (Buffer, memoryview, 
> etc.) when being deserialized
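The "boxing" idea above can be sketched in pure Python: remember each buffer-like's original type on serialize so deserialize can restore the same kind of object. The function names and the pickle-based framing are illustrative assumptions, not the pyarrow.serialize protocol:

```python
import pickle


def serialize_buffer_like(obj):
    """Serialize any buffer-like object, tagging it with its original
    type so deserialization can 'box' it back as the same kind."""
    kind = type(obj).__name__      # e.g. "memoryview" or "bytes"
    return pickle.dumps((kind, bytes(obj)))


def deserialize_buffer_like(blob):
    kind, payload = pickle.loads(blob)
    # Re-box as the original type; bytes stands in for pyarrow.Buffer here.
    return memoryview(payload) if kind == "memoryview" else payload


rt = deserialize_buffer_like(serialize_buffer_like(memoryview(b"abc")))
print(type(rt).__name__, bytes(rt))
# → memoryview b'abc'
```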





[jira] [Updated] (ARROW-4644) [C++/Docker] Build Gandiva in the docker containers

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4644:

Fix Version/s: 0.13.0

> [C++/Docker] Build Gandiva in the docker containers
> ---
>
> Key: ARROW-4644
> URL: https://issues.apache.org/jira/browse/ARROW-4644
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: docker
> Fix For: 0.13.0
>
>
> Install LLVM dependency and enable it:
> https://github.com/apache/arrow/pull/3484/files#diff-1f2ebc25efb8f1e6646cbd31ce2f34f4R51





[jira] [Updated] (ARROW-4542) [C++] Denominate row group size in bytes (not in no of rows)

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4542:

Fix Version/s: 0.14.0

> [C++] Denominate row group size in bytes (not in no of rows)
> 
>
> Key: ARROW-4542
> URL: https://issues.apache.org/jira/browse/ARROW-4542
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Remek Zajac
>Priority: Major
> Fix For: 0.14.0
>
>
> Both the C++ [implementation of parquet writer for 
> arrow|https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L1174]
>  and the [Python code bound to 
> it|https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pyx#L911]
>  appear to be denominated in the *number of rows* (without making it very 
> explicit). Whereas:
> (1) [The Apache parquet 
> documentation|https://parquet.apache.org/documentation/latest/] states: 
> "_Row group size: Larger row groups allow for larger column chunks which 
> makes it possible to do larger sequential IO. Larger groups also require more 
> buffering in the write path (or a two pass write). *We recommend large row 
> groups (512MB - 1GB)*. Since an entire row group might need to be read, we 
> want it to completely fit on one HDFS block. Therefore, HDFS block sizes 
> should also be set to be larger. An optimized read setup would be: 1GB row 
> groups, 1GB HDFS block size, 1 HDFS block per HDFS file._"
> (2) Reference Apache [parquet-mr 
> implementation|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordWriter.java#L146]
>  for Java accepts the row size expressed in bytes.
> (3) The [low-level parquet read-write 
> example|https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer2.cc#L88]
>  also considers row group be denominated in bytes.
> These insights make me conclude that:
>  * Per parquet design and to take advantage of HDFS block level operations, 
> it only makes sense to work with row group sizes as expressed in bytes - as 
> that is the only consequential desire the caller can utter and want to 
> influence.
>  * Arrow implementation of ParquetWriter would benefit from re-nominating its 
> `row_group_size` into bytes. I will also note it is impossible to use pyarrow 
> to shape equally byte-sized row groups as the size the row group takes is 
> post-compression and the caller only knows how much uncompressed data they 
> have managed to put in.
> Now, my conclusions can be wrong and I may be blind to some alley of 
> reasoning, so this ticket is more of a question than a bug. A question on 
> whether the audience here agrees with my reasoning and if not - to explain 
> what detail I have missed.
>  
>  
>  
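Until the writer accepts a byte-denominated size, a caller targeting a byte budget has to translate it into a row count. A hedged pure-Python sketch of that translation (the helper name and the fixed uncompressed-bytes-per-row model are assumptions; as the issue notes, post-compression size cannot be known up front):

```python
def rows_per_group(target_group_bytes, row_byte_width):
    """Estimate how many rows fit a byte-denominated row-group budget,
    assuming a fixed uncompressed byte width per row. This is only an
    uncompressed-size heuristic, not a compressed-size guarantee."""
    if row_byte_width <= 0:
        raise ValueError("row_byte_width must be positive")
    return max(1, target_group_bytes // row_byte_width)


# Target ~512 MB row groups for rows of roughly 100 bytes each:
print(rows_per_group(512 * 1024 * 1024, 100))
# → 5368709
```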





[jira] [Updated] (ARROW-1833) [Java] Add accessor methods for data buffers that skip null checking

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1833:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Java] Add accessor methods for data buffers that skip null checking
> 
>
> Key: ARROW-1833
> URL: https://issues.apache.org/jira/browse/ARROW-1833
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-4568) [C++] Add version macros to headers

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4568:

Fix Version/s: 0.14.0

> [C++] Add version macros to headers
> ---
>
> Key: ARROW-4568
> URL: https://issues.apache.org/jira/browse/ARROW-4568
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Lawrence Chan
>Priority: Minor
> Fix For: 0.14.0
>
>
> It would be useful to have compile-time macros in the headers specifying the 
> major/minor/patch versions, so that users can more easily maintain code that 
> can be built with a range of arrow versions.
> Other nice-to-haves:
> - Maybe a "combiner" func that basically spits out the value as an 
> easy-to-compare integer, e.g. 12000 for 0.12.0 or something.
> - Git hash
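The "combiner" the reporter sketches would pack major/minor/patch into a single comparable integer. A minimal Python sketch of one possible packing (the multipliers and the name `combine_version` are assumptions; they happen to reproduce the issue's "12000 for 0.12.0" example):

```python
def combine_version(major, minor, patch):
    """Pack (major, minor, patch) into one comparable integer,
    e.g. 0.12.0 -> 12000 under the 100000/1000/1 scheme assumed here."""
    return major * 100000 + minor * 1000 + patch


assert combine_version(0, 12, 0) == 12000
# Version range checks reduce to integer comparisons:
assert combine_version(0, 13, 0) > combine_version(0, 12, 3)
```

In C++ the same scheme would naturally be a preprocessor macro so it can gate code at compile time.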





[jira] [Commented] (ARROW-1833) [Java] Add accessor methods for data buffers that skip null checking

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779567#comment-16779567
 ] 

Wes McKinney commented on ARROW-1833:
-

cc [~atrivedi]

> [Java] Add accessor methods for data buffers that skip null checking
> 
>
> Key: ARROW-1833
> URL: https://issues.apache.org/jira/browse/ARROW-1833
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Commented] (ARROW-2217) [C++] Add option to use dynamic linking for compression library dependencies

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779570#comment-16779570
 ] 

Wes McKinney commented on ARROW-2217:
-

[~xhochy] are you addressing this in your CMake refactor?

> [C++] Add option to use dynamic linking for compression library dependencies
> 
>
> Key: ARROW-2217
> URL: https://issues.apache.org/jira/browse/ARROW-2217
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1661





[jira] [Updated] (ARROW-4687) [Python] FlightServerBase.run should exit on Ctrl-C

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4687:

Fix Version/s: 0.13.0

> [Python] FlightServerBase.run should exit on Ctrl-C
> ---
>
> Key: ARROW-4687
> URL: https://issues.apache.org/jira/browse/ARROW-4687
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: FlightRPC, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the Python Flight server does not react at all to Ctrl-C (aka 
> SIGINT). It should probably return from the `run()` method after having 
> executed Python signal handlers.





[jira] [Updated] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2256:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
> 
>
> Key: ARROW-2256
> URL: https://issues.apache.org/jira/browse/ARROW-2256
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I did a clean upgrade to 16.04 on one of my machines and ran into the problem 
> described here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087
> I think this can be resolved temporarily by symlinking the static library, 
> but we should document the problem so other devs know what to do when it 
> happens





[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779569#comment-16779569
 ] 

Wes McKinney commented on ARROW-2119:
-

This can be resolved by adding a zero-record-batch stream to the integration 
tests

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Priority: Major
> Fix For: 0.13.0
>
>
> It looks like many places in the code currently assume that there is 
> at least one record batch in the streaming format. Are zero-record-batch 
> streams unsupported by design?
> e.g. 
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
>   public static void convert(InputStream in, OutputStream out) throws 
> IOException {
> BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
> try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>   VectorSchemaRoot root = reader.getVectorSchemaRoot();
>   // load the first batch before instantiating the writer so that we have 
> any dictionaries
>   if (!reader.loadNextBatch()) {
> throw new IOException("Unable to read first record batch");
>   }
>   ...
> {code}
> Pyarrow-0.8.0 does not load 0-recordbatch stream either. It would throw an 
> exception originated from 
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:]
> {code:none}
> Status Table::FromRecordBatches(const 
> std::vector<std::shared_ptr<RecordBatch>>& batches,
> std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
> return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}
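The behaviour the issue argues for can be sketched in pure Python (the dict-based "table" is only a stand-in for `arrow::Table`, not an Arrow API): an empty batch list should yield an empty table that still carries the schema, rather than an error.

```python
def table_from_batches(schema, batches):
    """Build a table from record batches; unlike the C++ snippet quoted
    above, zero batches yields an empty table instead of an error."""
    columns = [[] for _ in schema]
    for batch in batches:        # each batch: one list of values per field
        for col, values in zip(columns, batch):
            col.extend(values)
    return {"schema": schema, "columns": columns}


# Zero batches: schema preserved, no rows, no exception.
print(table_from_batches(["a", "b"], []))
# → {'schema': ['a', 'b'], 'columns': [[], []]}
```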





[jira] [Updated] (ARROW-2366) [Python] Support reading Parquet files having a permutation of column order

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2366:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Support reading Parquet files having a permutation of column order
> ---
>
> Key: ARROW-2366
> URL: https://issues.apache.org/jira/browse/ARROW-2366
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> See discussion in https://github.com/dask/fastparquet/issues/320





[jira] [Updated] (ARROW-2103) [C++] Implement take kernel functions - string/binary value type

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2103:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement take kernel functions - string/binary value type
> 
>
> Key: ARROW-2103
> URL: https://issues.apache.org/jira/browse/ARROW-2103
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Jingyuan Wang
>Priority: Major
> Fix For: 0.14.0
>
>
> Should support string/binary value types.





[jira] [Updated] (ARROW-2102) [C++] Implement take kernel functions - primitive value type

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2102:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement take kernel functions - primitive value type
> 
>
> Key: ARROW-2102
> URL: https://issues.apache.org/jira/browse/ARROW-2102
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jingyuan Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Should implement the basic functionality of take kernel and support primitive 
> value types.





[jira] [Updated] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1956:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Support reading specific partitions from a partitioned parquet 
> dataset
> ---
>
> Key: ARROW-1956
> URL: https://issues.apache.org/jira/browse/ARROW-1956
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
> Environment: Kernel: 4.14.8-300.fc27.x86_64
> Python: 3.6.3
>Reporter: Suvayu Ali
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
> Attachments: so-example.py
>
>
> I want to read specific partitions from a partitioned parquet dataset.  This 
> is very useful in case of large datasets.  I have attached a small script 
> that creates a dataset and shows what is expected when reading (quoting 
> salient points below).
> # There is no way to read specific partitions in Pandas
> # In pyarrow I tried to achieve the goal by providing a list of 
> files/directories to ParquetDataset, but it didn't work: 
> # In PySpark it works if I simply do:
> {code:none}
> spark.read.option('basePath', 'datadir').parquet(*list_of_partitions)
> {code}
> I also couldn't find a way to easily write partitioned parquet files.  In the 
> end I did it by hand by creating the directory hierarchies, and writing the 
> individual files myself (similar to the implementation in the attached 
> script).  Again, in PySpark I can do 
> {code:none}
> df.write.partitionBy(*list_of_partitions).parquet(output)
> {code}
> to achieve that.
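Selecting partitions from a hive-style layout amounts to filtering file paths on their `key=value` directory components. A pure-Python sketch of that filter (the helper name `select_partition_files` is hypothetical, not the pyarrow API):

```python
def select_partition_files(paths, **wanted):
    """Keep only file paths whose hive-style directory components
    (key=value) match every requested partition value."""
    selected = []
    for path in paths:
        parts = dict(
            seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        if all(parts.get(k) == v for k, v in wanted.items()):
            selected.append(path)
    return selected


files = [
    "datadir/year=2017/month=01/part-0.parquet",
    "datadir/year=2017/month=02/part-0.parquet",
    "datadir/year=2018/month=01/part-0.parquet",
]
print(select_partition_files(files, year="2017"))
```

A list built this way could then be fed to a reader that accepts explicit file lists.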





[jira] [Updated] (ARROW-1621) [JAVA] Reduce Heap Usage per Vector

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1621:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [JAVA] Reduce Heap Usage per Vector
> ---
>
> Key: ARROW-1621
> URL: https://issues.apache.org/jira/browse/ARROW-1621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Siddharth Teotia
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 0.14.0
>
>
> https://docs.google.com/document/d/1MU-ah_bBHIxXNrd7SkwewGCOOexkXJ7cgKaCis5f-PI/edit





[jira] [Updated] (ARROW-772) [C++] Implement take kernel functions

2019-02-27 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-772:
---
Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement take kernel functions
> -
>
> Key: ARROW-772
> URL: https://issues.apache.org/jira/browse/ARROW-772
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics, columnar-format-1.0
> Fix For: 0.14.0
>
>
> Among other things, this can be used to convert from DictionaryArray back to 
> dense array. This is equivalent to {{ndarray.take}} in NumPy
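The dictionary-decoding use mentioned above is exactly a take: gather the dictionary values at the positions given by the index codes. A pure-Python sketch of the kernel's semantics (minus null handling and type dispatch):

```python
def take(values, indices):
    """Gather values at the given indices (ndarray.take semantics,
    without null handling)."""
    return [values[i] for i in indices]


# Decoding a dictionary-encoded column is take(dictionary, codes):
dictionary = ["apple", "banana", "cherry"]
codes = [0, 2, 2, 1, 0]
print(take(dictionary, codes))
# → ['apple', 'cherry', 'cherry', 'banana', 'apple']
```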





  1   2   >