[jira] [Updated] (ARROW-3908) [Rust] Update rust dockerfile to use nightly toolchain

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3908:
--
Labels: pull-request-available  (was: )

> [Rust] Update rust dockerfile to use nightly toolchain
> --
>
> Key: ARROW-3908
> URL: https://issues.apache.org/jira/browse/ARROW-3908
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Krisztian Szucs
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3908) [Rust] Update rust dockerfile to use nightly toolchain

2018-12-04 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-3908:
---

Assignee: Chao Sun

> [Rust] Update rust dockerfile to use nightly toolchain
> --
>
> Key: ARROW-3908
> URL: https://issues.apache.org/jira/browse/ARROW-3908
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Krisztian Szucs
>Assignee: Chao Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder

2018-12-04 Thread Chao Sun (JIRA)
Chao Sun created ARROW-3939:
---

 Summary: [Rust] Remove macro definition for ListArrayBuilder
 Key: ARROW-3939
 URL: https://issues.apache.org/jira/browse/ARROW-3939
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Chao Sun
Assignee: Chao Sun


Currently `ListArrayBuilder` is defined using a macro and is only implemented 
for a few value builder types. We should lift this restriction and allow 
creation of list builders with arbitrary value builder types.
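For comparison, Arrow's C++ builder API already composes a list builder with an 
arbitrary value builder, which is the shape this change is aiming for in Rust. 
A minimal sketch of that pattern (C++, using the long-stable 
{{arrow::ListBuilder}} and {{arrow::Int64Builder}} interfaces; illustrative 
only, not code from this issue):

{code:cpp}
#include <memory>

#include <arrow/api.h>

// Build the list array [[1, 2], [3]] with a ListBuilder that wraps an
// arbitrary value builder -- here an Int64Builder, but any ArrayBuilder works.
arrow::Status BuildListOfInt64(std::shared_ptr<arrow::Array>* out) {
  auto value_builder = std::make_shared<arrow::Int64Builder>();
  arrow::ListBuilder list_builder(arrow::default_memory_pool(), value_builder);
  auto* values = static_cast<arrow::Int64Builder*>(list_builder.value_builder());

  ARROW_RETURN_NOT_OK(list_builder.Append());  // open the first list slot
  ARROW_RETURN_NOT_OK(values->Append(1));
  ARROW_RETURN_NOT_OK(values->Append(2));
  ARROW_RETURN_NOT_OK(list_builder.Append());  // open the second list slot
  ARROW_RETURN_NOT_OK(values->Append(3));
  return list_builder.Finish(out);
}
{code}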



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3939:
--
Labels: pull-request-available  (was: )

> [Rust] Remove macro definition for ListArrayBuilder
> ---
>
> Key: ARROW-3939
> URL: https://issues.apache.org/jira/browse/ARROW-3939
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Affects Versions: 0.11.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>
> Currently `ListArrayBuilder` is defined using a macro and is only implemented 
> for a few value builder types. We should lift this restriction and allow 
> creation of list builders with arbitrary value builder types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder

2018-12-04 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated ARROW-3939:

Affects Version/s: 0.11.0

> [Rust] Remove macro definition for ListArrayBuilder
> ---
>
> Key: ARROW-3939
> URL: https://issues.apache.org/jira/browse/ARROW-3939
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Affects Versions: 0.11.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Currently `ListArrayBuilder` is defined using a macro and is only implemented 
> for a few value builder types. We should lift this restriction and allow 
> creation of list builders with arbitrary value builder types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3938) [Packaging] Stop to refer java/pom.xml to get version information

2018-12-04 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709648#comment-16709648
 ] 

Kouhei Sutou commented on ARROW-3938:
-

Comment from [~wesmckinn] 
https://github.com/apache/arrow/pull/3096#issuecomment-444346414

One complexity is that we need to be able to produce development version 
numbers using setuptools_scm when building the Python wheels.

> [Packaging] Stop to refer java/pom.xml to get version information
> -
>
> Key: ARROW-3938
> URL: https://issues.apache.org/jira/browse/ARROW-3938
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 0.12.0
>
>
> https://github.com/apache/arrow/pull/3096#issuecomment-444345068
> I want to stop the current version sharing style (referring to {{java/pom.xml}} 
> from C++, Python, C, Ruby, ...).
> It introduces complexity. For example, we generate {{version.rb}} 
> dynamically to create a Ruby package: 
> https://github.com/apache/arrow/blob/master/ruby/red-arrow/version.rb
> I think that we can just replace all versions in {{cpp/CMakeLists.txt}}, 
> {{python/setup.py}}, {{c_glib/configure.ac}}, {{ruby/*/lib/*/version.rb}}, 
> {{rust/Cargo.toml}}, ... with {{sed}} in the release process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3938) [Packaging] Stop to refer java/pom.xml to get version information

2018-12-04 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-3938:
---

 Summary: [Packaging] Stop to refer java/pom.xml to get version 
information
 Key: ARROW-3938
 URL: https://issues.apache.org/jira/browse/ARROW-3938
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.12.0


https://github.com/apache/arrow/pull/3096#issuecomment-444345068

I want to stop the current version sharing style (referring to {{java/pom.xml}} 
from C++, Python, C, Ruby, ...).
It introduces complexity. For example, we generate {{version.rb}} dynamically 
to create a Ruby package: 
https://github.com/apache/arrow/blob/master/ruby/red-arrow/version.rb

I think that we can just replace all versions in {{cpp/CMakeLists.txt}}, 
{{python/setup.py}}, {{c_glib/configure.ac}}, {{ruby/*/lib/*/version.rb}}, 
{{rust/Cargo.toml}}, ... with {{sed}} in the release process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3883) [Rust] Update Rust README to reflect new functionality

2018-12-04 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-3883:
--
Component/s: Rust

> [Rust] Update Rust README to reflect new functionality
> --
>
> Key: ARROW-3883
> URL: https://issues.apache.org/jira/browse/ARROW-3883
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.12.0
>
>
> The Rust README is now very outdated and needs updating before we release 
> 0.12.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3883) [Rust] Update Rust README to reflect new functionality

2018-12-04 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-3883:
-

Assignee: Andy Grove

> [Rust] Update Rust README to reflect new functionality
> --
>
> Key: ARROW-3883
> URL: https://issues.apache.org/jira/browse/ARROW-3883
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.12.0
>
>
> The Rust README is now very outdated and needs updating before we release 
> 0.12.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3937) [Rust] Rust nightly build is failing

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3937:
--
Labels: pull-request-available  (was: )

> [Rust] Rust nightly build is failing
> 
>
> Key: ARROW-3937
> URL: https://issues.apache.org/jira/browse/ARROW-3937
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> See recent CI failures such as 
> https://travis-ci.org/apache/arrow/jobs/463656608#L650



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3937) [Rust] Rust nightly build is failing

2018-12-04 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-3937:
-

Assignee: Andy Grove

> [Rust] Rust nightly build is failing
> 
>
> Key: ARROW-3937
> URL: https://issues.apache.org/jira/browse/ARROW-3937
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.12.0
>
>
> See recent CI failures such as 
> https://travis-ci.org/apache/arrow/jobs/463656608#L650



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3937) [Rust] Rust nightly build is failing

2018-12-04 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3937:
---

 Summary: [Rust] Rust nightly build is failing
 Key: ARROW-3937
 URL: https://issues.apache.org/jira/browse/ARROW-3937
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Wes McKinney
 Fix For: 0.12.0


See recent CI failures such as 
https://travis-ci.org/apache/arrow/jobs/463656608#L650



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3543) [R] Time zone adjustment issue when reading Feather file written by Python

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709587#comment-16709587
 ] 

Wes McKinney commented on ARROW-3543:
-

[~romainfrancois] can you have a look and see if this bug is still present?

> [R] Time zone adjustment issue when reading Feather file written by Python
> --
>
> Key: ARROW-3543
> URL: https://issues.apache.org/jira/browse/ARROW-3543
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Olaf
>Priority: Critical
> Fix For: 0.12.0
>
>
> Hello the dream team,
> Pasting from [https://github.com/wesm/feather/issues/351]
> Thanks for this wonderful package. I was playing with feather and some 
> timestamps and I noticed some dangerous behavior. Maybe it is a bug.
> Consider this
>  
> {code:java}
> import pandas as pd
> import feather
> import numpy as np
> df = pd.DataFrame(
> {'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'), 
> pd.to_datetime('2018-02-01 14:01:00.456'), pd.to_datetime('2018-03-05 
> 14:01:02.200')]}
> )
> df['timestamp_est'] = 
> pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
>  Out[17]: 
>  string_time_utc timestamp_est
>  0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
>  1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
>  2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
> Here I create the corresponding `EST` timestamp of my original timestamps (in 
> `UTC` time).
> Now saving the dataframe to `csv` or to `feather` will generate two 
> completely different results.
>  
> {code:java}
> df.to_csv('P://testing.csv')
> df.to_feather('P://testing.feather')
> {code}
> Switching to R.
> Using the good old `csv` gives me something a bit annoying, but expected. R 
> thinks my timezone is `UTC` by default, and wrongly attaches this timezone to 
> `timestamp_est`. No big deal, I can always use `with_tz` or, even better, 
> import as character and process as timestamp while in R.
>  
> {code:java}
> > dataframe <- read_csv('P://testing.csv')
>  Parsed with column specification:
>  cols(
>  X1 = col_integer(),
>  string_time_utc = col_datetime(format = ""),
>  timestamp_est = col_datetime(format = "")
>  )
>  Warning message:
>  Missing column names filled in: 'X1' [1] 
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 4
>      X1 string_time_utc         timestamp_est          
>   <int> <dttm>                  <dttm>                 
> 1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
> 2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
>   mytimezone
>   <chr>     
> 1 UTC 
> 2 UTC 
> 3 UTC  {code}
> {code:java}
> #Now look at what happens with feather:
>  
>  > dataframe <- read_feather('P://testing.feather')
>  > 
>  > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 3
>   string_time_utc         timestamp_est           mytimezone
>   <dttm>                  <dttm>                  <chr>     
> 1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 "" 
> 2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 "" 
> 3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 "" {code}
> My timestamps have been converted!!! pure insanity. 
>  Am I missing something here?
> Thanks!!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3318) [C++] Convenience method for reading all batches from an IPC stream or file as arrow::Table

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3318:
--
Labels: pull-request-available  (was: )

> [C++] Convenience method for reading all batches from an IPC stream or file 
> as arrow::Table
> ---
>
> Key: ARROW-3318
> URL: https://issues.apache.org/jira/browse/ARROW-3318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> This is being implemented more than once in binding layers
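A sketch of the loop such a convenience method would centralize, written against 
recent Arrow C++ signatures (the 0.12-era API used Status out-parameters rather 
than Result; names here are illustrative):

{code:cpp}
#include <memory>
#include <vector>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

// Read every record batch from an IPC stream and stitch them into one Table.
arrow::Result<std::shared_ptr<arrow::Table>> ReadStreamAsTable(
    std::shared_ptr<arrow::io::InputStream> input) {
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchStreamReader::Open(input));
  std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
  while (true) {
    std::shared_ptr<arrow::RecordBatch> batch;
    ARROW_RETURN_NOT_OK(reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    batches.push_back(batch);
  }
  return arrow::Table::FromRecordBatches(reader->schema(), batches);
}
{code}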



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-1918:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Integration portion of verify-release-candidate.sh fails
> -
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> I'm going to temporarily disable this in my fixes in ARROW-1917



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-1918:


Assignee: Brian Hulette

> [JS] Integration portion of verify-release-candidate.sh fails
> -
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> I'm going to temporarily disable this in my fixes in ARROW-1917



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3885) [Rust] Update version to 0.12.0 and update release instructions on wiki

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3885:
--
Labels: pull-request-available  (was: )

> [Rust] Update version to 0.12.0 and update release instructions on wiki
> ---
>
> Key: ARROW-3885
> URL: https://issues.apache.org/jira/browse/ARROW-3885
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> The Rust version of Arrow still has version 0.10.0 in the Cargo.toml ... we 
> need to bump this to 0.12.0 (or 0.12.0-alpha maybe) and update the 
> instructions for releasing Arrow so that this version gets updated when 
> performing a release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2993) [JS] Document minimum supported NodeJS version

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2993.
--
Resolution: Fixed

Issue resolved by pull request 3087
[https://github.com/apache/arrow/pull/3087]

> [JS] Document minimum supported NodeJS version
> --
>
> Key: ARROW-2993
> URL: https://issues.apache.org/jira/browse/ARROW-2993
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and 
> higher. It would be useful to document the minimum supported NodeJS version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3894) [Python] Error reading IPC file with no record batches

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3894:
--
Labels: pull-request-available  (was: )

> [Python] Error reading IPC file with no record batches
> --
>
> Key: ARROW-3894
> URL: https://issues.apache.org/jira/browse/ARROW-3894
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.11.1
>Reporter: Rik Coenders
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> When using the RecordBatchFileWriter without actually writing a record batch, 
> the magic bytes at the beginning of the file are not written. This causes the 
> exception "File is smaller than indicated metadata size" when reading that 
> file with the RecordBatchFileReader.
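A minimal reproduction of the reported scenario, sketched against a recent 
Arrow C++ API (the 0.12-era writer/reader factories had different signatures):

{code:cpp}
#include <memory>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

// Write an IPC file containing zero record batches, then try to read it back.
arrow::Status RoundTripEmptyFile() {
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create());
  ARROW_ASSIGN_OR_RAISE(auto writer, arrow::ipc::MakeFileWriter(sink, schema));
  // No WriteRecordBatch() calls: the file holds only schema metadata.
  ARROW_RETURN_NOT_OK(writer->Close());
  ARROW_ASSIGN_OR_RAISE(auto buffer, sink->Finish());

  auto source = std::make_shared<arrow::io::BufferReader>(buffer);
  // Before the fix, this step failed with
  // "File is smaller than indicated metadata size".
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchFileReader::Open(source));
  return reader->schema() != nullptr ? arrow::Status::OK()
                                     : arrow::Status::Invalid("missing schema");
}
{code}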



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-2374) [Rust] Add support for array of List

2018-12-04 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove closed ARROW-2374.
-
Resolution: Duplicate

> [Rust] Add support for array of List
> ---
>
> Key: ARROW-2374
> URL: https://issues.apache.org/jira/browse/ARROW-2374
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for List in Array types. Look at Utf8 which wraps List to 
> see how this works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2374) [Rust] Add support for array of List

2018-12-04 Thread Andy Grove (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709549#comment-16709549
 ] 

Andy Grove commented on ARROW-2374:
---

I believe this can be closed. I will go ahead and close as duplicate.

> [Rust] Add support for array of List
> ---
>
> Key: ARROW-2374
> URL: https://issues.apache.org/jira/browse/ARROW-2374
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add support for List in Array types. Look at Utf8 which wraps List to 
> see how this works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709548#comment-16709548
 ] 

Wes McKinney commented on ARROW-3933:
-

Either [~xhochy] or I can take a closer look. This code path hasn't been 
hardened too much -- and, of course, our support for nested data is very 
incomplete

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3326) [Python] Expose stream alignment function in pyarrow.NativeFile

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3326:
--
Labels: pull-request-available  (was: )

> [Python] Expose stream alignment function in pyarrow.NativeFile
> ---
>
> Key: ARROW-3326
> URL: https://issues.apache.org/jira/browse/ARROW-3326
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> See also ARROW-3319



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709537#comment-16709537
 ] 

Francois Saint-Jacques commented on ARROW-3933:
---

Offending line: 
[https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.cc#L1538]. 
I don't have enough knowledge of the Parquet file format to decide whether the 
file is corrupted or the assumption in the code is correct.

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3885) [Rust] Update version to 0.12.0 and update release instructions on wiki

2018-12-04 Thread Andy Grove (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-3885:
-

Assignee: Andy Grove

> [Rust] Update version to 0.12.0 and update release instructions on wiki
> ---
>
> Key: ARROW-3885
> URL: https://issues.apache.org/jira/browse/ARROW-3885
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.11.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 0.12.0
>
>
> The Rust version of Arrow still has version 0.10.0 in the Cargo.toml ... we 
> need to bump this to 0.12.0 (or 0.12.0-alpha maybe) and update the 
> instructions for releasing Arrow so that this version gets updated when 
> performing a release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3326) [Python] Expose stream alignment function in pyarrow.NativeFile

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3326:
---

Assignee: Wes McKinney

> [Python] Expose stream alignment function in pyarrow.NativeFile
> ---
>
> Key: ARROW-3326
> URL: https://issues.apache.org/jira/browse/ARROW-3326
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> See also ARROW-3319



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2801) [Python] Implement split_row_groups for ParquetDataset

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2801:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Implement split_row_groups for ParquetDataset
> -
>
> Key: ARROW-2801
> URL: https://issues.apache.org/jira/browse/ARROW-2801
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Robert Gruener
>Assignee: Robert Gruener
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the split_row_groups argument in ParquetDataset yields a not 
> implemented error. An easy and efficient way to implement this is by using 
> the summary metadata file instead of opening every footer file



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3220) [Python] Add writeat method to writeable NativeFile

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3220:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Python] Add writeat method to writeable NativeFile
> ---
>
> Key: ARROW-3220
> URL: https://issues.apache.org/jira/browse/ARROW-3220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Pearu Peterson
>Priority: Major
> Fix For: 0.13.0
>
>
> See https://github.com/apache/arrow/pull/2536#discussion_r216384311



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3871) [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3871:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData
> ---
>
> Key: ARROW-3871
> URL: https://issues.apache.org/jira/browse/ARROW-3871
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> See https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L173



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3333) [Gandiva] Use non-platform specific integer types for lengths, indexes

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709507#comment-16709507
 ] 

Wes McKinney commented on ARROW-3333:
-

This should be reviewed, but it is not necessary while Gandiva is in an 
alpha/beta stage.

> [Gandiva] Use non-platform specific integer types for lengths, indexes
> --
>
> Key: ARROW-3333
> URL: https://issues.apache.org/jira/browse/ARROW-3333
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are many instances of using {{unsigned int}} and {{int}} for array 
> indexes. This may cause issues on Windows
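An illustrative contrast (not taken from the Gandiva sources) between the 
flagged pattern and the fixed-width types the issue asks for:

{code:cpp}
#include <cstdint>
#include <vector>

// Platform-dependent: 'int' / 'unsigned int' stay 32-bit even on 64-bit
// Windows, so indexes over large arrays can wrap or truncate silently.
int64_t SumPlatformSpecific(const std::vector<int64_t>& values) {
  int64_t total = 0;
  for (unsigned int i = 0; i < values.size(); ++i) total += values[i];
  return total;
}

// Fixed-width: Arrow generally uses int64_t for lengths, offsets, and indexes.
int64_t SumFixedWidth(const std::vector<int64_t>& values) {
  int64_t total = 0;
  for (int64_t i = 0; i < static_cast<int64_t>(values.size()); ++i) {
    total += values[i];
  }
  return total;
}
{code}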



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3333) [Gandiva] Use non-platform specific integer types for lengths, indexes

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3333:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Gandiva] Use non-platform specific integer types for lengths, indexes
> --
>
> Key: ARROW-3333
> URL: https://issues.apache.org/jira/browse/ARROW-3333
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are many instances of using {{unsigned int}} and {{int}} for array 
> indexes. This may cause issues on Windows



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3316) [R] Multi-threaded conversion from R data.frame to Arrow table / record batch

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3316:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
> -
>
> Key: ARROW-3316
> URL: https://issues.apache.org/jira/browse/ARROW-3316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This is the companion issue to ARROW-2968, like {{pyarrow.Table.from_pandas}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3332) [Gandiva] Remove usages of mutable reference out arguments

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3332:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [Gandiva] Remove usages of mutable reference out arguments
> --
>
> Key: ARROW-3332
> URL: https://issues.apache.org/jira/browse/ARROW-3332
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I have noticed several usages of mutable reference out arguments, e.g. 
> gandiva/regex_util.h. We should change these to conform to the style guide 
> (out arguments as pointers)
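An illustrative before/after; the {{Pattern}} type and function names below are 
hypothetical, not the actual declarations in gandiva/regex_util.h:

{code:cpp}
#include <string>

struct Pattern { std::string pcre; };  // hypothetical stand-in type

// Flagged style: a mutable reference used as an out argument.
bool MakePatternRef(const std::string& sql_like, Pattern& out) {
  out.pcre = sql_like;  // real code would translate LIKE syntax here
  return true;
}

// Style-guide style: out arguments are pointers, so mutation is visible at the
// call site, e.g. MakePatternPtr(expr, &pattern).
bool MakePatternPtr(const std::string& sql_like, Pattern* out) {
  out->pcre = sql_like;
  return true;
}
{code}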



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3498) [R] Make IPC APIs consistent

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709503#comment-16709503
 ] 

Wes McKinney commented on ARROW-3498:
-

[~romainfrancois] where do things stand on this after the recent refactoring?

> [R] Make IPC APIs consistent
> 
>
> Key: ARROW-3498
> URL: https://issues.apache.org/jira/browse/ARROW-3498
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> There are many flavors of IPC functions:
> * Read a complete IPC stream (where schema is included in the first 
> message(s))
> * Read an IPC "file"
> * Read a schema only from a point in a buffer
> * Read a record batch given a known schema and the memory address of an 
> encapsulated IPC message
> These are partly available in R now, but with names that aren't necessarily 
> consistent. We should review each use case and normalize the API names



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3315) [R] Support for multi-threaded conversions from RecordBatch, Table to R data.frame

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3315:

Fix Version/s: (was: 0.12.0)
   0.13.0

> [R] Support for multi-threaded conversions from RecordBatch, Table to R 
> data.frame
> --
>
> Key: ARROW-3315
> URL: https://issues.apache.org/jira/browse/ARROW-3315
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This will be like {{RecordBatch.to_pandas}} with {{use_threads=True}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3291) [C++] Convenience API for constructing arrow::io::BufferReader from std::string

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3291:
--
Labels: pull-request-available  (was: )

> [C++] Convenience API for constructing arrow::io::BufferReader from 
> std::string
> ---
>
> Key: ARROW-3291
> URL: https://issues.apache.org/jira/browse/ARROW-3291
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> See motivating code example:
> https://github.com/apache/arrow/commit/db0ef22dd68ae00e11f09da40b6734c1d9770b57#diff-6dc1b0b53e71627dfb98c60b1fd2d45cR39
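The shape of the proposed convenience, sketched with {{Buffer}} / 
{{BufferReader}} calls from a recent Arrow C++ release (exact factory names in 
0.12 may differ):

{code:cpp}
#include <memory>
#include <string>
#include <utility>

#include <arrow/api.h>
#include <arrow/io/api.h>

// Wrap an in-memory std::string payload as a readable, seekable Arrow input.
std::shared_ptr<arrow::io::BufferReader> ReaderFromString(std::string payload) {
  // Buffer::FromString takes ownership of the string, so no extra copy is made.
  std::shared_ptr<arrow::Buffer> buffer =
      arrow::Buffer::FromString(std::move(payload));
  return std::make_shared<arrow::io::BufferReader>(buffer);
}
{code}

A binding layer can then hand the resulting reader to any IPC or Parquet read 
path without first building an {{arrow::Buffer}} by hand.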



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3913) [Gandiva] [GLib] Add GGandivaLiteralNode

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3913:
--
Labels: pull-request-available  (was: )

> [Gandiva] [GLib] Add GGandivaLiteralNode
> 
>
> Key: ARROW-3913
> URL: https://issues.apache.org/jira/browse/ARROW-3913
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Gandiva, GLib
>Reporter: Yosuke Shiro
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3291) [C++] Convenience API for constructing arrow::io::BufferReader from std::string

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3291:
---

Assignee: Wes McKinney

> [C++] Convenience API for constructing arrow::io::BufferReader from 
> std::string
> ---
>
> Key: ARROW-3291
> URL: https://issues.apache.org/jira/browse/ARROW-3291
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> See motivating code example:
> https://github.com/apache/arrow/commit/db0ef22dd68ae00e11f09da40b6734c1d9770b57#diff-6dc1b0b53e71627dfb98c60b1fd2d45cR39



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3629.
-
Resolution: Fixed

Issue resolved by pull request 3089
[https://github.com/apache/arrow/pull/3089]

> [Python] Add write_to_dataset to Python Sphinx API listing
> --
>
> Key: ARROW-3629
> URL: https://issues.apache.org/jira/browse/ARROW-3629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Tanya Schlusser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3629:
---

Assignee: Tanya Schlusser

> [Python] Add write_to_dataset to Python Sphinx API listing
> --
>
> Key: ARROW-3629
> URL: https://issues.apache.org/jira/browse/ARROW-3629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Tanya Schlusser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread David Konerding (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709364#comment-16709364
 ] 

David Konerding commented on ARROW-3933:


I noticed that the latest release of gnomad no longer uses Parquet (citing many 
problems with the format), so this bug is no longer a priority for me.

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709349#comment-16709349
 ] 

Wes McKinney commented on ARROW-3933:
-

Try using parquet-tools in the Java library: https://github.com/apache/parquet-mr

> Does arrow have a general philosophy about releases and segfaulting?

No. A regression would be considered more seriously, but we can't hold a 
release hostage if someone from the community cannot fix a bug.

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread T Poterba (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709343#comment-16709343
 ] 

T Poterba commented on ARROW-3933:
--

I'll note that this file has a pretty huge schema, in case that could be a 
factor. The following is from a different data release, but should be similar 
in dimension:

[https://gist.github.com/tpoterba/7e44fc74d9692c9c4ccdf5693c36d370]

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3441) [Gandiva][C++] Produce fewer test executables

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3441:
--
Labels: pull-request-available  (was: )

> [Gandiva][C++] Produce fewer test executables
> -
>
> Key: ARROW-3441
> URL: https://issues.apache.org/jira/browse/ARROW-3441
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> In ARROW-3254, I am adding the functionality to create test executables from 
> multiple files that use googletest, so we can continue to have relatively 
> small unit test files but combine unit tests into groups of semantically 
> related functionality.
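Roughly what "multiple files, one test executable" looks like at the source 
level: each file keeps its own small googletest cases and no main(), and the 
build groups them into a single binary linked against gtest_main (the file and 
target names below are made up):

{code:cpp}
// gandiva/foo_test.cc -- small, focused test file, no main()
#include <gtest/gtest.h>

TEST(FooTest, Basic) { EXPECT_EQ(1 + 1, 2); }

// gandiva/bar_test.cc -- another small file; the build compiles both sources
// into one hypothetical "gandiva-misc-test" executable linked with gtest_main.
#include <gtest/gtest.h>

TEST(BarTest, Basic) { EXPECT_TRUE(true); }
{code}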



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3441) [Gandiva][C++] Produce fewer test executables

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3441:

Fix Version/s: (was: 0.13.0)
   0.12.0

> [Gandiva][C++] Produce fewer test executables
> -
>
> Key: ARROW-3441
> URL: https://issues.apache.org/jira/browse/ARROW-3441
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> In ARROW-3254, I am adding the functionality to create test executables from 
> multiple files that use googletest, so we can continue to have relatively 
> small unit test files but combine unit tests into groups of semantically 
> related functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3441) [Gandiva][C++] Produce fewer test executables

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3441:
---

Assignee: Wes McKinney

> [Gandiva][C++] Produce fewer test executables
> -
>
> Key: ARROW-3441
> URL: https://issues.apache.org/jira/browse/ARROW-3441
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> In ARROW-3254, I am adding the functionality to create test executables from 
> multiple files that use googletest, so we can continue to have relatively 
> small unit test files but combine unit tests into groups of semantically 
> related functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3272) [Java] Document checkstyle deviations from Google style guide

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3272:
--
Labels: pull-request-available  (was: )

> [Java] Document checkstyle deviations from Google style guide
> -
>
> Key: ARROW-3272
> URL: https://issues.apache.org/jira/browse/ARROW-3272
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread David Konerding (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709222#comment-16709222
 ] 

David Konerding commented on ARROW-3933:


Is there a file validator for Parquet? Or something in Arrow I can use without 
Python (say, a simple arrow-cpp test program) that will attempt to read the 
file?

Does Arrow have a general philosophy about releases and segfaulting? (I.e., I 
would expect that segfaulting on reading a valid Parquet file would be a 
release blocker.)

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing

2018-12-04 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709088#comment-16709088
 ] 

Tanya Schlusser commented on ARROW-3629:


Pull request [#3089|https://github.com/apache/arrow/pull/3089], provided I 
understood this correctly and it only entails adding a single line to 
{{python/doc/source/api.rst}}.

Comment:

The doc build was difficult, but possibly because I'm a noob. I'm commenting 
rather than making a JIRA issue because I have no idea whether these are actual 
issues or just a newbie's lack of knowledge. Running {{dev/gen_apidocs.sh}} on 
a clean pull with my single-line change to {{api.rst}} failed:

The {{iwyu}} image in {{dev/docker-compose.yml}} failed with this path issue:
 - {{ERROR: build path /arrow/dev/iwyu either does not exist, is not 
accessible, or is not a valid URL.}}
 - I commented it out and then could continue.

The Java docs wouldn't compile either at first:
- I think because there's a {{conda install}} for a second version of 
{{maven}} below the {{apt-get install maven}} in the 
[Dockerfile|https://github.com/apache/arrow/blob/master/dev/gen_apidocs/Dockerfile],
 which puts Java 11 at the front of the {{PATH}}, breaking the lookup for the 
class {{javax.annotation.Generated}}, which moves from [Java 
8|https://docs.oracle.com/javase/8/docs/api/javax/annotation/Generated.html] to 
[Java 
9|https://docs.oracle.com/javase/9/docs/api/javax/annotation/processing/Generated.html]
 (and here is where it landed in [Java 
11|https://docs.oracle.com/en/java/javase/11/docs/api/java.compiler/javax/annotation/processing/Generated.html])
- when I deleted that line in the Dockerfile, the Java code compiled but 
didn't pass a test, because of a different missing dependency (that I didn't 
note; happy to figure it out if it's actually meaningful)
- so I commented out the Java build section in 
{{dev/gen_apidocs/create_documents.sh}}

The JavaScript docs failed on a dependency I didn't note (happy to; I just 
didn't want to waste time if it's my noob problem)
 - so I commented it out too; then the remaining doc generation worked

Please disregard if it's my lack of understanding. Otherwise I am happy to 
investigate further/add issues :).

> [Python] Add write_to_dataset to Python Sphinx API listing
> --
>
> Key: ARROW-3629
> URL: https://issues.apache.org/jira/browse/ARROW-3629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3629:
--
Labels: pull-request-available  (was: )

> [Python] Add write_to_dataset to Python Sphinx API listing
> --
>
> Key: ARROW-3629
> URL: https://issues.apache.org/jira/browse/ARROW-3629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3209:
--
Labels: pull-request-available  (was: )

> [C++] Rename libarrow_gpu to libarrow_cuda
> --
>
> Key: ARROW-3209
> URL: https://issues.apache.org/jira/browse/ARROW-3209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> I'm proposing to rename this library since we could conceivably have OpenCL 
> bindings in the repository also



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda

2018-12-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-3209:
-

Assignee: Antoine Pitrou

> [C++] Rename libarrow_gpu to libarrow_cuda
> --
>
> Key: ARROW-3209
> URL: https://issues.apache.org/jira/browse/ARROW-3209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.12.0
>
>
> I'm proposing to rename this library since we could conceivably have OpenCL 
> bindings in the repository also



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2993) [JS] Document minimum supported NodeJS version

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2993:
--
Labels: pull-request-available  (was: )

> [JS] Document minimum supported NodeJS version
> --
>
> Key: ARROW-2993
> URL: https://issues.apache.org/jira/browse/ARROW-2993
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and 
> higher. It would be useful to document the minimum supported NodeJS version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Add dask integration test to docker-compose setup

2018-12-04 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-3368:
---
Summary: [Integration/CI/Python] Add dask integration test to 
docker-compose setup  (was: [Integration/CI/Python] Port Dask integration test 
to docker-compose setup)

> [Integration/CI/Python] Add dask integration test to docker-compose setup
> -
>
> Key: ARROW-3368
> URL: https://issues.apache.org/jira/browse/ARROW-3368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Introduced by https://github.com/apache/arrow/pull/2572



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Add dask integration test to docker-compose setup

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3368:
--
Labels: pull-request-available  (was: )

> [Integration/CI/Python] Add dask integration test to docker-compose setup
> -
>
> Key: ARROW-3368
> URL: https://issues.apache.org/jira/browse/ARROW-3368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Introduced by https://github.com/apache/arrow/pull/2572



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708897#comment-16708897
 ] 

Wes McKinney commented on ARROW-3209:
-

We should change the namespace; I'm not sure that changing the directory name 
is necessary as long as it's clear which files inside are CUDA-related

> [C++] Rename libarrow_gpu to libarrow_cuda
> --
>
> Key: ARROW-3209
> URL: https://issues.apache.org/jira/browse/ARROW-3209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> I'm proposing to rename this library since we could conceivably have OpenCL 
> bindings in the repository also



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3934.
-
Resolution: Fixed

Issue resolved by pull request 3082
[https://github.com/apache/arrow/pull/3082]

> [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off
> --
>
> Key: ARROW-3934
> URL: https://issues.apache.org/jira/browse/ARROW-3934
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the precompiled tests are compiled in any case, even if 
> ARROW_GANDIVA_BUILD_TESTS=off.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda

2018-12-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708890#comment-16708890
 ] 

Antoine Pitrou commented on ARROW-3209:
---

Should we also rename the "arrow/gpu" directory and the {{arrow::gpu}} 
namespace?

> [C++] Rename libarrow_gpu to libarrow_cuda
> --
>
> Key: ARROW-3209
> URL: https://issues.apache.org/jira/browse/ARROW-3209
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> I'm proposing to rename this library since we could conceivably have OpenCL 
> bindings in the repository also



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2993) [JS] Document minimum supported NodeJS version

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-2993:


Assignee: Brian Hulette

> [JS] Document minimum supported NodeJS version
> --
>
> Key: ARROW-2993
> URL: https://issues.apache.org/jira/browse/ARROW-2993
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and 
> higher. It would be useful to document the minimum supported NodeJS version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3931) Make possible to build regardless of LANG

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3931:
---

Assignee: Kousuke Saruta

> Make possible to build regardless of LANG
> -
>
> Key: ARROW-3931
> URL: https://issues.apache.org/jira/browse/ARROW-3931
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> At the time of building the C++ libs, CompilerInfo.cmake checks the version of 
> the compiler to be used.
> It does so by string-matching the output of gcc -v (or clang -v).
> When LANG is set to a non-English locale, the build fails because that string 
> match fails.
> The following is the output for ja_JP.UTF-8 (Japanese).
> {code}
> CMake Error at cmake_modules/CompilerInfo.cmake:92 (message):
>   Unknown compiler.  Version info:
>   組み込み spec を使用しています。
>   COLLECT_GCC=/usr/bin/c++
>   COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
>   ターゲット: x86_64-redhat-linux
>   configure 設定: ../configure --prefix=/usr --mandir=/usr/share/man
>   --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
>   --enable-bootstrap --enable-shared --enable-threads=posix
>   --enable-checking=release --with-system-zlib --enable-__cxa_atexit
>   --disable-libunwind-exceptions --enable-gnu-unique-object

[jira] [Resolved] (ARROW-3931) Make possible to build regardless of LANG

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3931.
-
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3077
[https://github.com/apache/arrow/pull/3077]

> Make possible to build regardless of LANG
> -
>
> Key: ARROW-3931
> URL: https://issues.apache.org/jira/browse/ARROW-3931
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Kousuke Saruta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the time of building the C++ libs, CompilerInfo.cmake checks the version of 
> the compiler to be used.
> It does so by string-matching the output of gcc -v (or clang -v).
> When LANG is set to a non-English locale, the build fails because that string 
> match fails.
> The following is the output for ja_JP.UTF-8 (Japanese).
> {code}
> CMake Error at cmake_modules/CompilerInfo.cmake:92 (message):
>   Unknown compiler.  Version info:
>   組み込み spec を使用しています。
>   COLLECT_GCC=/usr/bin/c++
>   COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
>   ターゲット: x86_64-redhat-linux
>   configure 設定: ../configure --prefix=/usr --mandir=/usr/share/man
>   --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
>   --enable-bootstrap --enable-shared --enable-threads=posix
>   --enable-checking=release --with-system-zlib --enable-__cxa_atexit
>   --disable-libunwind-exceptions --enable-gnu-unique-object
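
For illustration only: a minimal Python sketch of the general technique behind this 
kind of fix, namely forcing a neutral locale before querying the compiler so that 
its banner is always in English. It is not the actual CMake change from pull 
request 3077.

{code}
import os
import subprocess

def compiler_version_info(compiler="c++"):
    """Return the compiler's version banner with a neutral locale forced.

    With LC_ALL/LANG set to "C", gcc and clang print their English banner,
    so simple string matching (e.g. looking for "gcc version") keeps working
    even when the user's locale is ja_JP.UTF-8.
    """
    env = dict(os.environ, LC_ALL="C", LANG="C", LANGUAGE="C")
    # gcc -v / clang -v write the banner to stderr, so capture both streams.
    result = subprocess.run([compiler, "-v"], env=env,
                            capture_output=True, text=True)
    return result.stderr + result.stdout

if __name__ == "__main__":
    banner = compiler_version_info()
    print("looks like gcc:  ", "gcc version" in banner)
    print("looks like clang:", "clang version" in banner)
{code}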

[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708875#comment-16708875
 ] 

Wes McKinney commented on ARROW-3933:
-

It's hard to say until someone has a chance to look at it. Hopefully it can get 
fixed in time for the 0.12 release

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-3892.
--
Resolution: Fixed

Issue resolved by pull request 3083
[https://github.com/apache/arrow/pull/3083]

> [JS] Remove any dependency on compromised NPM flatmap-stream package
> 
>
> Key: ARROW-3892
> URL: https://issues.apache.org/jira/browse/ARROW-3892
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are erroring out as the result of 
> https://github.com/dominictarr/event-stream/issues/116
> {code}
>  npm ERR! code ENOVERSIONS
>  npm ERR! No valid versions available for flatmap-stream
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2670) [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build

2018-12-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2670.
-
   Resolution: Fixed
 Assignee: Krisztian Szucs
Fix Version/s: 0.12.0

> [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
> ---
>
> Key: ARROW-2670
> URL: https://issues.apache.org/jira/browse/ARROW-2670
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.12.0
>
>
> Once the packaging thread reaches a stable state and we have the ability to 
> run non-packaging nightly tests, we should set up a Docker build on Ubuntu 
> 18.04 (which is based on gcc 7.3) so we can keep that build clean. It may be 
> a while until we have any Travis CI entries that use Bionic / 18.04



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2670) [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build

2018-12-04 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708857#comment-16708857
 ] 

Krisztian Szucs commented on ARROW-2670:


[~wesmckinn] I think this is done: `docker-compose run cpp` does that, and the 
nightlies are running on my crossbow instance.

> [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
> ---
>
> Key: ARROW-2670
> URL: https://issues.apache.org/jira/browse/ARROW-2670
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Once the packaging thread reaches a stable state and we have the ability to 
> run non-packaging nightly tests, we should set up a Docker build on Ubuntu 
> 18.04 (which is based on gcc 7.3) so we can keep that build clean. It may be 
> a while until we have any Travis CI entries that use Bionic / 18.04



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup

2018-12-04 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-3368:
--

Assignee: Krisztian Szucs

> [Integration/CI/Python] Port Dask integration test to docker-compose setup
> --
>
> Key: ARROW-3368
> URL: https://issues.apache.org/jira/browse/ARROW-3368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>
> Introduced by https://github.com/apache/arrow/pull/2572



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup

2018-12-04 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-3368:
---
Summary: [Integration/CI/Python] Port Dask integration test to 
docker-compose setup  (was: [INTEGRATION] Port Dask integration test to 
docker-compose setup)

> [Integration/CI/Python] Port Dask integration test to docker-compose setup
> --
>
> Key: ARROW-3368
> URL: https://issues.apache.org/jira/browse/ARROW-3368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration, Python
>Reporter: Krisztian Szucs
>Priority: Major
>
> Introduced by https://github.com/apache/arrow/pull/2572



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2018-12-04 Thread David Konerding (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708821#comment-16708821
 ] 

David Konerding commented on ARROW-3933:


Do you have suggestions for a workaround?  In particular, I'm curious if the 
problem repros outside of a conda install.  I don't want to build the software 
manually but will do so if that resolves the issue (but I would also want to 
see a fixed version pushed to PyPI/conda forge).
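
For illustration only: a minimal sketch of one possible interim workaround, 
assuming fastparquet and pandas are installed and that the column types survive 
the pandas round trip. It reads the file with fastparquet, which is reported 
above to handle it fine, and then converts the result to an Arrow table.

{code}
import fastparquet
import pyarrow as pa

# File name taken from the reproduction steps in this issue.
path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"

# Read with fastparquet, then hand the data to Arrow, bypassing the
# pyarrow.parquet reader that currently segfaults on this file.
pf = fastparquet.ParquetFile(path)
df = pf.to_pandas()
table = pa.Table.from_pandas(df)
print(table.num_rows, "rows,", table.num_columns, "columns")
{code}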

> [Python] Segfault reading Parquet files from GNOMAD
> ---
>
> Key: ARROW-3933
> URL: https://issues.apache.org/jira/browse/ARROW-3933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
> Environment: Ubuntu 18.04 or Mac OS X
>Reporter: David Konerding
>Priority: Minor
>  Labels: parquet
> Fix For: 0.12.0
>
>
> I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM 
> (AWS). The error also occurs out of the box on Mac OS X.
> $ sudo snap install --classic google-cloud-sdk
> $ gsutil cp 
> gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet
>  .
> $ conda install pyarrow
> $ python test.py
> Segmentation fault (core dumped)
> test.py:
> import pyarrow.parquet as pq
> path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet"
> pq.read_table(path)
> gdb output:
> Thread 3 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffdf199700 (LWP 13703)]
> 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, 
> unsigned long*) () from 
> /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11
> I tested fastparquet, it reads the file just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup

2018-12-04 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-3368:
---
Component/s: Python
 Continuous Integration

> [Integration/CI/Python] Port Dask integration test to docker-compose setup
> --
>
> Key: ARROW-3368
> URL: https://issues.apache.org/jira/browse/ARROW-3368
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Integration, Python
>Reporter: Krisztian Szucs
>Priority: Major
>
> Introduced by https://github.com/apache/arrow/pull/2572



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows

2018-12-04 Thread Philip Felton (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Felton updated ARROW-3936:
-
Description: 
Unlike Linux, Windows doesn't let you delete files that are currently opened by 
another process. So if you create a child process while a Parquet file is open, 
with the current code the file handle is inherited by the child process, and 
the parent process can't then delete the file after closing it until the child 
process terminates.

By default, Win32 file handles are not inheritable (likely because of the 
aforementioned problems); the exception is _wsopen_s, which tries to maintain 
POSIX compatibility.

This is a serious problem for us.

We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is 
a sensible approach and would likely be the correct behaviour as it matches the 
main Win32 API.

However, it could be that some developers rely on the current inheritable 
behaviour. In which case, the Arrow public API should take a boolean argument 
on whether the created file descriptor should be inheritable. But this would 
break API backward compatibility (unless a new overloaded method is introduced).

Is forking and inheriting Arrow's internal file descriptors something that Arrow 
actually means to support?

See [https://github.com/apache/arrow/pull/3085]. What do we think of the 
proposed fix?

  was:
Unlike Linux, Windows doesn't let you delete files that are currently opened by 
another process. So if you create a child process while a Parquet file is open, 
with the current code the file handle is inherited by the child process, and 
the parent process can't then delete the file after closing it until the child 
process terminates.

By default, Win32 file handles are not inheritable (likely because of the 
aforementioned problems); the exception is _wsopen_s, which tries to maintain 
POSIX compatibility.

This is a serious problem for us.

We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is 
a sensible approach and would likely be the correct behaviour as it matches the 
main Win32 API.

However, it could be that some developers rely on the current inheritable 
behaviour. In which case, the Arrow public API should take a boolean argument 
on whether the created file descriptor should be inheritable. But this would 
break API backward compatibility (unless a new overloaded method is introduced).

Is forking and inheriting Arrow's internal file descriptors something that Arrow 
actually means to support?

What do we think of the proposed fix?


> Add _O_NOINHERIT to the file open flags on Windows
> --
>
> Key: ARROW-3936
> URL: https://issues.apache.org/jira/browse/ARROW-3936
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philip Felton
>Priority: Major
>  Labels: pull-request-available
>
> Unlike Linux, Windows doesn't let you delete files that are currently opened 
> by another process. So if you create a child process while a Parquet file is 
> open, with the current code the file handle is inherited by the child 
> process, and the parent process can't then delete the file after closing it 
> until the child process terminates.
> By default, Win32 file handles are not inheritable (likely because of the 
> aforementioned problems); the exception is _wsopen_s, which tries to maintain 
> POSIX compatibility.
> This is a serious problem for us.
> We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path 
> is a sensible approach and would likely be the correct behaviour as it 
> matches the main Win32 API.
> However, it could be that some developers rely on the current inheritable 
> behaviour. In which case, the Arrow public API should take a boolean argument 
> on whether the created file descriptor should be inheritable. But this would 
> break API backward compatibility (unless a new overloaded method is 
> introduced).
> Is forking and inheriting Arrow's internal file descriptors something that 
> Arrow actually means to support?
> See [https://github.com/apache/arrow/pull/3085]. What do we think of the 
> proposed fix?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3936:
--
Labels: pull-request-available  (was: )

> Add _O_NOINHERIT to the file open flags on Windows
> --
>
> Key: ARROW-3936
> URL: https://issues.apache.org/jira/browse/ARROW-3936
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philip Felton
>Priority: Major
>  Labels: pull-request-available
>
> Unlike Linux, Windows doesn't let you delete files that are currently opened 
> by another process. So if you create a child process while a Parquet file is 
> open, with the current code the file handle is inherited by the child 
> process, and the parent process can't then delete the file after closing it 
> until the child process terminates.
> By default, Win32 file handles are not inheritable (likely because of the 
> aforementioned problems); the exception is _wsopen_s, which tries to maintain 
> POSIX compatibility.
> This is a serious problem for us.
> We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path 
> is a sensible approach and would likely be the correct behaviour as it 
> matches the main Win32 API.
> However, it could be that some developers rely on the current inheritable 
> behaviour. In which case, the Arrow public API should take a boolean argument 
> on whether the created file descriptor should be inheritable. But this would 
> break API backward compatibility (unless a new overloaded method is 
> introduced).
> Is forking and inheriting Arrow's internal file descriptors something that 
> Arrow actually means to support?
> See [https://github.com/apache/arrow/pull/3085]. What do we think of the 
> proposed fix?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows

2018-12-04 Thread Philip Felton (JIRA)
Philip Felton created ARROW-3936:


 Summary: Add _O_NOINHERIT to the file open flags on Windows
 Key: ARROW-3936
 URL: https://issues.apache.org/jira/browse/ARROW-3936
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Philip Felton


Unlike Linux, Windows doesn't let you delete files that are currently opened by 
another process. So if you create a child process while a Parquet file is open, 
with the current code the file handle is inherited by the child process, and 
the parent process can't then delete the file after closing it until the child 
process terminates.

By default, Win32 file handles are not inheritable (likely because of the 
aforementioned problems); the exception is _wsopen_s, which tries to maintain 
POSIX compatibility.

This is a serious problem for us.

We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is 
a sensible approach and would likely be the correct behaviour as it matches the 
main Win32 API.

However, it could be that some developers rely on the current inheritable 
behaviour. In which case, the Arrow public API should take a boolean argument 
on whether the created file descriptor should be inheritable. But this would 
break API backward compatibility (unless a new overloaded method is introduced).

Is forking and inheriting Arrow's internal file descriptors something that Arrow 
actually means to support?

What do we think of the proposed fix?
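
For illustration only: a Python analogue of the proposed behaviour, since 
os.O_NOINHERIT maps to the CRT's _O_NOINHERIT flag on Windows. The file name is 
made up for the demo, and this is a sketch of the flag's effect rather than the 
Arrow C++ change itself.

{code}
import os
import sys

# os.O_NOINHERIT (Windows only) corresponds to the CRT flag _O_NOINHERIT,
# which is what this issue proposes adding to Arrow's open flags.
assert sys.platform == "win32", "os.O_NOINHERIT is only available on Windows"

path = "inherit_demo.tmp"  # hypothetical file, created just for this demo
with open(path, "wb") as f:
    f.write(b"placeholder")

fd = os.open(path, os.O_RDONLY | os.O_BINARY | os.O_NOINHERIT)
try:
    # A descriptor opened with _O_NOINHERIT is not passed on to child
    # processes, so the parent can delete the file once it has closed it,
    # even if a child spawned in the meantime is still running.
    print("inheritable:", os.get_inheritable(fd))  # expected: False
finally:
    os.close(fd)

os.remove(path)  # succeeds because no other process holds a handle to it
{code}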



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3935) [Packaging/Docker] Mount ccache directory in docker-compose setup

2018-12-04 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3935:
--

 Summary: [Packaging/Docker] Mount ccache directory in 
docker-compose setup
 Key: ARROW-3935
 URL: https://issues.apache.org/jira/browse/ARROW-3935
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs


Hopefully this will speed up compilation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3303) [C++] Enable example arrays to be written with a simplified JSON representation

2018-12-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3303:
--
Labels: pull-request-available  (was: )

> [C++] Enable example arrays to be written with a simplified JSON 
> representation
> ---
>
> Key: ARROW-3303
> URL: https://issues.apache.org/jira/browse/ARROW-3303
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> In addition to making it easier to generate random data as described in 
> ARROW-2329, I think it would be useful to reduce some of the boilerplate 
> associated with writing down explicit test cases. The benefits of this will 
> be especially pronounced when writing nested arrays. 
> Example code that could be improved this way:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array-test.cc#L3271
> Rather than having a ton of hand-written assertions, we could compare with 
> the expected true dataset. Of course, this itself has to be tested 
> endogenously, but I think we can write enough tests for the JSON parser bit 
> to be able to have confidence in tests that are written with it
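
For illustration only: a rough Python sketch of the kind of shorthand described 
above, using pyarrow rather than the proposed C++ test helper. The helper name 
array_from_json is made up for this example.

{code}
import json
import pyarrow as pa

def array_from_json(json_text, arrow_type):
    """Build a pyarrow Array from a compact JSON literal (illustrative helper)."""
    return pa.array(json.loads(json_text), type=arrow_type)

# A nested list<int32> test value written down as a single literal instead of
# many hand-written builder calls and assertions:
expected = array_from_json("[[1, 2], null, [3, null, 5]]", pa.list_(pa.int32()))
print(expected)
{code}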



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)