[jira] [Updated] (ARROW-3908) [Rust] Update rust dockerfile to use nightly toolchain
[ https://issues.apache.org/jira/browse/ARROW-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3908: -- Labels: pull-request-available (was: ) > [Rust] Update rust dockerfile to use nightly toolchain > -- > > Key: ARROW-3908 > URL: https://issues.apache.org/jira/browse/ARROW-3908 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Krisztian Szucs >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3908) [Rust] Update rust dockerfile to use nightly toolchain
[ https://issues.apache.org/jira/browse/ARROW-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned ARROW-3908: --- Assignee: Chao Sun > [Rust] Update rust dockerfile to use nightly toolchain > -- > > Key: ARROW-3908 > URL: https://issues.apache.org/jira/browse/ARROW-3908 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Krisztian Szucs >Assignee: Chao Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder
Chao Sun created ARROW-3939: --- Summary: [Rust] Remove macro definition for ListArrayBuilder Key: ARROW-3939 URL: https://issues.apache.org/jira/browse/ARROW-3939 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Chao Sun Assignee: Chao Sun Currently `ListArrayBuilder` is implemented using a macro and only for a few value builder types. We should lift this restriction and allow creation of list builders with arbitrary value builder types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
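The macro restriction is easiest to see by contrast with a generic builder. A minimal Python sketch (hypothetical `Int64Builder`/`ListBuilder` names, not the actual Rust API) of a list builder that accepts an arbitrary value builder:

```python
class Int64Builder:
    """Hypothetical primitive value builder: accumulates values."""
    def __init__(self):
        self.values = []

    def append(self, v):
        self.values.append(v)

    def finish(self):
        out, self.values = self.values, []
        return out


class ListBuilder:
    """Generic list builder: works with any value builder exposing
    append()/finish(), instead of one macro expansion per type."""
    def __init__(self, value_builder):
        self.value_builder = value_builder
        self.offsets = [0]

    def append_list(self, items):
        for item in items:
            self.value_builder.append(item)
        self.offsets.append(self.offsets[-1] + len(items))

    def finish(self):
        values = self.value_builder.finish()
        offsets, self.offsets = self.offsets, [0]
        return offsets, values


b = ListBuilder(Int64Builder())
b.append_list([1, 2, 3])
b.append_list([4])
offsets, values = b.finish()
print(offsets, values)  # [0, 3, 4] [1, 2, 3, 4]
```

The offsets delimit each list inside the flat values buffer, mirroring Arrow's list layout.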
[jira] [Updated] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder
[ https://issues.apache.org/jira/browse/ARROW-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3939: -- Labels: pull-request-available (was: ) > [Rust] Remove macro definition for ListArrayBuilder > --- > > Key: ARROW-3939 > URL: https://issues.apache.org/jira/browse/ARROW-3939 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Affects Versions: 0.11.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > > Currently `ListArrayBuilder` is implemented using a macro and only for a > few value builder types. We should lift this restriction and allow creation > of list builders with arbitrary value builder types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3939) [Rust] Remove macro definition for ListArrayBuilder
[ https://issues.apache.org/jira/browse/ARROW-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated ARROW-3939: Affects Version/s: 0.11.0 > [Rust] Remove macro definition for ListArrayBuilder > --- > > Key: ARROW-3939 > URL: https://issues.apache.org/jira/browse/ARROW-3939 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Affects Versions: 0.11.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > Currently `ListArrayBuilder` is implemented using a macro and only for a > few value builder types. We should lift this restriction and allow creation > of list builders with arbitrary value builder types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3938) [Packaging] Stop to refer java/pom.xml to get version information
[ https://issues.apache.org/jira/browse/ARROW-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709648#comment-16709648 ] Kouhei Sutou commented on ARROW-3938: - Comment from [~wesmckinn] https://github.com/apache/arrow/pull/3096#issuecomment-444346414 One complexity is that we need to be able to produce development version numbers using setuptools_scm when building the Python wheels. > [Packaging] Stop to refer java/pom.xml to get version information > - > > Key: ARROW-3938 > URL: https://issues.apache.org/jira/browse/ARROW-3938 > Project: Apache Arrow > Issue Type: New Feature > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Fix For: 0.12.0 > > > https://github.com/apache/arrow/pull/3096#issuecomment-444345068 > I want to stop the current version sharing style. (Referring to {{java/pom.xml}} > from C++, Python, C, Ruby, ...) > It introduces complexity. For example, we generate {{version.rb}} > dynamically to create a Ruby package: > https://github.com/apache/arrow/blob/master/ruby/red-arrow/version.rb > I think that we can just replace all versions in {{cpp/CMakeLists.txt}}, > {{python/setup.py}}, {{c_glib/configure.ac}}, {{ruby/*/lib/*/version.rb}}, > {{rust/Cargo.toml}}, ... with {{sed}} in the release process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3938) [Packaging] Stop to refer java/pom.xml to get version information
Kouhei Sutou created ARROW-3938: --- Summary: [Packaging] Stop to refer java/pom.xml to get version information Key: ARROW-3938 URL: https://issues.apache.org/jira/browse/ARROW-3938 Project: Apache Arrow Issue Type: New Feature Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou Fix For: 0.12.0 https://github.com/apache/arrow/pull/3096#issuecomment-444345068 I want to stop the current version sharing style. (Referring to {{java/pom.xml}} from C++, Python, C, Ruby, ...) It introduces complexity. For example, we generate {{version.rb}} dynamically to create a Ruby package: https://github.com/apache/arrow/blob/master/ruby/red-arrow/version.rb I think that we can just replace all versions in {{cpp/CMakeLists.txt}}, {{python/setup.py}}, {{c_glib/configure.ac}}, {{ruby/*/lib/*/version.rb}}, {{rust/Cargo.toml}}, ... with {{sed}} in the release process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
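The proposed {{sed}} step amounts to a literal version substitution per file. A small sketch of that replacement (the file contents and version strings here are illustrative, not the real release tooling):

```python
import re

def bump_version(text, old, new):
    # re.escape keeps the dots literal, so "0.11.0" cannot
    # accidentally match something like "0x11y0".
    return re.sub(re.escape(old), new, text)

cargo_toml = 'name = "arrow"\nversion = "0.11.0"\n'
print(bump_version(cargo_toml, "0.11.0", "0.12.0"))
```

Running the same function over each listed file during the release process would replace every occurrence in one pass.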
[jira] [Updated] (ARROW-3883) [Rust] Update Rust README to reflect new functionality
[ https://issues.apache.org/jira/browse/ARROW-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-3883: -- Component/s: Rust > [Rust] Update Rust README to reflect new functionality > -- > > Key: ARROW-3883 > URL: https://issues.apache.org/jira/browse/ARROW-3883 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.11.1 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Fix For: 0.12.0 > > > The Rust README is now very outdated and needs updating before we release > 0.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3883) [Rust] Update Rust README to reflect new functionality
[ https://issues.apache.org/jira/browse/ARROW-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-3883: - Assignee: Andy Grove > [Rust] Update Rust README to reflect new functionality > -- > > Key: ARROW-3883 > URL: https://issues.apache.org/jira/browse/ARROW-3883 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.11.1 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Fix For: 0.12.0 > > > The Rust README is now very outdated and needs updating before we release > 0.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3937) [Rust] Rust nightly build is failing
[ https://issues.apache.org/jira/browse/ARROW-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3937: -- Labels: pull-request-available (was: ) > [Rust] Rust nightly build is failing > > > Key: ARROW-3937 > URL: https://issues.apache.org/jira/browse/ARROW-3937 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Wes McKinney >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > See recent CI failures such as > https://travis-ci.org/apache/arrow/jobs/463656608#L650 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3937) [Rust] Rust nightly build is failing
[ https://issues.apache.org/jira/browse/ARROW-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-3937: - Assignee: Andy Grove > [Rust] Rust nightly build is failing > > > Key: ARROW-3937 > URL: https://issues.apache.org/jira/browse/ARROW-3937 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Wes McKinney >Assignee: Andy Grove >Priority: Major > Fix For: 0.12.0 > > > See recent CI failures such as > https://travis-ci.org/apache/arrow/jobs/463656608#L650 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3937) [Rust] Rust nightly build is failing
Wes McKinney created ARROW-3937: --- Summary: [Rust] Rust nightly build is failing Key: ARROW-3937 URL: https://issues.apache.org/jira/browse/ARROW-3937 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Wes McKinney Fix For: 0.12.0 See recent CI failures such as https://travis-ci.org/apache/arrow/jobs/463656608#L650 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3543) [R] Time zone adjustment issue when reading Feather file written by Python
[ https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709587#comment-16709587 ] Wes McKinney commented on ARROW-3543: - [~romainfrancois] can you have a look and see if this bug is still present? > [R] Time zone adjustment issue when reading Feather file written by Python > -- > > Key: ARROW-3543 > URL: https://issues.apache.org/jira/browse/ARROW-3543 > Project: Apache Arrow > Issue Type: Bug >Reporter: Olaf >Priority: Critical > Fix For: 0.12.0 > > > Hello the dream team, > Pasting from [https://github.com/wesm/feather/issues/351] > Thanks for this wonderful package. I was playing with feather and some > timestamps and I noticed some dangerous behavior. Maybe it is a bug. > Consider this > > {code:java} > import pandas as pd > import feather > import numpy as np > df = pd.DataFrame( > {'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'), > pd.to_datetime('2018-02-01 14:01:00.456'), pd.to_datetime('2018-03-05 > 14:01:02.200')]} > ) > df['timestamp_est'] = > pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None) > df > Out[17]: > string_time_utc timestamp_est > 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531 > 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456 > 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200 > {code} > Here I create the corresponding `EST` timestamp of my original timestamps (in > `UTC` time). > Now saving the dataframe to `csv` or to `feather` will generate two > completely different results. > > {code:java} > df.to_csv('P://testing.csv') > df.to_feather('P://testing.feather') > {code} > Switching to R. > Using the good old `csv` gives me something a bit annoying, but expected. R > thinks my timezone is `UTC` by default, and wrongly attached this timezone to > `timestamp_est`. No big deal, I can always use `with_tz` or even better: > import as character and process as timestamp while in R. 
> > {code:java} > > dataframe <- read_csv('P://testing.csv') > Parsed with column specification: > cols( > X1 = col_integer(), > string_time_utc = col_datetime(format = ""), > timestamp_est = col_datetime(format = "") > ) > Warning message: > Missing column names filled in: 'X1' [1] > > > > dataframe %>% mutate(mytimezone = tz(timestamp_est)) > A tibble: 3 x 4 > X1 string_time_utc timestamp_est > > 1 0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530 > 2 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456 > 3 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200 > mytimezone > > 1 UTC > 2 UTC > 3 UTC {code} > {code:java} > #Now look at what happens with feather: > > > dataframe <- read_feather('P://testing.feather') > > > > dataframe %>% mutate(mytimezone = tz(timestamp_est)) > A tibble: 3 x 3 > string_time_utc timestamp_est mytimezone > > 1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 "" > 2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 "" > 3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 "" {code} > My timestamps have been converted!!! pure insanity. > Am I missing something here? > Thanks!! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
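The pitfall underneath the report is that a tz-naive timestamp carries no zone, so every reader is free to guess its own. A stdlib sketch of the same double shift, assuming a fixed UTC-5 offset for US/Eastern (the real zone observes DST, which pandas handles in the example above):

```python
from datetime import datetime, timedelta, timezone

EASTERN = timezone(timedelta(hours=-5))  # assumed fixed offset; real US/Eastern has DST

utc_aware = datetime(2018, 2, 1, 14, 0, 0, tzinfo=timezone.utc)
# Convert to Eastern and strip the zone, as the pandas example does:
eastern_naive = utc_aware.astimezone(EASTERN).replace(tzinfo=None)
print(eastern_naive)  # 2018-02-01 09:00:00 -- the zone is gone

# A reader that assumes naive values are UTC shifts them again:
misread = eastern_naive.replace(tzinfo=timezone.utc).astimezone(EASTERN)
print(misread.replace(tzinfo=None))  # 2018-02-01 04:00:00 -- the shift seen in the report
```

This reproduces the 09:00 → 04:00 jump in the R output: the naive Eastern value was re-interpreted as UTC and converted a second time.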
[jira] [Updated] (ARROW-3318) [C++] Convenience method for reading all batches from an IPC stream or file as arrow::Table
[ https://issues.apache.org/jira/browse/ARROW-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3318: -- Labels: pull-request-available (was: ) > [C++] Convenience method for reading all batches from an IPC stream or file > as arrow::Table > --- > > Key: ARROW-3318 > URL: https://issues.apache.org/jira/browse/ARROW-3318 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > This is being implemented more than once in binding layers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
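The convenience method boils down to "drain the reader and concatenate the batches". A pure-Python sketch of that loop, with record batches modeled as dicts mapping column name to list (an analogy, not the actual C++ API):

```python
def batches_to_table(batches):
    """Concatenate record batches into one table, insisting that
    every batch carries the same set of columns."""
    table = {}
    for batch in batches:
        if not table:
            table = {name: list(col) for name, col in batch.items()}
            continue
        if set(batch) != set(table):
            raise ValueError("schema mismatch between batches")
        for name, col in batch.items():
            table[name].extend(col)
    return table

stream = [{"x": [1, 2], "y": ["a", "b"]}, {"x": [3], "y": ["c"]}]
print(batches_to_table(stream))  # {'x': [1, 2, 3], 'y': ['a', 'b', 'c']}
```

Centralizing this loop in the library is exactly what saves each binding layer from re-implementing it.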
[jira] [Updated] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails
[ https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated ARROW-1918: - Fix Version/s: (was: JS-0.4.0) JS-0.5.0 > [JS] Integration portion of verify-release-candidate.sh fails > - > > Key: ARROW-1918 > URL: https://issues.apache.org/jira/browse/ARROW-1918 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.8.0 >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Fix For: JS-0.5.0 > > > I'm going to temporarily disable this in my fixes in ARROW-1917 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails
[ https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned ARROW-1918: Assignee: Brian Hulette > [JS] Integration portion of verify-release-candidate.sh fails > - > > Key: ARROW-1918 > URL: https://issues.apache.org/jira/browse/ARROW-1918 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Affects Versions: 0.8.0 >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Fix For: JS-0.5.0 > > > I'm going to temporarily disable this in my fixes in ARROW-1917 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3885) [Rust] Update version to 0.12.0 and update release instructions on wiki
[ https://issues.apache.org/jira/browse/ARROW-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3885: -- Labels: pull-request-available (was: ) > [Rust] Update version to 0.12.0 and update release instructions on wiki > --- > > Key: ARROW-3885 > URL: https://issues.apache.org/jira/browse/ARROW-3885 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.11.1 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > > The Rust version of Arrow still has version 0.10.0 in the Cargo.toml ... we > need to bump this to 0.12.0 (or 0.12.0-alpha maybe) and update the > instructions for releasing Arrow so that this version gets updated when > performing a release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2993) [JS] Document minimum supported NodeJS version
[ https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved ARROW-2993. -- Resolution: Fixed Issue resolved by pull request 3087 [https://github.com/apache/arrow/pull/3087] > [JS] Document minimum supported NodeJS version > -- > > Key: ARROW-2993 > URL: https://issues.apache.org/jira/browse/ARROW-2993 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and > higher. It would be useful to document the minimum supported NodeJS version -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3894) [Python] Error reading IPC file with no record batches
[ https://issues.apache.org/jira/browse/ARROW-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3894: -- Labels: pull-request-available (was: ) > [Python] Error reading IPC file with no record batches > -- > > Key: ARROW-3894 > URL: https://issues.apache.org/jira/browse/ARROW-3894 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.11.1 >Reporter: Rik Coenders >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > When using the RecordBatchFileWriter without actually writing a record batch, > the magic bytes at the beginning of the file are not written. This causes the > exception "File is smaller than indicated metadata size" when reading that file > with the RecordBatchFileReader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
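For context: an Arrow IPC file begins and ends with the magic bytes `ARROW1`, and the failure mode here is a zero-batch file that lacks the leading magic. A defensive check a reader could run before trusting the footer's metadata size (a sketch, not pyarrow's actual validation code):

```python
MAGIC = b"ARROW1"

def looks_like_arrow_file(buf):
    """Verify the leading and trailing magic before parsing metadata."""
    return (len(buf) >= 2 * len(MAGIC)
            and buf[:len(MAGIC)] == MAGIC
            and buf[-len(MAGIC):] == MAGIC)

print(looks_like_arrow_file(b"ARROW1" + b"\x00" * 20 + b"ARROW1"))  # True
print(looks_like_arrow_file(b"\x00" * 20 + b"ARROW1"))              # False: no leading magic
```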
[jira] [Closed] (ARROW-2374) [Rust] Add support for array of List
[ https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove closed ARROW-2374. - Resolution: Duplicate > [Rust] Add support for array of List > --- > > Key: ARROW-2374 > URL: https://issues.apache.org/jira/browse/ARROW-2374 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Add support for List in Array types. Look at Utf8 which wraps List to > see how this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2374) [Rust] Add support for array of List
[ https://issues.apache.org/jira/browse/ARROW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709549#comment-16709549 ] Andy Grove commented on ARROW-2374: --- I believe this can be closed. I will go ahead and close as duplicate. > [Rust] Add support for array of List > --- > > Key: ARROW-2374 > URL: https://issues.apache.org/jira/browse/ARROW-2374 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Add support for List in Array types. Look at Utf8 which wraps List to > see how this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709548#comment-16709548 ] Wes McKinney commented on ARROW-3933: - Either [~xhochy] or I can take a closer look. This code path hasn't been hardened too much -- and, of course, our support for nested data is very incomplete. > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM (AWS). > The error also occurs out of the box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet; it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3326) [Python] Expose stream alignment function in pyarrow.NativeFile
[ https://issues.apache.org/jira/browse/ARROW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3326: -- Labels: pull-request-available (was: ) > [Python] Expose stream alignment function in pyarrow.NativeFile > --- > > Key: ARROW-3326 > URL: https://issues.apache.org/jira/browse/ARROW-3326 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > See also ARROW-3319 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
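"Stream alignment" here is the usual pad-to-boundary arithmetic: Arrow IPC messages start on 8-byte boundaries, so aligning a stream means advancing the position to the next multiple of eight. The padding computation in a nutshell:

```python
ALIGNMENT = 8  # Arrow IPC messages are 8-byte aligned

def padding_to_alignment(position, alignment=ALIGNMENT):
    """Bytes needed to advance `position` to the next multiple of `alignment`."""
    return (alignment - position % alignment) % alignment

for pos in (0, 1, 8, 13):
    print(pos, "->", padding_to_alignment(pos))  # 0->0, 1->7, 8->0, 13->3
```

Exposing this on `pyarrow.NativeFile` lets callers skip the pad bytes without re-deriving the formula.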
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709537#comment-16709537 ] Francois Saint-Jacques commented on ARROW-3933: --- Offending line: [https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/reader.cc#L1538]. I don't have enough knowledge of the Parquet file format to decide whether the file is corrupted or the assumption in the code is correct. > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting a segfault trying to run a basic program on an Ubuntu 18.04 VM (AWS). > The error also occurs out of the box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet; it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
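For readers unfamiliar with `GetDefLevels`: Parquet encodes nullability through definition levels, and in a flat optional column a level below the maximum marks a NULL. A minimal reassembly sketch (assuming a max definition level of 1, i.e. one optional field and no nesting; the crashing code handles the far harder nested case):

```python
def assemble_optional(def_levels, values, max_def_level=1):
    """Rebuild a nullable column: levels below the max mean NULL;
    present values are consumed from `values` in order."""
    out, it = [], iter(values)
    for level in def_levels:
        out.append(next(it) if level == max_def_level else None)
    return out

print(assemble_optional([1, 0, 1], [10, 20]))  # [10, None, 20]
```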
[jira] [Assigned] (ARROW-3885) [Rust] Update version to 0.12.0 and update release instructions on wiki
[ https://issues.apache.org/jira/browse/ARROW-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove reassigned ARROW-3885: - Assignee: Andy Grove > [Rust] Update version to 0.12.0 and update release instructions on wiki > --- > > Key: ARROW-3885 > URL: https://issues.apache.org/jira/browse/ARROW-3885 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.11.1 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Fix For: 0.12.0 > > > The Rust version of Arrow still has version 0.10.0 in the Cargo.toml ... we > need to bump this to 0.12.0 (or 0.12.0-alpha maybe) and update the > instructions for releasing Arrow so that this version gets updated when > performing a release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3326) [Python] Expose stream alignment function in pyarrow.NativeFile
[ https://issues.apache.org/jira/browse/ARROW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3326: --- Assignee: Wes McKinney > [Python] Expose stream alignment function in pyarrow.NativeFile > --- > > Key: ARROW-3326 > URL: https://issues.apache.org/jira/browse/ARROW-3326 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > See also ARROW-3319 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2801) [Python] Implement split_row_groups for ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2801: Fix Version/s: (was: 0.12.0) 0.13.0 > [Python] Implement split_row_groups for ParquetDataset > - > > Key: ARROW-2801 > URL: https://issues.apache.org/jira/browse/ARROW-2801 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Robert Gruener >Assignee: Robert Gruener >Priority: Minor > Labels: parquet, pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently the split_row_groups argument in ParquetDataset yields a not > implemented error. An easy and efficient way to implement this is by using > the summary metadata file instead of opening every footer file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3220) [Python] Add writeat method to writeable NativeFile
[ https://issues.apache.org/jira/browse/ARROW-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3220: Fix Version/s: (was: 0.12.0) 0.13.0 > [Python] Add writeat method to writeable NativeFile > --- > > Key: ARROW-3220 > URL: https://issues.apache.org/jira/browse/ARROW-3220 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Pearu Peterson >Priority: Major > Fix For: 0.13.0 > > > See https://github.com/apache/arrow/pull/2536#discussion_r216384311 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3871) [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData
[ https://issues.apache.org/jira/browse/ARROW-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3871: Fix Version/s: (was: 0.12.0) 0.13.0 > [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData > --- > > Key: ARROW-3871 > URL: https://issues.apache.org/jira/browse/ARROW-3871 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > See https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L173 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3333) [Gandiva] Use non-platform specific integer types for lengths, indexes
[ https://issues.apache.org/jira/browse/ARROW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709507#comment-16709507 ] Wes McKinney commented on ARROW-3333: - This should be reviewed, but not necessary while Gandiva is in an Alpha / Beta stage > [Gandiva] Use non-platform specific integer types for lengths, indexes > -- > > Key: ARROW-3333 > URL: https://issues.apache.org/jira/browse/ARROW-3333 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > There are many instances of using {{unsigned int}} and {{int}} for array > indexes. This may cause issues on Windows -- This message was sent by Atlassian JIRA (v7.6.3#76005)
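The portability concern: native C/C++ integer widths differ across platforms (`long` is 4 bytes on 64-bit Windows but 8 on most 64-bit Unix), which is why fixed-width types are preferred for lengths and indexes. Python's `struct` module can illustrate the difference between fixed-width and native sizes:

```python
import struct

# Standard (fixed-width) format codes have the same size everywhere:
print(struct.calcsize("<i"), struct.calcsize("<q"))  # 4 8

# Native codes follow the platform ABI; 'l' is the classic offender:
print(struct.calcsize("@l"))  # 4 on 64-bit Windows, 8 on most 64-bit Unix
```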
[jira] [Updated] (ARROW-3333) [Gandiva] Use non-platform specific integer types for lengths, indexes
[ https://issues.apache.org/jira/browse/ARROW-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3333: Fix Version/s: (was: 0.12.0) 0.13.0 > [Gandiva] Use non-platform specific integer types for lengths, indexes > -- > > Key: ARROW-3333 > URL: https://issues.apache.org/jira/browse/ARROW-3333 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > There are many instances of using {{unsigned int}} and {{int}} for array > indexes. This may cause issues on Windows -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3316) [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
[ https://issues.apache.org/jira/browse/ARROW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3316: Fix Version/s: (was: 0.12.0) 0.13.0 > [R] Multi-threaded conversion from R data.frame to Arrow table / record batch > - > > Key: ARROW-3316 > URL: https://issues.apache.org/jira/browse/ARROW-3316 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > This is the companion issue to ARROW-2968, like {{pyarrow.Table.from_pandas}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3332) [Gandiva] Remove usages of mutable reference out arguments
[ https://issues.apache.org/jira/browse/ARROW-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3332: Fix Version/s: (was: 0.12.0) 0.13.0 > [Gandiva] Remove usages of mutable reference out arguments > -- > > Key: ARROW-3332 > URL: https://issues.apache.org/jira/browse/ARROW-3332 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > I have noticed several usages of mutable reference out arguments, e.g. > gandiva/regex_util.h. We should change these to conform to the style guide > (out arguments as pointers) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3498) [R] Make IPC APIs consistent
[ https://issues.apache.org/jira/browse/ARROW-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709503#comment-16709503 ] Wes McKinney commented on ARROW-3498: - [~romainfrancois] where do things stand on this after the recent refactoring? > [R] Make IPC APIs consistent > > > Key: ARROW-3498 > URL: https://issues.apache.org/jira/browse/ARROW-3498 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > There are many flavors of IPC functions: > * Read a complete IPC stream (where schema is included in the first > message(s)) > * Read an IPC "file" > * Read a schema only from a point in a buffer > * Read a record batch given a known schema and the memory address of an > encapsulated IPC message > These are partly available in R now, but with names that aren't necessarily > consistent. We should review each use case and normalize the API names -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3315) [R] Support for multi-threaded conversions from RecordBatch, Table to R data.frame
[ https://issues.apache.org/jira/browse/ARROW-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3315: Fix Version/s: (was: 0.12.0) 0.13.0 > [R] Support for multi-threaded conversions from RecordBatch, Table to R > data.frame > -- > > Key: ARROW-3315 > URL: https://issues.apache.org/jira/browse/ARROW-3315 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > This will be like {{RecordBatch.to_pandas}} with {{use_threads=True}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3291) [C++] Convenience API for constructing arrow::io::BufferReader from std::string
[ https://issues.apache.org/jira/browse/ARROW-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3291: -- Labels: pull-request-available (was: ) > [C++] Convenience API for constructing arrow::io::BufferReader from > std::string > --- > > Key: ARROW-3291 > URL: https://issues.apache.org/jira/browse/ARROW-3291 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > See motivating code example: > https://github.com/apache/arrow/commit/db0ef22dd68ae00e11f09da40b6734c1d9770b57#diff-6dc1b0b53e71627dfb98c60b1fd2d45cR39 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3913) [Gandiva] [GLib] Add GGandivaLiteralNode
[ https://issues.apache.org/jira/browse/ARROW-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3913: -- Labels: pull-request-available (was: ) > [Gandiva] [GLib] Add GGandivaLiteralNode > > > Key: ARROW-3913 > URL: https://issues.apache.org/jira/browse/ARROW-3913 > Project: Apache Arrow > Issue Type: Improvement > Components: Gandiva, GLib >Reporter: Yosuke Shiro >Assignee: Yosuke Shiro >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3291) [C++] Convenience API for constructing arrow::io::BufferReader from std::string
[ https://issues.apache.org/jira/browse/ARROW-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3291: --- Assignee: Wes McKinney > [C++] Convenience API for constructing arrow::io::BufferReader from > std::string > --- > > Key: ARROW-3291 > URL: https://issues.apache.org/jira/browse/ARROW-3291 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > See motivating code example: > https://github.com/apache/arrow/commit/db0ef22dd68ae00e11f09da40b6734c1d9770b57#diff-6dc1b0b53e71627dfb98c60b1fd2d45cR39 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
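The convenience discussed in ARROW-3291 is essentially about letting a reader own the `std::string` that backs its buffer, so the memory cannot dangle. The following standalone sketch illustrates that ownership pattern under stated assumptions; `StringBufferReader` is a hypothetical illustration, not the actual `arrow::io::BufferReader` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Sketch: a reader that takes the std::string by value and keeps its own
// copy alive, so callers can pass temporaries without lifetime worries.
class StringBufferReader {
 public:
  explicit StringBufferReader(std::string data)
      : data_(std::move(data)), pos_(0) {}

  // Copies up to nbytes into out; returns the number of bytes read.
  std::size_t Read(std::size_t nbytes, char* out) {
    std::size_t n = std::min(nbytes, data_.size() - pos_);
    std::memcpy(out, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }

 private:
  std::string data_;   // owned copy keeps the backing memory alive
  std::size_t pos_;    // current read offset
};
```

The design choice mirrors the JIRA motivation: without such a helper, callers must keep the source string alive for as long as the reader exists.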
[jira] [Resolved] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing
[ https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3629. - Resolution: Fixed Issue resolved by pull request 3089 [https://github.com/apache/arrow/pull/3089] > [Python] Add write_to_dataset to Python Sphinx API listing > -- > > Key: ARROW-3629 > URL: https://issues.apache.org/jira/browse/ARROW-3629 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Tanya Schlusser >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing
[ https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3629: --- Assignee: Tanya Schlusser > [Python] Add write_to_dataset to Python Sphinx API listing > -- > > Key: ARROW-3629 > URL: https://issues.apache.org/jira/browse/ARROW-3629 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Tanya Schlusser >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709364#comment-16709364 ] David Konerding commented on ARROW-3933: I noticed that the latest release of gnomad no longer uses Parquet (citing many problems with the format), so this bug is no longer a priority for me. > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709349#comment-16709349 ] Wes McKinney commented on ARROW-3933: - Try using parquet-tools in the Java library https://github.com/apache/parquet-mr > Does arrow have a general philosophy about releases and segfaulting No. A regression would be considered more seriously but we can't hold a release hostage if someone from the community cannot fix a bug > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709343#comment-16709343 ] T Poterba commented on ARROW-3933: -- I'll note that this file has a pretty huge schema, in case that could be a factor. The following is from a different data release, but should be similar in dimension: [https://gist.github.com/tpoterba/7e44fc74d9692c9c4ccdf5693c36d370] > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3441) [Gandiva][C++] Produce fewer test executables
[ https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3441: -- Labels: pull-request-available (was: ) > [Gandiva][C++] Produce fewer test executables > - > > Key: ARROW-3441 > URL: https://issues.apache.org/jira/browse/ARROW-3441 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > In ARROW-3254, I am adding the functionality to create test executables from > multiple files that use googletest. So we can continue to have relatively > small unit test files, but combine unit tests into groups of > semantically-related functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3441) [Gandiva][C++] Produce fewer test executables
[ https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3441: Fix Version/s: (was: 0.13.0) 0.12.0 > [Gandiva][C++] Produce fewer test executables > - > > Key: ARROW-3441 > URL: https://issues.apache.org/jira/browse/ARROW-3441 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > In ARROW-3254, I am adding the functionality to create test executables from > multiple files that use googletest. So we can continue to have relatively > small unit test files, but combine unit tests into groups of > semantically-related functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3441) [Gandiva][C++] Produce fewer test executables
[ https://issues.apache.org/jira/browse/ARROW-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3441: --- Assignee: Wes McKinney > [Gandiva][C++] Produce fewer test executables > - > > Key: ARROW-3441 > URL: https://issues.apache.org/jira/browse/ARROW-3441 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > In ARROW-3254, I am adding the functionality to create test executables from > multiple files that use googletest. So we can continue to have relatively > small unit test files, but combine unit tests into groups of > semantically-related functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3272) [Java] Document checkstyle deviations from Google style guide
[ https://issues.apache.org/jira/browse/ARROW-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3272: -- Labels: pull-request-available (was: ) > [Java] Document checkstyle deviations from Google style guide > - > > Key: ARROW-3272 > URL: https://issues.apache.org/jira/browse/ARROW-3272 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Bryan Cutler >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709222#comment-16709222 ] David Konerding commented on ARROW-3933: Is there a file validator for parquet? Or something in arrow I can use without python (say, arrow-cpp simple test program) that will attempt to read the file? Does arrow have a general philosophy about releases and segfaulting (IE, I would expect that segfaulting on reading a valid parquet file would be a release-blocker). > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing
[ https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709088#comment-16709088 ] Tanya Schlusser commented on ARROW-3629: Pull request [#3089|https://github.com/apache/arrow/pull/3089], provided I understood this correctly: it only entails adding a single line to {color:#654982}{{python/doc/source/api.rst}}{color}. Comment: The doc build was difficult, but possibly because I'm a noob. I'm commenting rather than making a JIRA issue because I have no idea whether these are actual issues or just a newbie's lack of knowledge. Running {color:#654982}{{dev/gen_apidocs.sh}}{color} on a clean pull with my single-line change to {color:#654982}{{api.rst}}{color} failed: The {color:#654982}{{iwyu}}{color} image in {color:#654982}{{dev/docker-compose.yml}}{color} failed with this path issue: - {color:#654982}{{ERROR: build path /arrow/dev/iwyu either does not exist, is not accessible, or is not a valid URL.}}{color} - I commented it out and then could continue. 
The Java docs wouldn't compile either at first: - I think because there's a {color:#654982}{{conda install}}{color} for a second version of {color:#654982}{{maven}}{color} below the {color:#654982}{{apt-get install maven}}{color} in the [Dockerfile|https://github.com/apache/arrow/blob/master/dev/gen_apidocs/Dockerfile], which puts Java 11 in the front of the {color:#654982}{{PATH}}{color} breaking the lookup for class {color:#654982}{{javax.annotation.Generated}}{color} which moves from [Java 8|https://docs.oracle.com/javase/8/docs/api/javax/annotation/Generated.html] to [Java 9|https://docs.oracle.com/javase/9/docs/api/javax/annotation/processing/Generated.html] (and here is where it landed in [Java 11|https://docs.oracle.com/en/java/javase/11/docs/api/java.compiler/javax/annotation/processing/Generated.html]) - when I deleted that line in the Dockerfile, the Java code compiled but didn't pass a test, because of a different missing dependency (that I didn't note; happy to figure it out if it's actually meaningful) - so I commented out the Java build section in {color:#654982}{{dev/gen_apidocs/create_documents.sh}}{color} The Javascript docs failed on a dependency I didn't note (happy to; just didn't want to waste time if it's my noob problem) - so I commented it out too; then the remaining doc generation worked Please disregard if it's my lack of understanding. Otherwise I am happy to investigate further/add issues :). > [Python] Add write_to_dataset to Python Sphinx API listing > -- > > Key: ARROW-3629 > URL: https://issues.apache.org/jira/browse/ARROW-3629 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3629) [Python] Add write_to_dataset to Python Sphinx API listing
[ https://issues.apache.org/jira/browse/ARROW-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3629: -- Labels: pull-request-available (was: ) > [Python] Add write_to_dataset to Python Sphinx API listing > -- > > Key: ARROW-3629 > URL: https://issues.apache.org/jira/browse/ARROW-3629 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda
[ https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3209: -- Labels: pull-request-available (was: ) > [C++] Rename libarrow_gpu to libarrow_cuda > -- > > Key: ARROW-3209 > URL: https://issues.apache.org/jira/browse/ARROW-3209 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > I'm proposing to rename this library since we could conceivably have OpenCL > bindings in the repository also -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda
[ https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-3209: - Assignee: Antoine Pitrou > [C++] Rename libarrow_gpu to libarrow_cuda > -- > > Key: ARROW-3209 > URL: https://issues.apache.org/jira/browse/ARROW-3209 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.12.0 > > > I'm proposing to rename this library since we could conceivably have OpenCL > bindings in the repository also -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2993) [JS] Document minimum supported NodeJS version
[ https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2993: -- Labels: pull-request-available (was: ) > [JS] Document minimum supported NodeJS version > -- > > Key: ARROW-2993 > URL: https://issues.apache.org/jira/browse/ARROW-2993 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > > The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and > higher. It would be useful to document the minimum supported NodeJS version -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Add dask integration test to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-3368: --- Summary: [Integration/CI/Python] Add dask integration test to docker-compose setup (was: [Integration/CI/Python] Port Dask integration test to docker-compose setup) > [Integration/CI/Python] Add dask integration test to docker-compose setup > - > > Key: ARROW-3368 > URL: https://issues.apache.org/jira/browse/ARROW-3368 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Add dask integration test to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3368: -- Labels: pull-request-available (was: ) > [Integration/CI/Python] Add dask integration test to docker-compose setup > - > > Key: ARROW-3368 > URL: https://issues.apache.org/jira/browse/ARROW-3368 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda
[ https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708897#comment-16708897 ] Wes McKinney commented on ARROW-3209: - We should change the namespace; I'm not sure that changing the directory name is necessary as long as it's clear what files inside are cuda-related > [C++] Rename libarrow_gpu to libarrow_cuda > -- > > Key: ARROW-3209 > URL: https://issues.apache.org/jira/browse/ARROW-3209 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > I'm proposing to rename this library since we could conceivably have OpenCL > bindings in the repository also -- This message was sent by Atlassian JIRA (v7.6.3#76005)
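One common way to change a namespace without immediately breaking callers is a namespace alias during a deprecation window. This is a minimal sketch of the technique under discussion, not necessarily what Arrow implemented; `CudaContext` here is a stand-in type, not the real class.

```cpp
#include <type_traits>

// Transitional rename sketch: the old arrow::gpu namespace becomes an
// alias for the new arrow::cuda namespace, so existing arrow::gpu::
// code keeps compiling while callers migrate.
namespace arrow {
namespace cuda {
struct CudaContext {};  // hypothetical stand-in for the real class
}  // namespace cuda
namespace gpu = cuda;   // old name forwards to the new one
}  // namespace arrow

// Both spellings name the exact same type.
static_assert(
    std::is_same<arrow::gpu::CudaContext, arrow::cuda::CudaContext>::value,
    "the alias refers to the same type");
```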
[jira] [Resolved] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off
[ https://issues.apache.org/jira/browse/ARROW-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3934. - Resolution: Fixed Issue resolved by pull request 3082 [https://github.com/apache/arrow/pull/3082] > [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off > -- > > Key: ARROW-3934 > URL: https://issues.apache.org/jira/browse/ARROW-3934 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently the precompiled tests are compiled in any case, even if > ARROW_GANDIVA_BUILD_TESTS=off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3209) [C++] Rename libarrow_gpu to libarrow_cuda
[ https://issues.apache.org/jira/browse/ARROW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708890#comment-16708890 ] Antoine Pitrou commented on ARROW-3209: --- Should we also rename the "arrow/gpu" directory and the {{arrow::gpu}} namespace? > [C++] Rename libarrow_gpu to libarrow_cuda > -- > > Key: ARROW-3209 > URL: https://issues.apache.org/jira/browse/ARROW-3209 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.12.0 > > > I'm proposing to rename this library since we could conceivably have OpenCL > bindings in the repository also -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2993) [JS] Document minimum supported NodeJS version
[ https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned ARROW-2993: Assignee: Brian Hulette > [JS] Document minimum supported NodeJS version > -- > > Key: ARROW-2993 > URL: https://issues.apache.org/jira/browse/ARROW-2993 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Fix For: JS-0.4.0 > > > The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and > higher. It would be useful to document the minimum supported NodeJS version -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3931) Make possible to build regardless of LANG
[ https://issues.apache.org/jira/browse/ARROW-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3931: --- Assignee: Kousuke Saruta > Make possible to build regardless of LANG > - > > Key: ARROW-3931 > URL: https://issues.apache.org/jira/browse/ARROW-3931 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > At the time of building C++ libs, CompilerInfo.cmake checks the version of > compiler to be used. > How to check is string matching of output of gcc -v or like clang -v. > When LANG is not related to English, build will fail because string match > fails. > The following is the case of ja_JP.UTF-8 (Japanese). > {code} > CMake Error at cmake_modules/CompilerInfo.cmake:92 (message): > > > > Unknown compiler. Version info: > > > > > > > > 組み込み spec を使用しています。 > > > > > > > COLLECT_GCC=/usr/bin/c++ > > > > > > > > COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper > > > > > > > > ターゲット: x86_64-redhat-linux > > > > > > > configure 設定: ../configure --prefix=/usr --mandir=/usr/share/man > > > > --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla > > > > --enable-bootstrap --enable-shared --enable-threads=posix > > > > --enable-checking=release --with-system-zlib --enable-__cxa_atexit > > > > --disable-libunwind-exceptions --enable-gnu-unique-object > >
[jira] [Resolved] (ARROW-3931) Make possible to build regardless of LANG
[ https://issues.apache.org/jira/browse/ARROW-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3931. - Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3077 [https://github.com/apache/arrow/pull/3077] > Make possible to build regardless of LANG > - > > Key: ARROW-3931 > URL: https://issues.apache.org/jira/browse/ARROW-3931 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Kousuke Saruta >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > At the time of building C++ libs, CompilerInfo.cmake checks the version of > compiler to be used. > How to check is string matching of output of gcc -v or like clang -v. > When LANG is not related to English, build will fail because string match > fails. > The following is the case of ja_JP.UTF-8 (Japanese). > {code} > CMake Error at cmake_modules/CompilerInfo.cmake:92 (message): > > > > Unknown compiler. Version info: > > > > > > > > 組み込み spec を使用しています。 > > > > > > > COLLECT_GCC=/usr/bin/c++ > > > > > > > > COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper > > > > > > > > ターゲット: x86_64-redhat-linux > > > > > > > configure 設定: ../configure --prefix=/usr --mandir=/usr/share/man > > > > --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla > > > > --enable-bootstrap --enable-shared --enable-threads=posix > > > > --enable-checking=release --with-system-zlib --enable-__cxa_atexit > > > > --disable-libunwind-exceptions --enable-gnu-unique-object >
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708875#comment-16708875 ] Wes McKinney commented on ARROW-3933: - It's hard to say until someone has a chance to look at it. Hopefully it can get fixed in time for the 0.12 release > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package
[ https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved ARROW-3892. -- Resolution: Fixed Issue resolved by pull request 3083 [https://github.com/apache/arrow/pull/3083] > [JS] Remove any dependency on compromised NPM flatmap-stream package > > > Key: ARROW-3892 > URL: https://issues.apache.org/jira/browse/ARROW-3892 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Wes McKinney >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: JS-0.4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We are erroring out as the result of > https://github.com/dominictarr/event-stream/issues/116 > {code} > npm ERR! code ENOVERSIONS > npm ERR! No valid versions available for flatmap-stream > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2670) [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
[ https://issues.apache.org/jira/browse/ARROW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2670. - Resolution: Fixed Assignee: Krisztian Szucs Fix Version/s: 0.12.0 > [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build > --- > > Key: ARROW-2670 > URL: https://issues.apache.org/jira/browse/ARROW-2670 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Fix For: 0.12.0 > > > Once the packaging thread reaches a stable state and we have the ability to > run non-packaging nightly tests, we should set up a Docker build on Ubuntu > 18.04 (which is based on gcc 7.3) so we can keep that build clean. It may be > a while until we have any Travis CI entries that use Bionic / 18.04 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2670) [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
[ https://issues.apache.org/jira/browse/ARROW-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708857#comment-16708857 ] Krisztian Szucs commented on ARROW-2670: [~wesmckinn] I think this is done, `docker-compose run cpp` does that and the nightlies are running on my crossbow instance. > [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build > --- > > Key: ARROW-2670 > URL: https://issues.apache.org/jira/browse/ARROW-2670 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Once the packaging thread reaches a stable state and we have the ability to > run non-packaging nightly tests, we should set up a Docker build on Ubuntu > 18.04 (which is based on gcc 7.3) so we can keep that build clean. It may be > a while until we have any Travis CI entries that use Bionic / 18.04 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-3368: -- Assignee: Krisztian Szucs > [Integration/CI/Python] Port Dask integration test to docker-compose setup > -- > > Key: ARROW-3368 > URL: https://issues.apache.org/jira/browse/ARROW-3368 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-3368: --- Summary: [Integration/CI/Python] Port Dask integration test to docker-compose setup (was: [INTEGRATION] Port Dask integration test to docker-compose setup) > [Integration/CI/Python] Port Dask integration test to docker-compose setup > -- > > Key: ARROW-3368 > URL: https://issues.apache.org/jira/browse/ARROW-3368 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Integration, Python >Reporter: Krisztian Szucs >Priority: Major > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708821#comment-16708821 ] David Konerding commented on ARROW-3933: Do you have suggestions for a workaround? In particular, I'm curious if the problem repros outside of a conda install. I don't want to build the software manually but will do so if that resolves the issue (but I would also want to see a fixed version pushed to PyPI/conda forge). > [Python] Segfault reading Parquet files from GNOMAD > --- > > Key: ARROW-3933 > URL: https://issues.apache.org/jira/browse/ARROW-3933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python > Environment: Ubuntu 18.04 or Mac OS X >Reporter: David Konerding >Priority: Minor > Labels: parquet > Fix For: 0.12.0 > > > I am getting segfault trying to run a basic program Ubuntu 18.04 VM (AWS). > Error also occurs out of box on Mac OS X. > $ sudo snap install --classic google-cloud-sdk > $ gsutil cp > gs://gnomad-public/release/2.0.2/vds/exomes/gnomad.exomes.r2.0.2.sites.vds/rdd.parquet/part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet > . > $ conda install pyarrow > $ python test.py > Segmentation fault (core dumped) > test.py: > import pyarrow.parquet as pq > path = "part-r-0-31fcf9bd-682f-4c20-bbe5-b0bd08699104.snappy.parquet" > pq.read_table(path) > gdb output: > Thread 3 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffdf199700 (LWP 13703)] > 0x7fffdfc2a470 in parquet::arrow::StructImpl::GetDefLevels(short const**, > unsigned long*) () from > /home/ubuntu/miniconda2/lib/python2.7/site-packages/pyarrow/../../../libparquet.so.11 > I tested fastparquet, it reads the file just fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3368) [Integration/CI/Python] Port Dask integration test to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs updated ARROW-3368: --- Component/s: Python Continuous Integration > [Integration/CI/Python] Port Dask integration test to docker-compose setup > -- > > Key: ARROW-3368 > URL: https://issues.apache.org/jira/browse/ARROW-3368 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Integration, Python >Reporter: Krisztian Szucs >Priority: Major > > Introduced by https://github.com/apache/arrow/pull/2572 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows
[ https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Felton updated ARROW-3936: - Description: Unlike Linux, Windows doesn't let you delete files that are currently opened by another process. So if you create a child process while a Parquet file is open, with the current code the file handle is inherited to the child process, and the parent process can't then delete the file after closing it without the child process terminating first. By default, Win32 file handles are not inheritable (likely because of the aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX compatibility. This is a serious problem for us. We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is a sensible approach and would likely be the correct behaviour as it matches the main Win32 API. However, it could be that some developers rely on the current inheritable behaviour. In which case, the Arrow public API should take a boolean argument on whether the created file descriptor should be inheritable. But this would break API backward compatibility (unless a new overloaded method is introduced). Is forking and inheriting Arrow internal file descriptor something that Arrow actually means to support? See [https://github.com/apache/arrow/pull/3085.] What do we think of the proposed fix? was: Unlike Linux, Windows doesn't let you delete files that are currently opened by another process. So if you create a child process while a Parquet file is open, with the current code the file handle is inherited to the child process, and the parent process can't then delete the file after closing it without the child process terminating first. By default, Win32 file handles are not inheritable (likely because of the aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX compatibility. This is a serious problem for us. 
We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is a sensible approach and would likely be the correct behaviour as it matches the main Win32 API. However, it could be that some developers rely on the current inheritable behaviour. In which case, the Arrow public API should take a boolean argument on whether the created file descriptor should be inheritable. But this would break API backward compatibility (unless a new overloaded method is introduced). Is forking and inheriting Arrow internal file descriptor something that Arrow actually means to support? What do we think of the proposed fix? > Add _O_NOINHERIT to the file open flags on Windows > -- > > Key: ARROW-3936 > URL: https://issues.apache.org/jira/browse/ARROW-3936 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philip Felton >Priority: Major > Labels: pull-request-available > > Unlike Linux, Windows doesn't let you delete files that are currently opened > by another process. So if you create a child process while a Parquet file is > open, with the current code the file handle is inherited to the child > process, and the parent process can't then delete the file after closing it > without the child process terminating first. > By default, Win32 file handles are not inheritable (likely because of the > aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX > compatibility. > This is a serious problem for us. > We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path > is a sensible approach and would likely be the correct behaviour as it > matches the main Win32 API. > However, it could be that some developers rely on the current inheritable > behaviour. In which case, the Arrow public API should take a boolean argument > on whether the created file descriptor should be inheritable. But this would > break API backward compatibility (unless a new overloaded method is > introduced). 
> Is forking and inheriting Arrow internal file descriptor something that Arrow > actually means to support? > See [https://github.com/apache/arrow/pull/3085.] What do we think of the > proposed fix? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows
[ https://issues.apache.org/jira/browse/ARROW-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3936: -- Labels: pull-request-available (was: ) > Add _O_NOINHERIT to the file open flags on Windows > -- > > Key: ARROW-3936 > URL: https://issues.apache.org/jira/browse/ARROW-3936 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philip Felton >Priority: Major > Labels: pull-request-available > > Unlike Linux, Windows doesn't let you delete files that are currently opened > by another process. So if you create a child process while a Parquet file is > open, with the current code the file handle is inherited to the child > process, and the parent process can't then delete the file after closing it > without the child process terminating first. > By default, Win32 file handles are not inheritable (likely because of the > aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX > compatibility. > This is a serious problem for us. > We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path > is a sensible approach and would likely be the correct behaviour as it > matches the main Win32 API. > However, it could be that some developers rely on the current inheritable > behaviour. In which case, the Arrow public API should take a boolean argument > on whether the created file descriptor should be inheritable. But this would > break API backward compatibility (unless a new overloaded method is > introduced). > Is forking and inheriting Arrow internal file descriptor something that Arrow > actually means to support? > See [https://github.com/apache/arrow/pull/3085.] What do we think of the > proposed fix? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3936) Add _O_NOINHERIT to the file open flags on Windows
Philip Felton created ARROW-3936: Summary: Add _O_NOINHERIT to the file open flags on Windows Key: ARROW-3936 URL: https://issues.apache.org/jira/browse/ARROW-3936 Project: Apache Arrow Issue Type: Bug Reporter: Philip Felton Unlike Linux, Windows doesn't let you delete files that are currently opened by another process. So if you create a child process while a Parquet file is open, with the current code the file handle is inherited to the child process, and the parent process can't then delete the file after closing it without the child process terminating first. By default, Win32 file handles are not inheritable (likely because of the aforementioned problems). Except for _wsopen_s, which tries to maintain POSIX compatibility. This is a serious problem for us. We would argue that specifying _O_NOINHERIT by default in the _MSC_VER path is a sensible approach and would likely be the correct behaviour as it matches the main Win32 API. However, it could be that some developers rely on the current inheritable behaviour. In which case, the Arrow public API should take a boolean argument on whether the created file descriptor should be inheritable. But this would break API backward compatibility (unless a new overloaded method is introduced). Is forking and inheriting Arrow internal file descriptor something that Arrow actually means to support? What do we think of the proposed fix? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
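For comparison, CPython itself adopted exactly the default this report proposes: since PEP 446, file descriptors created by the interpreter are non-inheritable unless a caller explicitly opts in. A minimal illustration (a POSIX-side analogue of _O_NOINHERIT, not Arrow code):

```python
import os
import tempfile

# PEP 446: descriptors opened by Python are not inherited by child
# processes by default; inheriting is an explicit opt-in.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
fd = os.open(path, os.O_WRONLY | os.O_CREAT)
print(os.get_inheritable(fd))   # False by default
os.set_inheritable(fd, True)    # opting in is an explicit act
print(os.get_inheritable(fd))   # True
os.close(fd)
```

The precedent supports making _O_NOINHERIT the default in the _MSC_VER path, with an explicit switch for the (likely rare) callers who rely on inheritance.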
[jira] [Created] (ARROW-3935) [Packaging/Docker] Mount ccache directory in docker-compose setup
Krisztian Szucs created ARROW-3935: -- Summary: [Packaging/Docker] Mount ccache directory in docker-compose setup Key: ARROW-3935 URL: https://issues.apache.org/jira/browse/ARROW-3935 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Krisztian Szucs Hopefully this will speed up compilation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
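A hypothetical docker-compose fragment showing the idea (the service name and mount paths are assumptions, not taken from the Arrow repository): mounting a host ccache directory into the build container lets compiled objects survive across runs.

```yaml
# Hypothetical sketch; the actual service names and paths in
# apache/arrow's docker-compose.yml may differ.
services:
  cpp:
    environment:
      CCACHE_DIR: /build/ccache
    volumes:
      - ${HOME}/.ccache:/build/ccache
```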
[jira] [Updated] (ARROW-3303) [C++] Enable example arrays to be written with a simplified JSON representation
[ https://issues.apache.org/jira/browse/ARROW-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3303: -- Labels: pull-request-available (was: ) > [C++] Enable example arrays to be written with a simplified JSON > representation > --- > > Key: ARROW-3303 > URL: https://issues.apache.org/jira/browse/ARROW-3303 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > > In addition to making it easier to generate random data as described in > ARROW-2329, I think it would be useful to reduce some of the boilerplate > associated with writing down explicit test cases. The benefits of this will > be especially pronounced when writing nested arrays. > Example code that could be improved this way: > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array-test.cc#L3271 > Rather than having a ton of hand-written assertions, we could compare with > the expected true dataset. Of course, this itself has to be tested > endogenously, but I think we can write enough tests for the JSON parser bit > to be able to have confidence in tests that are written with it -- This message was sent by Atlassian JIRA (v7.6.3#76005)