[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887352#comment-16887352 ]
H. Vetinari edited comment on ARROW-5965 at 7/18/19 6:36 AM:
-------------------------------------------------------------

[~wesmckinn] I would like to provide it, but I can only install packages through conda (which has a hole in the firewall). Unfortunately, the install fails:

{{# conda install pyarrow=0.14 gdb}}
{{Collecting package metadata (current_repodata.json): done}}
{{Solving environment: failed}}
{{Collecting package metadata (repodata.json): done}}
{{Solving environment: failed}}
{{UnsatisfiableError: The following specifications were found to be incompatible with each other:}}
{{  - pip -> python[version='>=3.7,<3.8.0a0']}}

I believe this is because gdb has [not yet](https://github.com/conda-forge/gdb-feedstock/pull/12) been built for Python 3.7. (Just as I was preparing this message, I triggered a rerender there; this caused some further activity and produced the first passing 3.7 build, which is not yet merged because the 2.7 build is failing.)

In the meantime, I tried downgrading my whole environment to Python 3.6, where the program also crashes or hangs with v0.14. However, I haven't yet managed to get any gdb output; I might need to read more of the GDB manual...

EDIT: can't seem to format the code block correctly, sorry.

> [Python] Regression: segfault when reading hive table with v0.14
> ----------------------------------------------------------------
>
>                 Key: ARROW-5965
>                 URL: https://issues.apache.org/jira/browse/ARROW-5965
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: H. Vetinari
>            Priority: Critical
>              Labels: parquet
>
> I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow
> installed in a conda env.
> The data I'm reading is a Hive(-registered) table written as Parquet, and
> with v0.13, reading this table (which is partitioned) does not cause any
> issues.
> The code that worked before and now crashes with v0.14 is simply:
> ```
> import pyarrow.parquet as pq
> pq.ParquetDataset('hdfs:///data/raw/source/table').read()
> ```
> Since it completely crashes my notebook (or rather, my REPL ends with "Killed"),
> I cannot report much more, but this is a pretty severe usability restriction.
> So far, the only workaround is to pin `pyarrow<0.14`.
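Because the crash takes down the whole notebook, one way to capture at least a Python-level traceback without gdb might be to run the reproducer from the report in a separate interpreter with the standard-library `faulthandler` module enabled (`python -X faulthandler`). This is a swapped-in technique, not something proposed in this thread. The sketch below assumes a POSIX system (for decoding negative return codes as signals), and the names `run_isolated`, `describe_exit` and `REPRO` are hypothetical helpers introduced only for illustration.

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_isolated(snippet: str, timeout: float = 600.0) -> Tuple[int, str]:
    """Run `snippet` in a fresh Python interpreter with faulthandler enabled.

    A segfault (or any other fatal signal) then terminates only the child
    process, so the notebook or REPL calling this function keeps running.
    Returns the child's return code and everything it wrote to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        timeout=timeout,          # a hang raises subprocess.TimeoutExpired
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode: int) -> str:
    """Turn a child's return code into a readable outcome (POSIX semantics)."""
    if returncode >= 0:
        return "exited with status {}".format(returncode)
    # On POSIX, subprocess reports death by signal N as return code -N.
    return "killed by {}".format(signal.Signals(-returncode).name)


# Reproducer copied from the report; the HDFS path only exists on the
# reporter's cluster, so running it is only meaningful there.
REPRO = """
import pyarrow.parquet as pq
pq.ParquetDataset('hdfs:///data/raw/source/table').read()
"""

# Usage (on the affected machine):
#   code, err = run_isolated(REPRO)
#   print(describe_exit(code))
#   print(err)  # faulthandler's Python-level traceback, if one was written
```

If the child is reported as killed by `SIGKILL` (which is usually what a bare "Killed" from the shell indicates), faulthandler cannot intercept that signal and stderr will contain no traceback; that pattern would more likely point to the process being killed externally (for example by the kernel's out-of-memory killer) than to a segfault inside pyarrow. A hang instead surfaces as `subprocess.TimeoutExpired` once `timeout` elapses, after which `subprocess.run` kills the child.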