[ https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marc Bernot updated ARROW-7939:
-------------------------------
    Description:
When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 would make Python crash. I narrowed it down to the simplest example I could find. It turns out that some parquet files created with pyarrow 0.16 cannot be read back either.

The example below works fine with arrays_ok, but Python crashes with arrays_nok (apparently as soon as the column contains at least three distinct values).

Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The problem seems to happen only with snappy.

{code:python}
import pyarrow.parquet as pq
import pyarrow as pa

# Either of these inputs round-trips fine
arrays_ok = [[0, 1]]
arrays_ok = [[0, 1, 1]]
# This input crashes Python when the file is read back
arrays_nok = [[0, 1, 2]]

table = pa.Table.from_arrays(arrays_nok, names=['a'])
pq.write_table(table, 'foo.parquet', compression='snappy')
pq.read_table('foo.parquet')
{code}

  was:
When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 would make Python crash. I narrowed it down to the simplest example I could find. It turns out that some parquet files created with pyarrow 0.16 cannot be read back either.

The example below works fine with arrays_ok, but Python crashes with arrays_nok.

Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The problem seems to happen only with snappy.
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa

# Either of these inputs round-trips fine
arrays_ok = [[0, 1]]
arrays_ok = [[0, 1, 1]]
# This input crashes Python when the file is read back
arrays_nok = [[0, 1, 2]]

table = pa.Table.from_arrays(arrays_nok, names=['a'])
pq.write_table(table, 'foo.parquet', compression='snappy')
pq.read_table('foo.parquet')
{code}

> [Python] crashes when reading parquet file compressed with snappy
> -----------------------------------------------------------------
>
>                 Key: ARROW-7939
>                 URL: https://issues.apache.org/jira/browse/ARROW-7939
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>            Reporter: Marc Bernot
>            Priority: Major
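
Since the failure takes down the whole interpreter, comparing codecs in a single session stops at the first crash. One way to compare them is to run each write/read round trip in its own child process and look at the exit code. The following is only a sketch under assumptions: `run_isolated`, `check_codecs`, `CODECS` and `CHILD_TEMPLATE` are hypothetical names introduced here and are not part of the report. It also assumes pyarrow is importable by the child interpreter, and it needs Python 3.7+ for `capture_output`, which is newer than the 3.6.9 listed in the environment.

```python
import subprocess
import sys
import textwrap

# Codecs named in the report; 'none' is spelled as in the report.
CODECS = ["none", "gzip", "brotli", "snappy"]

# Source of the child program: the report's failing input, written to a
# temporary directory and read back with the requested codec.
CHILD_TEMPLATE = textwrap.dedent("""
    import os, tempfile
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_arrays([pa.array([0, 1, 2])], names=['a'])
    path = os.path.join(tempfile.mkdtemp(), 'foo.parquet')
    pq.write_table(table, path, compression={codec!r})
    pq.read_table(path)
""")


def run_isolated(source):
    """Run Python source in a fresh interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return result.returncode


def check_codecs(codecs=CODECS):
    """Map each codec to the exit code of its round trip (0 means success)."""
    return {c: run_isolated(CHILD_TEMPLATE.format(codec=c)) for c in codecs}


if __name__ == "__main__":
    for codec, code in check_codecs().items():
        status = "ok" if code == 0 else "failed (exit code {})".format(code)
        print("{}: {}".format(codec, status))
```

A non-zero exit code only says that the child did not finish cleanly. It does not tell a crash apart from an ordinary Python exception such as a missing pyarrow install, so the child's stderr would need to be inspected to tell those cases apart.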