[ https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073860#comment-17073860 ]
Antoine Pitrou commented on ARROW-7939:
---------------------------------------

I've also checked that the nightly builds work fine.

[~marcbernot] Can you try to install a nightly build?
{code:bash}
conda update -c arrow-nightlies pyarrow
{code}

> [Python] crashes when reading parquet file compressed with snappy
> -----------------------------------------------------------------
>
>                 Key: ARROW-7939
>                 URL: https://issues.apache.org/jira/browse/ARROW-7939
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>            Reporter: Marc Bernot
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 0.17.0
>
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1
> would make python crash. I drilled down to the simplest example I could find.
> It turns out that some parquet files created with pyarrow 0.16 cannot be
> read back either. The example below works fine with arrays_ok, but python
> crashes with arrays_nok (apparently as soon as there are at least three
> distinct values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
>
> # Both of these read back fine:
> arrays_ok = [[0,1]]
> arrays_ok = [[0,1,1]]
> # Three distinct values: reading this back crashes python with snappy.
> arrays_nok = [[0,1,2]]
>
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> pq.read_table('foo.parquet')
> {code}
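Since the report describes python itself crashing, a try/except around pq.read_table would not catch the failure. A minimal sketch, assuming only the standard library plus pyarrow in the child interpreter, runs the reporter's three-value table through each codec in a separate process and classifies the child's exit code. The helper names check_codecs and classify, and the exit-code heuristic (1 = uncaught Python exception, any other nonzero code = native crash), are illustrative assumptions, not part of pyarrow.

{code:python}
import os
import subprocess
import sys
import tempfile

# Child script (hypothetical helper): build the reporter's arrays_nok table,
# write it with the given codec, and read it back. It runs in a separate
# interpreter so a native crash cannot take down the parent process.
CHILD = """
import sys
import pyarrow as pa
import pyarrow.parquet as pq

codec, path = sys.argv[1], sys.argv[2]
table = pa.Table.from_arrays([pa.array([0, 1, 2])], names=['a'])
pq.write_table(table, path, compression=codec)
assert pq.read_table(path).equals(table)
"""


def classify(returncode):
    """Map a child exit code to 'ok', 'error' or 'crashed' (heuristic)."""
    if returncode == 0:
        return "ok"
    if returncode == 1:
        # An uncaught Python exception (including ImportError) exits with 1.
        return "error"
    # Negative codes are POSIX signals (e.g. -11 for SIGSEGV); on Windows an
    # access violation typically shows up as 3221225477 (0xC0000005).
    return "crashed"


def check_codecs(codecs, workdir):
    """Round-trip the table once per codec, each in its own interpreter."""
    results = {}
    for codec in codecs:
        path = os.path.join(workdir, "foo_{}.parquet".format(codec))
        # stdout/stderr=PIPE instead of capture_output keeps Python 3.6 support.
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, codec, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        results[codec] = classify(proc.returncode)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for codec, status in check_codecs(["none", "snappy", "gzip", "brotli"], d).items():
            print(codec, status)
{code}

Running each codec in its own process is what lets the snappy case be reported as "crashed" next to the other codecs instead of silently ending the whole session.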