[ 
https://issues.apache.org/jira/browse/ARROW-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Bernot updated ARROW-7939:
-------------------------------
    Description: 
When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
would make python crash. I drilled down to the simplest example I could find.

It turns out that some parquet files created with pyarrow 0.16 cannot be read 
back either. The example below works fine with arrays_ok, but python crashes with 
arrays_nok (apparently as soon as there are at least three different values).

Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
problem seems to happen only with snappy.
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa
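arrays_ok = [[0,1]]      # two distinct values: reads back fine
arrays_ok = [[0,1,1]]    # three values, two distinct: reads back fine
arrays_nok = [[0,1,2]]   # three distinct values: python crashes on read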
table = pa.Table.from_arrays(arrays_nok,names=['a'])
pq.write_table(table,'foo.parquet',compression='snappy')
pq.read_table('foo.parquet')
{code}

  was:
When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
would make python crash. I drilled down to the simplest example I could find.

It turns out that some parquet files created with pyarrow 0.16 cannot be read 
back either. The example below works fine with arrays_ok, but python crashes with 
arrays_nok.

Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
problem seems to happen only with snappy.
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa
arrays_ok = [[0,1]]
arrays_ok = [[0,1,1]]
arrays_nok = [[0,1,2]]
table = pa.Table.from_arrays(arrays_nok,names=['a'])
pq.write_table(table,'foo.parquet',compression='snappy')
pq.read_table('foo.parquet')
{code}


> [Python] crashes when reading parquet file compressed with snappy
> -----------------------------------------------------------------
>
>                 Key: ARROW-7939
>                 URL: https://issues.apache.org/jira/browse/ARROW-7939
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Windows 7
> python 3.6.9
> pyarrow 0.16 from conda-forge
>            Reporter: Marc Bernot
>            Priority: Major
>
> When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
> would make python crash. I drilled down to the simplest example I could find.
> It turns out that some parquet files created with pyarrow 0.16 cannot be read 
> back either. The example below works fine with arrays_ok, but python crashes 
> with arrays_nok (apparently as soon as there are at least three different 
> values).
> Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
> problem seems to happen only with snappy.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
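> arrays_ok = [[0,1]]      # two distinct values: reads back fine
> arrays_ok = [[0,1,1]]    # three values, two distinct: reads back fine
> arrays_nok = [[0,1,2]]   # three distinct values: python crashes on read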
> table = pa.Table.from_arrays(arrays_nok,names=['a'])
> pq.write_table(table,'foo.parquet',compression='snappy')
> pq.read_table('foo.parquet')
> {code}
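
Because the failure reported above takes down the whole interpreter, comparing codecs in a single session is awkward. The sketch below is one possible way to run the same write/read round trip once per codec, each in its own child interpreter, and collect the exit codes. It is not part of the original report: the helper name check_codecs and the temporary file names are hypothetical, and it assumes pyarrow is installed for the interpreter that runs it.

```python
import os
import subprocess
import sys
import tempfile

# Child script: the report's write/read round trip, with the codec and the
# output path passed on the command line.
CHILD_SCRIPT = """
import sys
import pyarrow as pa
import pyarrow.parquet as pq

codec, path = sys.argv[1], sys.argv[2]
table = pa.Table.from_arrays([pa.array([0, 1, 2])], names=['a'])
pq.write_table(table, path, compression=codec)
pq.read_table(path)
"""


def check_codecs(codecs):
    """Return {codec: exit code of the child interpreter} (hypothetical helper)."""
    exit_codes = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        for codec in codecs:
            path = os.path.join(tmp_dir, codec + ".parquet")
            # A separate process per codec, so a hard crash in one run
            # does not prevent the remaining codecs from being tried.
            result = subprocess.run(
                [sys.executable, "-c", CHILD_SCRIPT, codec, path],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            exit_codes[codec] = result.returncode
    return exit_codes
```

Calling check_codecs(['none', 'gzip', 'brotli', 'snappy']) returns one exit code per codec. An exit code of 0 means the round trip completed. A nonzero code only says the child did not finish cleanly, which covers a crash as well as an ordinary exception (for example a missing pyarrow install), so it should be read as a hint rather than a diagnosis.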



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
