Geoff Quested-Joens created ARROW-8385:
------------------------------------------

             Summary: Crash on parquet.read_table on Windows with Python 3.8.2
                 Key: ARROW-8385
                 URL: https://issues.apache.org/jira/browse/ARROW-8385
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.16.0
         Environment: Windows 10
python 3.8.2 pip 20.0.2
pip freeze ->
numpy==1.18.2
pandas==1.0.3
pyarrow==0.16.0
python-dateutil==2.8.1
pytz==2019.3
six==1.14.0
            Reporter: Geoff Quested-Joens
         Attachments: crash.parquet

Reading a Parquet file with pyarrow makes the program exit abruptly, with no 
exception thrown. This happens on Windows only: with the same setup on Linux 
(Debian 10 in Docker), the same Parquet file is read without issue.

The following reproduces the crash in a Python 3.8.2 environment. The full 
environment is listed under Environment, but it is essentially pip install 
pandas and pyarrow.
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def test_pandas_write_read():
    df_out = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
    df_out.to_parquet("crash.parquet")
    df_in = pd.read_parquet("crash.parquet")
    print(df_in)

def test_arrow_write_read():
    df = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
    table_out = pa.Table.from_pandas(df)
    pq.write_table(table_out, 'crash.parquet')
    table_in = pq.read_table('crash.parquet')
    print(table_in)

if __name__ == "__main__":
    test_pandas_write_read()
    test_arrow_write_read()
{code}
The interpreter never reaches the print statements. It crashes somewhere in the 
call on line 252 of {{parquet.py}}; no error is raised, the program just exits.
{code:python}
    self.reader.read_all(...
{code}
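Because the process dies without a Python exception, one way to gather more 
detail might be to run the read in a child interpreter with faulthandler 
enabled, then inspect the exit code and stderr. This is only a sketch: the 
helper name run_isolated is hypothetical, and whether faulthandler prints a 
useful traceback for this particular crash is an assumption, not something 
confirmed in this report.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    A hard crash in the child cannot take down the parent process, and the
    child's exit code and stderr remain available for inspection.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


# Hypothetical usage against the file from this report:
#   result = run_isolated(
#       "import pyarrow.parquet as pq\n"
#       "print(pq.read_table('crash.parquet'))\n"
#   )
#   print("exit code:", result.returncode)
#   print(result.stderr)
```

A non-zero exit code with empty stdout would be consistent with the silent 
exit described above. Any traceback that faulthandler emits would appear in 
result.stderr.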
In contrast, running the same code in the same Python environment on Debian 10 
reads the Parquet files generated by the Windows run without error. The 
sha2sum values are equal for the crash.parquet files generated on Debian and 
on Windows, so the problem appears to be in the read rather than the write. 
Attached is the crash.parquet file generated on my machine.
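
The checksum comparison can also be done from Python, so that the same check 
runs on both platforms. The sketch below assumes SHA-256; the report does not 
say which SHA-2 variant was used, and the function name sha256_of_file is 
illustrative only.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: run on each machine and compare the printed digests.
#   print(sha256_of_file("crash.parquet"))
```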

Oddly, changing the {{range(3)}} to {{range(2)}} gets rid of the crash on 
Windows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
