[ 
https://issues.apache.org/jira/browse/ARROW-8385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081236#comment-17081236
 ] 

Geoff Quested-Jones commented on ARROW-8385:
--------------------------------------------

Hi Wes, thanks for looking into this, and sorry for the delay in getting back 
to you; I'm currently marooned in Bangladesh. My machine is possibly a little 
on the venerable side: CPU i7-3610QM, RAM 24 GB, GPU NVIDIA Quadro K2000M.

I have had some mixed results:
 * Following your tests above, I did a conda install of pyarrow in Anaconda 
2020.02 (Python 3.8.1). This was able to read the file without any crash. (/)
 ** (Python 3.8.1 (default, Mar 2 2020, 13:06:26) [MSC v.1916 64 bit (AMD64)] 
:: Anaconda, Inc. on win32)
 ** NB for this I used: {{conda install -c conda-forge pyarrow}}
 * I have also created a fresh env in Python 3.7.7 x64 (python.org) and tried 
to load the file, unsuccessfully (x)
 ** Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 
bit (AMD64)] on win32
 ** NB for this I used: {{python -m pip install pyarrow}}
 * I am in the process of pulling and building pyarrow from the git master 
branch and will try to reproduce the crash there, since I might then be able 
to step into the DLL. I will report back.
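
To make the conda and pip environments above easier to compare side by side, a small script along the following lines could record the interpreter and package versions in each one. This is only a sketch: {{describe_env}} is a made-up helper name, and the default package list is an assumption based on the pip freeze in the issue.

```python
import platform
import sys
from importlib import metadata


def describe_env(packages=("pyarrow", "pandas", "numpy")):
    """Collect interpreter, platform and package versions for a bug report.

    `describe_env` and the default package list are illustrative only.
    """
    info = {
        # sys.version includes the compiler tag, e.g. "[MSC v.1916 64 bit (AMD64)]"
        "python": sys.version.replace("\n", " "),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Running it once in each environment and diffing the output should show whether the working and failing setups differ in anything besides the Python build and the pyarrow package source.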

 

> [Python][Parquet] Crash on parquet.read_table on windows python 3.8.2
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8385
>                 URL: https://issues.apache.org/jira/browse/ARROW-8385
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>         Environment: Windows 10 
> python 3.8.2 pip 20.0.2
> pip freeze ->
> numpy==1.18.2
> pandas==1.0.3
> pyarrow==0.16.0
> python-dateutil==2.8.1
> pytz==2019.3
> six==1.14.0
>            Reporter: Geoff Quested-Jones
>            Priority: Major
>         Attachments: crash.parquet
>
>
> When reading a Parquet file with pyarrow, the program exits spontaneously 
> with no exception thrown. This happens on Windows only: with the same setup 
> on Linux (Debian 10 in Docker), the same Parquet file is read without issue.
> The following reproduces the crash in a Python 3.8.2 environment (listed in 
> the Environment field), which is essentially pip install pandas and pyarrow.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> def test_pandas_write_read():
>     df_out = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
>     df_out.to_parquet("crash.parquet")
>     df_in = pd.read_parquet("crash.parquet")
>     print(df_in)
> def test_arrow_write_read():
>     df = pd.DataFrame.from_dict([{"A":i} for i in range(3)])
>     table_out = pa.Table.from_pandas(df)
>     pq.write_table(table_out, 'crash.parquet')
>     table_in = pq.read_table('crash.parquet')
>     print(table_in)
> if __name__ == "__main__":
>     test_pandas_write_read()
>     test_arrow_write_read()
> {code}
>  The interpreter never reaches the print statements. It crashes somewhere in 
> the call on line 252 of {{parquet.py}}; no error is thrown, the program just 
> exits spontaneously.
> {code:python}
>     self.reader.read_all(...
> {code}
> In contrast, running the same code and Python environment on Debian 10 
> produces no error when reading the Parquet files generated by the same code 
> on Windows. The sha2sum values compare equal for the crash.parquet generated 
> on Debian and on Windows, so something appears to be wrong with the read. 
> Attached is the crash.parquet file generated on my machine.
> Oddly, changing the {{range(3)}} to {{range(2)}} gets rid of the crash on 
> Windows.
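
Because the process exits without raising a Python exception, one way to observe the failure from the outside is to run the read in a child interpreter and inspect its exit code and stderr. The sketch below is an illustration under stated assumptions, not part of the original report: {{run_isolated}} is a hypothetical helper name, and it relies on the standard {{-X faulthandler}} interpreter option so that a fatal error in native code dumps the Python stack to stderr.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr_text). A non-zero return code without a
    normal Python traceback suggests the child died in native code.
    `run_isolated` is a hypothetical helper, named here for illustration.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr
```

With crash.parquet in the current directory, calling {{run_isolated("import pyarrow.parquet as pq; print(pq.read_table('crash.parquet'))")}} would return the child's exit code together with whatever faulthandler managed to write, while the parent script keeps running.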



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
