[ https://issues.apache.org/jira/browse/ARROW-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Curtis updated ARROW-1311:
--------------------------------
    Attachment: thread-apply-all-bt-full.txt

Okay, to get the new backtrace, attached, I had to install the 0.5.0 version from conda:

pyarrow:                  0.5.0-np112py35_0     conda-forge

I see there's a lot of threads in there (64?), more than I expected. I ran it from the ipython qtconsole, maybe that has something to do with it.

Hope that helps.

> python hangs after write a few parquet tables
> ---------------------------------------------
>
>                 Key: ARROW-1311
>                 URL: https://issues.apache.org/jira/browse/ARROW-1311
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>         Environment: Python 3.5.2, pyarrow 0.5.0
>            Reporter: Keith Curtis
>            Assignee: Wes McKinney
>             Fix For: 0.6.0
>
>         Attachments: backtrace.txt, thread-apply-all-bt-full.txt
>
>
> I had a program to read some csv files (a few million rows each, 9 columns),
> and converted with:
> {code}
> import os
> import pandas as pd
> import pyarrow.parquet as pq
> import pyarrow
> def to_parquet(output_file, csv_file):
>     df = pd.read_csv(csv_file)
>     table = pyarrow.Table.from_pandas(df)
>     pq.write_table(table, output_file)
> {code}
> The first csv file would always complete, but python would hang on the second
> or third file, and sometimes on a much later file.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)