[ https://issues.apache.org/jira/browse/ARROW-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-6060.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 0.15.0

Issue resolved by pull request 5016
[https://github.com/apache/arrow/pull/5016]

> [Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-6060
>                 URL: https://issues.apache.org/jira/browse/ARROW-6060
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Kun Liu
>            Assignee: Benjamin Kietzman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> I tried to load a Parquet file of about 1.8 GB using the following code. It crashed due to an out-of-memory issue.
> {code:java}
> import pyarrow.parquet as pq
> pq.read_table('/tmp/test.parquet')
> {code}
> However, it worked well with use_threads=False, as follows:
> {code:java}
> pq.read_table('/tmp/test.parquet', use_threads=False)
> {code}
> If pyarrow is downgraded to 0.12.1, there is no such problem.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)