[ https://issues.apache.org/jira/browse/ARROW-13763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Molina updated ARROW-13763:
--------------------------------------
    Fix Version/s: 8.0.0
                       (was: 7.0.0)

> [Python] Files opened for read with pyarrow.parquet are not explicitly closed
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-13763
>                 URL: https://issues.apache.org/jira/browse/ARROW-13763
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 5.0.0
>         Environment: fsspec 2021.4.0
>            Reporter: Richard Kimoto
>            Assignee: Alessandro Molina
>            Priority: Major
>             Fix For: 8.0.0
>
>         Attachments: test.py
>
>
> It appears that files opened for read with pyarrow.parquet.read_table (and
> therefore pyarrow.parquet.ParquetDataset) are not explicitly closed. This
> holds for both use_legacy_dataset=True and use_legacy_dataset=False. The
> files do not remain open at the OS level (verified with lsof); they do,
> however, seem to rely on the Python garbage collector to close them.
> My use case is that I'd like to use a custom fsspec filesystem that
> interfaces with an S3-like API. It handles the remote download of the
> Parquet file and passes pyarrow a handle to a temporary file downloaded
> locally. It then waits for an explicit close() or __exit__() call before
> cleaning up the temp file.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
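
The GC-reliance described in the report can be checked without pyarrow at all, using a small stdlib-only sketch. `TrackingFile` below is a hypothetical helper (not part of pyarrow or fsspec) of the kind the reporter's custom filesystem would use to detect whether the consumer ever called close() explicitly:

```python
import io


class TrackingFile(io.BytesIO):
    """File-like wrapper that records whether close() was called explicitly.

    Hypothetical helper mirroring the reporter's temp-file cleanup check:
    a custom fsspec filesystem could hand a wrapper like this to pyarrow
    and only delete its local temp file once close() has actually run.
    """

    def __init__(self, data: bytes):
        super().__init__(data)
        self.closed_explicitly = False

    def close(self):
        # Record the explicit close before delegating to BytesIO.
        self.closed_explicitly = True
        super().close()


f = TrackingFile(b"parquet bytes would go here")
# A consumer that relies on GC finalization instead of calling close()
# leaves this flag False, so the temp file is never cleaned up promptly:
assert f.closed_explicitly is False
f.close()
assert f.closed_explicitly is True
```

In the reported scenario, passing such a wrapper through read_table would leave `closed_explicitly` False until the garbage collector happened to finalize the handle, which is exactly the behavior the issue asks to fix.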