[ https://issues.apache.org/jira/browse/ARROW-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878763#comment-16878763 ]
Joris Van den Bossche commented on ARROW-5825:
----------------------------------------------

[~gsakkis] do you have a reproducible example? You need a parquet dataset with, e.g., an invalid file that raises an error when reading the metadata?

> [Python] Exceptions swallowed in ParquetManifest._visit_directories
> -------------------------------------------------------------------
>
>                 Key: ARROW-5825
>                 URL: https://issues.apache.org/jira/browse/ARROW-5825
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: George Sakkis
>            Priority: Major
>              Labels: Parquet
>
> {{ParquetManifest._visit_directories}} uses a {{ThreadPoolExecutor}} to visit
> partitioned parquet datasets concurrently. It waits for them to finish but
> doesn't check whether the respective futures have failed. This is quite
> tricky to detect and debug, as an exception is either raised later as a
> side effect or (perhaps worse) passes silently.
> Observed on 0.12.1 but appears to be present on latest master too.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
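
The pattern described in the issue can be illustrated with a minimal, standalone sketch. This is not pyarrow's code: {{visit_directory}}, {{visit_all_unchecked}}, {{visit_all_checked}} and the partition paths are hypothetical names chosen for illustration, and only the standard-library {{concurrent.futures}} module is used. The first function waits on the futures without inspecting them, so a failing worker goes unnoticed; the second calls {{result()}} on each future, which re-raises the worker's exception in the calling thread.

```python
from concurrent import futures


def visit_directory(path):
    # Hypothetical stand-in for visiting one partition directory.
    # A path containing "bad" simulates a file with unreadable metadata.
    if "bad" in path:
        raise ValueError(f"cannot read metadata in {path}")
    return path


def visit_all_unchecked(paths):
    # Waits for every task to finish but never inspects the futures,
    # so an exception raised in a worker is silently dropped here.
    with futures.ThreadPoolExecutor() as pool:
        submitted = [pool.submit(visit_directory, p) for p in paths]
        futures.wait(submitted)
    return submitted


def visit_all_checked(paths):
    # Same as above, but result() re-raises any exception from the worker,
    # so the failure surfaces at the point where the tasks are collected.
    with futures.ThreadPoolExecutor() as pool:
        submitted = [pool.submit(visit_directory, p) for p in paths]
        futures.wait(submitted)
    return [f.result() for f in submitted]
```

With one failing path among several, {{visit_all_unchecked}} returns normally and the error is only visible through {{Future.exception()}}, whereas {{visit_all_checked}} raises the {{ValueError}}.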