[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838442#comment-16838442 ]
Joris Van den Bossche commented on ARROW-3424:
----------------------------------------------

Currently, a list of files is already supported in {{ParquetDataset}}. So something like the following (which would, I think, address the SO question) works:

{code:python}
dataset = pq.ParquetDataset(['part0.parquet', 'part1.parquet'])
dataset.read_pandas().to_pandas()
{code}

Do we think that is enough support? (If so, this issue can be closed, I think.) Or do we want to add this to {{pq.read_table}} as well? (It also accepts, e.g., a directory name, which is then passed through to {{ParquetDataset}}; we could do a similar pass-through for a list of paths.)

> [Python] Improved workflow for loading an arbitrary collection of Parquet
> files
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-3424
>                 URL: https://issues.apache.org/jira/browse/ARROW-3424
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.14.0
>
>
> See SO question for use case:
> https://stackoverflow.com/questions/52613682/load-multiple-parquet-files-into-dataframe-for-analysis

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
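A self-contained sketch of the workflow the comment describes (the file names and data here are illustrative; note that the {{read_pandas()}} method shown in the comment belongs to the legacy {{ParquetDataset}} API, so this sketch uses {{read()}}, which works on both the legacy and newer datasets):

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# Write two small Parquet files to demonstrate (illustrative data).
pq.write_table(pa.table({'x': [1, 2]}), 'part0.parquet')
pq.write_table(pa.table({'x': [3, 4]}), 'part1.parquet')

# A list of file paths is already accepted by ParquetDataset:
dataset = pq.ParquetDataset(['part0.parquet', 'part1.parquet'])
table = dataset.read()  # one combined Table with the rows of both files
{code}

Calling {{table.to_pandas()}} then yields a single DataFrame, which is what the SO question asks for.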