[ https://issues.apache.org/jira/browse/ARROW-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-3916:
----------------------------------
    Labels: parquet pull-request-available  (was: parquet)

> [Python] Support caller-provided filesystem in `ParquetWriter` constructor
> --------------------------------------------------------------------------
>
>                 Key: ARROW-3916
>                 URL: https://issues.apache.org/jira/browse/ARROW-3916
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Mackenzie
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.12.0
>
> Currently, to write files incrementally to S3, the following pattern appears necessary:
> {code:python}
> def write_dfs_to_s3(dfs, fname):
>     first_df = dfs[0]
>     table = pa.Table.from_pandas(first_df, preserve_index=False)
>     fs = s3fs.S3FileSystem()
>     fh = fs.open(fname, 'wb')
>     with pq.ParquetWriter(fh, table.schema) as writer:
>         # set the file handle on the writer so the writer closes it
>         # when it is itself closed
>         writer.file_handle = fh
>         writer.write_table(table=table)
>         for df in dfs[1:]:
>             table = pa.Table.from_pandas(df, preserve_index=False)
>             writer.write_table(table=table)
> {code}
> This works as expected but is quite roundabout. It would be much easier if {{ParquetWriter}} accepted {{filesystem}} as a keyword argument in its constructor, in which case {{_get_fs_from_path}} would be overridden by the usual pattern of using the kwarg after ensuring it is a proper filesystem with {{_ensure_filesystem}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)