Mackenzie created ARROW-3916:
--------------------------------
Summary: [Python] Support caller-provided filesystem in `ParquetWriter` constructor
Key: ARROW-3916
URL: https://issues.apache.org/jira/browse/ARROW-3916
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.11.1
Reporter: Mackenzie
Currently, to write files incrementally to S3, the following pattern appears to
be necessary:
{{import pyarrow as pa}}
{{import pyarrow.parquet as pq}}
{{import s3fs}}

{{def write_dfs_to_s3(dfs, fname):}}
{{    first_df = dfs[0]}}
{{    table = pa.Table.from_pandas(first_df, preserve_index=False)}}
{{    fs = s3fs.S3FileSystem()}}
{{    fh = fs.open(fname, 'wb')}}
{{    with pq.ParquetWriter(fh, table.schema) as writer:}}
{{        # set file handle on writer so writer manages closing it when it is itself closed}}
{{        writer.file_handle = fh}}
{{        writer.write_table(table=table)}}
{{        for df in dfs[1:]:}}
{{            table = pa.Table.from_pandas(df, preserve_index=False)}}
{{            writer.write_table(table=table)}}
This works as expected, but is quite roundabout. It would be much easier if
`ParquetWriter` accepted `filesystem` as a keyword argument in its
constructor. In that case, the result of `_get_fs_from_path` would be
overridden by the usual pattern of using the kwarg after ensuring it is a
proper file system with `_ensure_filesystem`.
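
Until such a keyword exists, the same effect can be approximated outside the
library. The following is only a rough sketch of the idea, not the proposed
implementation: `open_parquet_writer` and its `writer_cls` parameter are
hypothetical names that do not exist in pyarrow, and `filesystem` is assumed to
be any object exposing `open(path, mode)`, such as `s3fs.S3FileSystem()`.

```python
import contextlib


@contextlib.contextmanager
def open_parquet_writer(path, schema, filesystem, writer_cls=None, **kwargs):
    """Yield a Parquet writer for `path`, opened through `filesystem`.

    Hypothetical helper, not part of pyarrow. `filesystem` is assumed to be
    any object with an `open(path, mode)` method.
    """
    if writer_cls is None:
        # Imported lazily so the helper can be exercised with a stand-in writer.
        import pyarrow.parquet as pq
        writer_cls = pq.ParquetWriter

    fh = filesystem.open(path, 'wb')
    try:
        writer = writer_cls(fh, schema, **kwargs)
        try:
            yield writer
        finally:
            # Close the writer first so it can finish writing the file ...
            writer.close()
    finally:
        # ... then close the handle this helper opened.
        fh.close()
```

With a helper like this, the body of `write_dfs_to_s3` would reduce to
`with open_parquet_writer(fname, table.schema, s3fs.S3FileSystem()) as writer:`
followed by the `write_table` calls, without assigning `writer.file_handle` by
hand.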
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)