Joris Van den Bossche created ARROW-7547:
--------------------------------------------
Summary: [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat
Key: ARROW-7547
URL: https://issues.apache.org/jira/browse/ARROW-7547
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset, Python
Reporter: Joris Van den Bossche
[Context: looking into using the datasets machinery in the current Python Parquet code]
In the current Python API, we expose several options that influence how a
Parquet file is read (e.g. {{read_dictionary}}, which reads the given BYTE_ARRAY
columns directly into a dictionary type, or {{memory_map}} and {{buffer_size}}).
Those could be added to {{ParquetFileFormat}}.