[ https://issues.apache.org/jira/browse/PARQUET-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902357#comment-16902357 ]
Wes McKinney commented on PARQUET-1634:
---------------------------------------

To assist with this, it would make sense to first make sure we have a mock "high latency" filesystem for testing/benchmarking.

> [C++] Factor out data/dictionary page writes to allow for page buffering
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-1634
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1634
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: cpp-1.6.0
>
>
> Logic that eagerly writes out data pages is hard-coded into the column writer
> implementation:
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L565
> For higher-latency file systems like Amazon S3, it makes more sense to buffer
> pages in memory and write them in larger batches (and preferably
> asynchronously). We should refactor this logic so we have the ability to
> choose, rather than have the behavior hard-coded.
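A minimal sketch of the two ideas above: a page-sink abstraction so the writer can choose between eager and buffered page writes, plus a mock "high latency" sink for testing/benchmarking. All names (`HighLatencyMockSink`, `PageSink`, `EagerPageSink`, `BufferedPageSink`, `flush_threshold`) are hypothetical and are not part of the parquet-cpp or Arrow API. The mock is also an assumption: instead of sleeping, it counts `Write()` calls and adds a fixed simulated round-trip cost per call, so a benchmark stays fast and deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a remote object (e.g. on S3). Each Write() call
// is charged a fixed simulated round-trip latency instead of really sleeping.
class HighLatencyMockSink {
 public:
  explicit HighLatencyMockSink(int64_t latency_ms_per_call)
      : latency_ms_per_call_(latency_ms_per_call) {}

  void Write(const std::string& bytes) {
    data_ += bytes;
    ++num_calls_;
    simulated_ms_ += latency_ms_per_call_;
  }

  const std::string& data() const { return data_; }
  int64_t num_calls() const { return num_calls_; }
  int64_t simulated_ms() const { return simulated_ms_; }

 private:
  int64_t latency_ms_per_call_;
  std::string data_;
  int64_t num_calls_ = 0;
  int64_t simulated_ms_ = 0;
};

// Hypothetical interface a column writer would call instead of writing pages
// directly, so the write strategy is chosen by the caller.
class PageSink {
 public:
  virtual ~PageSink() = default;
  virtual void WritePage(const std::string& serialized_page) = 0;
  virtual void Close() = 0;
};

// Eager strategy: every page is written to the underlying file immediately.
class EagerPageSink : public PageSink {
 public:
  explicit EagerPageSink(HighLatencyMockSink* out) : out_(out) {}
  void WritePage(const std::string& page) override { out_->Write(page); }
  void Close() override {}

 private:
  HighLatencyMockSink* out_;
};

// Buffered strategy: pages accumulate in memory and are flushed as one larger
// write once the buffer reaches flush_threshold bytes, and again on Close().
class BufferedPageSink : public PageSink {
 public:
  BufferedPageSink(HighLatencyMockSink* out, std::size_t flush_threshold)
      : out_(out), flush_threshold_(flush_threshold) {
    assert(flush_threshold > 0);
  }

  void WritePage(const std::string& page) override {
    buffer_ += page;
    if (buffer_.size() >= flush_threshold_) Flush();
  }

  void Close() override { Flush(); }

 private:
  void Flush() {
    if (buffer_.empty()) return;
    out_->Write(buffer_);  // one round trip for many pages
    buffer_.clear();
  }

  HighLatencyMockSink* out_;
  std::size_t flush_threshold_;
  std::string buffer_;
};
```

For example, writing ten 1 KiB pages with a 4 KiB `flush_threshold` costs three calls on the mock sink with `BufferedPageSink` (after pages 4 and 8, then the remainder on `Close()`), versus ten calls with `EagerPageSink`, while the bytes written are identical. The sketch flushes synchronously; the asynchronous writes mentioned in the issue are not shown.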