[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749744#comment-16749744 ]
Fokko Driesprong commented on FLINK-11401:
------------------------------------------

Thanks for the comment [~StephanEwen]

The RollOnCheckpoint behavior works very well for our use case, which is just ETL'ing data from Kafka to a bucket. Since we're using an object store (GCS) as the FS backend, the constant renaming of files from `.in-progress` to `.pending` to `.avro` is far from optimal, because renaming is very expensive there. On HDFS a rename is a constant-time, atomic logical operation, whereas on an object store it implies copying the whole file.

In the near future, we'll open a PR for an Avro writer that implements the BulkWriter. Since Avro is still a container format (we want to include the schema in the header of the file), we need to write a header before writing the actual rows. Writing this header first would require changing some interfaces.

> Allow compression on ParquetBulkWriter
> --------------------------------------
>
>                 Key: FLINK-11401
>                 URL: https://issues.apache.org/jira/browse/FLINK-11401
>             Project: Flink
>          Issue Type: Improvement
>          Components: Batch Connectors and Input/Output Formats
>    Affects Versions: 1.7.1
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)