Shengnan YU created FLINK-32562:
-----------------------------------

             Summary: FileSink Compactor Service should not use FileWriter from Sink for OutputStreamBasedFileCompactor
                 Key: FLINK-32562
                 URL: https://issues.apache.org/jira/browse/FLINK-32562
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / FileSystem
    Affects Versions: 1.18.0
            Reporter: Shengnan YU

The gzip format is designed to be concatenable, but the Compactor in FileSink breaks this property. When the Compactor Service creates a new compacted file, it opens it through GzipOutputStream, which writes its own header bytes, so the final file starts with an extra header. (Every finished part file already carries a gzip header, so the compacted file does not need another one.) The root cause is that the Compactor Service uses the FileWriter specified in the FileSink to create the output stream for the compacted file.

I think we should use a simple byte output stream to concatenate the part files instead, or at least provide an option for it. Currently the ConcatFileCompactor only supports plain text files. Many compression codecs, such as gzip and bzip2, support concatenation. I think we should support that kind of concatenation; otherwise people must use RecordWiseFileCompactor, which is very inefficient.
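
To illustrate the concatenation property this issue relies on, here is a minimal sketch that uses only java.util.zip and no Flink API. The class name GzipConcatDemo and its helper methods are hypothetical and exist only for this example. It compresses two "part files" separately, appends their raw bytes without adding any header, and checks that the result still decompresses to the combined content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Flink.
public class GzipConcatDemo {

    /** Compresses text into one complete gzip member (header, data, trailer). */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Appends the parts byte for byte, without writing any header of its own. */
    public static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    /** Decompresses all gzip members found in data and returns them as one string. */
    public static String gunzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two independently finished "part files", each with its own gzip header.
        byte[] compacted = concat(gzip("part-0\n"), gzip("part-1\n"));
        // Prints the contents of both parts in order.
        System.out.print(gunzip(compacted));
    }
}
```

The plain byte-level concat step stands in for the "simple byte output stream" proposed above. How such a stream would be wired into the Compactor Service is not shown here and would depend on the Flink internals discussed in this issue.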