Hi Flavio,

others might have better ideas to solve this, but I'll give it a try: have you considered extending FileOutputFormat to achieve what you need? That approach (which is discussed in [1]) sounds like something you could do.

Another pointer I want to give is the DefaultRollingPolicy [2]. It looks like it partially does what you're looking for.

I'm adding Kostas to this conversation as he worked on the RollingPolicy. Maybe he has some more insights.
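To illustrate the second pointer: the core of a rolling policy is just a decision about when to close the current part file and start a new one. The snippet below is a minimal, self-contained sketch of that decision based on Flavio's two thresholds (line count and max size) — the class and method names are illustrative, not Flink API, and the actual DefaultRollingPolicy is configured via its builder rather than written by hand.

```java
// Sketch only: mimics the kind of check a rolling policy performs.
// Names (RollOverSketch, shouldRoll) are made up for illustration.
public class RollOverSketch {

    /**
     * Decide whether to "roll" (close the current part file and open a
     * new one) once either a byte-size or a line-count threshold is hit.
     */
    static boolean shouldRoll(long bytesWritten, long linesWritten,
                              long maxBytes, long maxLines) {
        return bytesWritten >= maxBytes || linesWritten >= maxLines;
    }

    public static void main(String[] args) {
        long maxBytes = 64L * 1024 * 1024; // 64 MB part-size limit
        long maxLines = 1_000_000;         // line-count limit

        // 5 MB and 10k lines written: below both thresholds, keep writing.
        System.out.println(shouldRoll(5L * 1024 * 1024, 10_000, maxBytes, maxLines));
        // 65 MB written: size threshold exceeded, roll to a new file.
        System.out.println(shouldRoll(65L * 1024 * 1024, 10_000, maxBytes, maxLines));
    }
}
```

In Flink itself, the size-based half of this is what `DefaultRollingPolicy.builder().withMaxPartSize(...)` configures; a line-count condition would need a custom RollingPolicy implementation.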
I hope that helps.

Best,
Matthias

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/output-writer-td2296.html
[2] https://github.com/apache/flink/blob/5ff96966b59e0d9a7b55ebae9e252b1c9aafd4ea/flink-connectors/flink-file-sink-common/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.java#L40

On Fri, Nov 27, 2020 at 11:07 AM Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Hello guys,
> I have to write my batch data (Dataset<Row>) to a file format. Actually
> what I need to do is:
>
> 1. split the data if it exceeds some size threshold (by line count or
>    max MB)
> 2. compress the output data (possibly without converting to the hadoop
>    format)
>
> Are there any suggestions / recommendations about that?
>
> Best,
> Flavio