Re: Batch compressed file output

2020-11-27 Thread Matthias Pohl
Hi Flavio,
others might have better ideas to solve this, but I'll give it a try: have
you considered extending FileOutputFormat to achieve what you need? That
approach (which is discussed in [1]) sounds like something you could do;
a rough sketch follows.
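
For example, a FileOutputFormat subclass could wrap the file stream in a
GZIPOutputStream to get compressed output without going through the Hadoop
formats. This is an untested sketch: the class name and the toString()-based
line encoding are just placeholders, and the size-based roll-over you
mentioned would still need to be added on top (e.g. by counting bytes or
records in writeRecord() and switching to a new part file):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.apache.flink.api.common.io.FileOutputFormat;
import org.apache.flink.core.fs.Path;

public class GzipTextOutputFormat<T> extends FileOutputFormat<T> {

    private transient GZIPOutputStream gzipStream;

    public GzipTextOutputFormat(Path outputPath) {
        super(outputPath);
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        // Let the parent open the underlying FSDataOutputStream (this.stream),
        // then wrap it so records are compressed as they are written.
        super.open(taskNumber, numTasks);
        this.gzipStream = new GZIPOutputStream(this.stream);
    }

    @Override
    public void writeRecord(T record) throws IOException {
        // Placeholder encoding: one toString()'d record per line.
        gzipStream.write(record.toString().getBytes(StandardCharsets.UTF_8));
        gzipStream.write('\n');
    }

    @Override
    public void close() throws IOException {
        if (gzipStream != null) {
            // Write the gzip trailer before the parent closes the file stream.
            gzipStream.finish();
        }
        super.close();
    }
}
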
Another pointer I want to give is the DefaultRollingPolicy [2]. It looks
like it partially covers what you're looking for (size- and time-based
rolling of part files). I'm adding Kostas to this conversation as he worked
on the RollingPolicy. Maybe he has some more insights.
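
For reference, this is roughly how a DefaultRollingPolicy is wired into a
StreamingFileSink on the streaming side; the 128 MB / 15 min / 5 min
thresholds are arbitrary example values:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class RollingSinkExample {

    // Builds a sink that rolls to a new part file once it reaches ~128 MB,
    // every 15 minutes, or after 5 minutes of inactivity.
    public static StreamingFileSink<String> buildSink(String outputDir) {
        return StreamingFileSink
                .forRowFormat(new Path(outputDir), new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                        DefaultRollingPolicy.builder()
                                .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                                .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                                .withMaxPartSize(128L * 1024 * 1024)
                                .build())
                .build();
    }
}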

I hope that helps.

Best,
Matthias

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/output-writer-td2296.html
[2]
https://github.com/apache/flink/blob/5ff96966b59e0d9a7b55ebae9e252b1c9aafd4ea/flink-connectors/flink-file-sink-common/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.java#L40

On Fri, Nov 27, 2020 at 11:07 AM Flavio Pompermaier wrote:

> Hello guys,
> I have to write my batch data (Dataset) to a file format. What I actually
> need to do is:
>
>    1. split the data if it exceeds some size threshold (by line count or
>    max MB)
>    2. compress the output data (possibly without converting to the Hadoop
>    format)
>
> Are there any suggestions / recommendations about that?
>
> Best,
> Flavio
>


Batch compressed file output

2020-11-27 Thread Flavio Pompermaier
Hello guys,
I have to write my batch data (Dataset) to a file format. What I actually
need to do is:

   1. split the data if it exceeds some size threshold (by line count or
   max MB)
   2. compress the output data (possibly without converting to the Hadoop
   format)

Are there any suggestions / recommendations about that?

Best,
Flavio