Hi Fabian, thanks for drafting the FLIP! This is a really nice and useful
topic to target ;)

A few thoughts on option 2):

File compaction is by definition a quite costly, IO-bound operation. If I
understand the proposal correctly, the aggregation itself would run during
the operator (aggregator) checkpoint. Would this significantly increase the
checkpoint duration?
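
To make this concern a bit more concrete, here is a rough sketch of what I
have in mind (plain Java; types and helpers such as PendingFile and compact()
are made up, this is not taken from the FLIP):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical aggregator that compacts files during its checkpoint.
    public class CompactingAggregator
            extends ProcessFunction<PendingFile, PendingFile>
            implements CheckpointedFunction {

        private final List<PendingFile> pending = new ArrayList<>();

        @Override
        public void processElement(PendingFile file, Context ctx,
                                   Collector<PendingFile> out) {
            // collect the small files written since the last checkpoint
            pending.add(file);
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            // Runs in the synchronous part of the checkpoint: the task can
            // neither process records nor acknowledge the checkpoint until
            // this returns, so IO-heavy compaction here adds directly to
            // the checkpoint duration.
            PendingFile compacted = compact(pending);
            pending.clear();
            // in a real operator the compacted file would then be handed
            // to the committer for the commit phase
        }

        @Override
        public void initializeState(FunctionInitializationContext ctx) {
            // restore pending files on recovery (omitted in this sketch)
        }

        private PendingFile compact(List<PendingFile> files) throws Exception {
            // read the small files from the remote FS, write one larger file
            throw new UnsupportedOperationException("sketch only");
        }
    }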

Compaction between different sub-tasks incurs additional network IO (to
fetch the raw, non-compacted files from the remote filesystem), so this
could quickly become a bottleneck. Basically, we're decreasing the sink
parallelism (and thus the possible throughput) to the parallelism of the
aggregator.

To be really effective here, compaction would ideally be able to compact
files from multiple checkpoints. However, there is a huge tradeoff between
latency and efficiency (especially with exactly-once, where files only
become visible once they are committed). Is this something worth exploring?
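
Just to illustrate the tradeoff, a continuation of the sketch from my
previous point (totalSize(), targetFileSize, readyToCommit and
checkpointedPending are again all made up):

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        if (totalSize(pending) >= targetFileSize) {
            // enough data accumulated: merge it into one large file and
            // hand the result over for the next commit
            readyToCommit.add(compact(pending));
            pending.clear();
        }
        // Otherwise keep the small files in operator state and wait for a
        // later checkpoint. With exactly-once semantics the data stays
        // invisible to readers until the checkpoint at which the compacted
        // file is finally committed -- that is the latency/efficiency
        // tradeoff mentioned above.
        checkpointedPending.update(new ArrayList<>(pending));
    }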

Best,
D.

On Wed, Nov 3, 2021 at 11:22 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Thanks for creating this FLIP Fabian.
>
> From your description I would be in favour of option 2 for the following
> reasons: Assuming that option 2 solves all our current problems, it seems
> like the least invasive change and smallest in scope. Your main concern is
> that it might not cover future use cases. Do you have some specific use
> cases in mind? I think it is ok to extend the existing interfaces in order
> to cover new requirements once we learn about them. The important bit is
> that we don't implement a solution that we already know won't solve
> all requirements at the time of implementation. What I am missing a bit
> from the description is how option 2 will behave wrt checkpoints and the
> batch execution mode.
>
> Option 1 will require generalizing the operator coordinator framework so
> that it can participate in checkpointing at an arbitrary position in
> the topology. Moreover, it seems as if this option exploits the JobMaster
> process to run some user code that could also be done in a parallelism 1
> operator (so option 2 should be able to solve this use case).
>
> Option 3 sounds like the most generic approach. But with a lot of power
> also comes some responsibility, and I could see that inserting an
> arbitrary topology that has to work with both streaming and batch can
> become quite a challenge for sink developers. I think it would be easier
> for sink developers if more of these dimensions were fixed.
>
> I've left some more comments on the wiki page. PTAL.
>
> Cheers,
> Till
>
> On Tue, Nov 2, 2021 at 5:44 PM Fabian Paul <fabianp...@ververica.com>
> wrote:
>
> > Hi all,
> >
> > More and more data lake sinks rely on columnar formats, which benefit
> > from fewer, larger files rather than a lot of small files (read
> > amplification). Our current FileSink cannot ensure a certain file size
> > when writing to an external filesystem, which is what I call the small
> > file compaction problem. Unfortunately, there is no good way to support
> > this use case with the current unified Sink operator topology.
> >
> > I would like to propose extending the unified Sink interface, which we
> > proposed in FLIP-143, to resolve the small file compaction problem.
> > Therefore I have created FLIP-191 [1] to outline three different options
> > for how the problem could be addressed.
> >
> > 1. Global Sink Coordinator
> > 2. Committable Aggregator Operator
> > 3. Custom sink topology
> >
> > Further information about the alternatives can be found in the document,
> > and I would appreciate your feedback on which way to go to finally
> > resolve this problem.
> >
> > Best,
> > Fabian
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
> >
> >
>
