Arvid Heise created FLINK-23617:
-----------------------------------

             Summary: Co-locate sink operators with same parallelism
                 Key: FLINK-23617
                 URL: https://issues.apache.org/jira/browse/FLINK-23617
             Project: Flink
          Issue Type: Improvement
          Components: API / DataStream
    Affects Versions: 1.14.0
            Reporter: Arvid Heise


FLINK-19531 introduced the implementation of the Sink interface. It strictly 
cut the different parts of the sink pipeline into 3 operators:

writer -> committer -> global committer

In streaming mode with a parallelism p, the pipeline is executed as follows
writer(parallelism=p) -> committer(parallelism=p) -> global 
committer(parallelism=1).
Here we could bundle writer+committer into one operator.

In batch mode with a parallelism p, the pipeline is executed as follows
writer(parallelism=p) -> committer(parallelism=1) -> global 
committer(parallelism=1).
Here we could bundle committer+global committer into one operator. (Committer 
needs to run with parallelism=1 to create a pipeline region and reduce the risk 
of dataloss during commit; we can hopefully fix it after FLIP-147)

Having fewer operators will decrease the need to copy the committables in the 
operator chain (where we currently mostly use Kryo). Thus, we can implement 
connection/transaction pooling in streaming, where committables are reused 
after successful commit.

The proposal of this ticket is to extract the functionality of the 7 different 
operator implementations with their factories into reusable building blocks and 
use them in 2 operators (writer+committer and committer+global committer).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to