Arvid Heise created FLINK-23617:
-----------------------------------

             Summary: Co-locate sink operators with same parallelism
                 Key: FLINK-23617
                 URL: https://issues.apache.org/jira/browse/FLINK-23617
             Project: Flink
          Issue Type: Improvement
          Components: API / DataStream
    Affects Versions: 1.14.0
            Reporter: Arvid Heise

FLINK-19531 introduced the implementation of the Sink interface. It strictly splits the sink pipeline into 3 operators:

writer -> committer -> global committer

In streaming mode with a parallelism p, the pipeline is executed as follows:

writer(parallelism=p) -> committer(parallelism=p) -> global committer(parallelism=1)

Here we could bundle writer+committer into one operator.

In batch mode with a parallelism p, the pipeline is executed as follows:

writer(parallelism=p) -> committer(parallelism=1) -> global committer(parallelism=1)

Here we could bundle committer+global committer into one operator. (The committer needs to run with parallelism=1 to create a pipeline region and reduce the risk of data loss during commit; we can hopefully fix this after FLIP-147.)

Having fewer operators will decrease the need to copy the committables in the operator chain (where we currently mostly use Kryo). This in turn lets us implement connection/transaction pooling in streaming mode, where committables are reused after a successful commit.

The proposal of this ticket is to extract the functionality of the 7 different operator implementations with their factories into reusable building blocks and use them in 2 operators (writer+committer and committer+global committer).
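To make the two layouts concrete, here is a minimal sketch in plain Java that prints the current and the proposed operator chains for both modes. It does not use any Flink API; the class `SinkTopologySketch`, the enum `Mode`, and the methods `currentPlan` and `proposedPlan` are hypothetical names chosen for illustration only, and the bundling rule is simply the one described in this ticket.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative model of the sink pipeline layouts; not Flink code. */
public class SinkTopologySketch {

    /** Execution mode, mirroring the two cases described in the ticket. */
    public enum Mode { STREAMING, BATCH }

    /** Formats one operator as "name(parallelism=n)". */
    static String op(String name, int parallelism) {
        return name + "(parallelism=" + parallelism + ")";
    }

    /** Current layout: always three separate operators. */
    public static List<String> currentPlan(Mode mode, int p) {
        // The committer matches the writer in streaming mode and runs with 1 in batch mode.
        int committerParallelism = (mode == Mode.STREAMING) ? p : 1;
        return Arrays.asList(
                op("writer", p),
                op("committer", committerParallelism),
                op("global committer", 1));
    }

    /** Proposed layout: bundle the two neighbours that share a parallelism. */
    public static List<String> proposedPlan(Mode mode, int p) {
        if (mode == Mode.STREAMING) {
            // Writer and committer both run with parallelism p.
            return Arrays.asList(op("writer+committer", p), op("global committer", 1));
        }
        // In batch mode, committer and global committer both run with parallelism 1.
        return Arrays.asList(op("writer", p), op("committer+global committer", 1));
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " current:  " + String.join(" -> ", currentPlan(mode, 4)));
            System.out.println(mode + " proposed: " + String.join(" -> ", proposedPlan(mode, 4)));
        }
    }
}
```

In both modes the sketch goes from three operators to two, which is where the saving in committable copies between chained operators would come from.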