Hi everyone,
I would like to start a discussion on FLIP-379: Dynamic source parallelism
inference for batch jobs[1].

In general, there are three main ways to set source parallelism for batch
jobs:
(1) User-defined source parallelism.
(2) Connector static parallelism inference.
(3) Dynamic parallelism inference.

Compared with setting parallelism manually, automatic parallelism inference
is easier to use and adapts better to data volumes that vary from day to day.
However, static parallelism inference cannot use runtime information, so the
parallelism it infers can be inaccurate. For batch jobs, dynamic parallelism
inference is therefore the best fit, but the adaptive batch scheduler's
current support for it is limited.

To close this gap, we propose a general interface that lets the adaptive
batch scheduler infer source parallelism dynamically at runtime. Please
refer to the FLIP document[1] for details of the proposed design and
implementation.
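
To make the idea concrete, here is a rough, self-contained sketch of how a
source could size itself from runtime information. It is not the interface
proposed in the FLIP: every name below (InferenceContext,
ParallelismInference, FileSource, infer) is hypothetical, and the
"bytes per task" heuristic is only an assumption chosen for illustration.

```java
import java.util.List;

public class DynamicSourceParallelismSketch {

    /** Hypothetical runtime information a scheduler could pass to a source. */
    interface InferenceContext {
        int parallelismUpperBound();   // assumed cap chosen by the scheduler
        long targetBytesPerTask();     // assumed desired data volume per subtask
    }

    /** Hypothetical hook a source implements to report its parallelism at runtime. */
    interface ParallelismInference {
        int inferParallelism(InferenceContext context);
    }

    /** Example source that sizes itself from the files it discovers at runtime. */
    static final class FileSource implements ParallelismInference {
        private final List<Long> fileSizesInBytes;

        FileSource(List<Long> fileSizesInBytes) {
            this.fileSizesInBytes = fileSizesInBytes;
        }

        @Override
        public int inferParallelism(InferenceContext context) {
            long totalBytes = 0;
            for (long size : fileSizesInBytes) {
                totalBytes += size;
            }
            long perTask = context.targetBytesPerTask();
            // Ceiling division: one subtask per started chunk of perTask bytes.
            long tasks = (totalBytes + perTask - 1) / perTask;
            // Clamp to [1, upper bound] so the result is always schedulable.
            return (int) Math.max(1, Math.min(tasks, context.parallelismUpperBound()));
        }
    }

    /** Convenience wrapper that builds a context and asks the source. */
    static int infer(List<Long> fileSizes, int upperBound, long bytesPerTask) {
        InferenceContext context = new InferenceContext() {
            @Override
            public int parallelismUpperBound() {
                return upperBound;
            }

            @Override
            public long targetBytesPerTask() {
                return bytesPerTask;
            }
        };
        return new FileSource(fileSizes).inferParallelism(context);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Two files of 3 GiB and 2 GiB, 1 GiB per subtask, cap of 128.
        System.out.println(infer(List.of(3 * gib, 2 * gib), 128, gib));
    }
}
```

The point of the sketch is only the division of labour: the scheduler
supplies limits it knows at runtime, and the source answers using the data
volume it can observe at that moment, which a static, compile-time
inference cannot do.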

Many thanks to Zhu Zhu and LiJie Wang for their suggestions during the
pre-discussion.
I look forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs

Best regards,
Xia
