Re: Flink operator max parallelism and rescalable jobs

Maximilian Michels Fri, 16 Nov 2018 03:42:02 -0800

Hi Jozef,

The main blocker for rescaling Beam pipelines on Flink was the use of Key Group state. This splits each operator state additionally into N partitions, such that N * P = MAX_PARALLELISM, where P is the parallelism of the operator.
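
As a rough illustration of the key group split, the Python sketch below assigns a contiguous range of key groups to each parallel instance. The function names are hypothetical, and the rounding is only modeled on Flink's key group range assignment; treat it as an assumption, not a copy of Flink's code.

```python
from typing import List


def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> range:
    """Contiguous key groups owned by one parallel instance (illustrative)."""
    if not 0 < parallelism <= max_parallelism:
        raise ValueError("parallelism must be in 1..max_parallelism")
    # Ceil-style start and floor-style end, so ranges neither overlap nor leave gaps.
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return range(start, end + 1)


def assignment(max_parallelism: int, parallelism: int) -> List[range]:
    """Key group ranges for every parallel instance of an operator."""
    return [key_group_range(max_parallelism, parallelism, i) for i in range(parallelism)]


if __name__ == "__main__":
    for p in (4, 5):
        sizes = [len(r) for r in assignment(128, p)]
        print(f"parallelism={p}: key groups per instance = {sizes}")
```

With MAX_PARALLELISM = 128 and P = 4, every instance owns N = 32 key groups, so N * P = MAX_PARALLELISM holds exactly. With P = 5 the ranges are uneven, but every key group is still owned by exactly one instance, which is what lets keyed state move when P changes.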

This has largely been done. However, it is not complete. If you look at the way the UnboundedSourceWrapper snapshots its state, you will see that it does not support Key Groups. Thus, if you increase the parallelism, one of the new parallel instances of the operator will _not_ receive state and thus behave differently.
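
The effect can be shown with a toy model. This is not the actual UnboundedSourceWrapper code; the snapshot layout, partition names and function names below are hypothetical. State saved per instance index leaves a newly added instance empty, while pooling the entries and redistributing them does not.

```python
from typing import Dict, List, Tuple

# Hypothetical checkpoint entry: (partition name, offset).
Entry = Tuple[str, int]


def restore_by_index(snapshot: Dict[int, List[Entry]], new_parallelism: int) -> List[List[Entry]]:
    """Each instance only sees what was saved under its own index."""
    return [list(snapshot.get(i, [])) for i in range(new_parallelism)]


def restore_redistributed(snapshot: Dict[int, List[Entry]], new_parallelism: int) -> List[List[Entry]]:
    """Pool all saved entries and deal them out round-robin to the new instances."""
    pooled = [entry for i in sorted(snapshot) for entry in snapshot[i]]
    restored: List[List[Entry]] = [[] for _ in range(new_parallelism)]
    for n, entry in enumerate(pooled):
        restored[n % new_parallelism].append(entry)
    return restored


if __name__ == "__main__":
    # Savepoint taken with parallelism 2, restored with parallelism 3.
    snapshot = {0: [("p0", 100), ("p1", 200)], 1: [("p2", 300)]}
    print(restore_by_index(snapshot, 3))       # the third instance starts with no state
    print(restore_redistributed(snapshot, 3))  # every instance receives one entry
```

In the first variant, the instance without restored state has no offset to resume from, so it would fall back to whatever default the source applies.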

I think we could migrate UnboundedSourceWrapper to KeyGroups and then also leverage spread of the Kafka partitions.


Thanks,
Max

On 16.11.18 10:57, Jozef Vilcek wrote:

Hi,
I want to collect some feedback on rescaling streaming Beam pipelines on the Flink runner. Flink seems to be able to rescale jobs, which in Beam terms means changing the parallelism. However, one has to make sure that state can rescale as well, up to the predefined max parallelism. Max parallelism must be set for the job on the FlinkRunner.
Flink supports setting max parallelism at the global, environment and operator level. Changes at the operator level are not possible with Beam. I found this JIRA, which seems to be inconclusive on whether changes in operator parallelism make sense to adopt somehow in Beam:
https://issues.apache.org/jira/browse/BEAM-68
I did try to set max parallelism on the environment via my local patch. My job did launch and did not crash like before when I bumped parallelism by 1. But there was one drawback, as far as I know. My test job reads from Kafka, and after launching the job from a savepoint, one partition does not continue from the offset in the savepoint but according to what is defined by auto.offset.reset (in my case 'latest'), which is not great.
My questions:
1. Should rescaling work for Beam if the runner supports it, or can there be incompatibilities in general, depending on how a particular runner works?
2. Has anyone had success with Flink and rescaling? Honestly, I am not sure how well it behaves in native Flink. I have never tried it.
3. Why does Kafka not redistribute stored partition offsets after changing parallelism?
4. Is BEAM-68 still relevant?

Many thanks,
Jozef
