Re: [VOTE] FLIP-586: Composable Parallelism Alignment Modes for Flink Autoscaler

Dennis-Mircea Ciupitu Sun, 07 Jun 2026 03:16:11 -0700

Thanks Vivek for the detailed review, and thanks Gyula for weighing in. You
are both right on the central point. The flat mode plus fallback surface is
harder to reason about than it should be, and EVENLY_SPREAD should not be
silently redefined. I want to revise the FLIP rather than defend the
current shape, and I think the result is genuinely better.

The reframing is that the alignment behavior has three real degrees of
freedom under the hood: where we search (within the current-to-target range,
or above the target), how strict we are about accepting a parallelism (exact
divisor, load-reducing, or any non-empty), and what we do on failure
(block, or relax). The original proposal exposed these as a mode times
fallback cross-product, which is where the complexity came from. Most of
that cross-product is not actually meaningful (many combinations are
redundant or no-ops), and you are right that asking users to navigate it
does not serve them.

Revised design, a small named front door over those axes:

   - BALANCED (default) - Avoid skew, and if no clean divisor exists, scale
   anyway rather than get stuck. This reproduces today's default behavior
   exactly.
   - STRICT_DIVISOR - Only scale to an exact divisor between current and
   target. If none exists, do not scale and emit an event.
   - MAXIMIZE_UTILISATION - Always reduce per-subtask load above the
   target, snapping to a divisor when reachable. Unchanged from today.
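
To make the difference between STRICT_DIVISOR and the "scale anyway"
relaxation in BALANCED concrete, here is a minimal sketch. It is not the
autoscaler's code and does not reproduce today's default algorithm. The class
and method names are hypothetical, n stands for the key group or partition
count, and preferring the divisor closest to the target is an assumption of
the sketch.

```java
import java.util.OptionalInt;

/** Illustrative only; names and tie-breaking are assumptions, not the FLIP's code. */
final class AlignmentModesSketch {

    /**
     * STRICT_DIVISOR-style search: accept only a parallelism that divides n
     * exactly and lies between current (exclusive) and target (inclusive).
     * Candidates are tried from the target back toward current, so the divisor
     * closest to the target wins (an assumption of this sketch).
     */
    static OptionalInt strictDivisor(int n, int current, int target) {
        int step = target > current ? -1 : 1; // scale-up walks down, scale-down walks up
        for (int p = target; p != current; p += step) {
            if (p > 0 && n % p == 0) {
                return OptionalInt.of(p);
            }
        }
        // No exact divisor in range: the caller would not scale and would emit an event.
        return OptionalInt.empty();
    }

    /** BALANCED-style relaxation: with no clean divisor, fall back to the raw target. */
    static int balanced(int n, int current, int target) {
        return strictDivisor(n, current, target).orElse(target);
    }
}
```

With 128 key groups, current parallelism 4 and target 12, the strict search
settles on 8. With 7 key groups, current 2 and target 5, it finds nothing,
and only the relaxed variant scales (to 5).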

That makes three everyday modes, which I believe are the "3 reasonable modes" you asked
for, Gyula. The parallelism alignment schema itself is genuinely complex (a
target parallelism interacts with the key group or partition count across
two search regions and several acceptance policies), so for the small set
of advanced users who need to tune it there is an optional ADVANCED mode.
ADVANCED composes the existing, already-proven built-in strategies as a
primary plus an optional fallback, with a validator that rejects the
redundant and self-referential combinations at config load. It deliberately
keeps the current strategies rather than a reduced model, because that is
exactly the expressiveness an advanced user reaches for. So the confusing
cross-product is gone from the front door, the complete schema stays
available to those who need it, and the few meaningful compositions are
reachable and guarded.

On the EVENLY_SPREAD migration, which I think is the most important fix: I
am retiring the EVENLY_SPREAD token rather than redefining it. Existing
configs that pin EVENLY_SPREAD keep mapping to BALANCED, the algorithm that
string actually had, through the old key kept as a deprecated option.
Anyone who wants the new exact-only behavior uses STRICT_DIVISOR. That
removes the silent behavior change entirely.
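
The value mapping for the deprecated key could look roughly like the sketch
below. The wrapper class and the fromLegacyValue method are hypothetical, and
listing ADVANCED as a value of AlignmentMode is an assumption. The point is
only that the retired token resolves to BALANCED and that the new names are
not accepted through the old key.

```java
import java.util.Locale;

/** Illustrative only; the real mapping lives wherever the deprecated option is read. */
final class LegacyModeMapping {

    /** Assumed shape of the new enum; ADVANCED as an enum value is a guess. */
    enum AlignmentMode { BALANCED, STRICT_DIVISOR, MAXIMIZE_UTILISATION, ADVANCED }

    /** Maps a value read from the deprecated key onto the new modes. */
    static AlignmentMode fromLegacyValue(String raw) {
        String value = raw.trim().toUpperCase(Locale.ROOT);
        switch (value) {
            case "EVENLY_SPREAD":
                // The retired token keeps the algorithm it always had.
                return AlignmentMode.BALANCED;
            case "MAXIMIZE_UTILISATION":
                return AlignmentMode.MAXIMIZE_UTILISATION;
            default:
                throw new IllegalArgumentException(
                        "Unknown legacy alignment mode: " + raw);
        }
    }
}
```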

On JobVertexScaler, you are right that it had taken on too much. I
extracted the alignment logic into a dedicated alignment package (it shrank
JobVertexScaler from ~1270 to ~660 lines), which also gives the strategy
resolution and the validator a clean home and removes the duplicated
scale-up and scale-down paths. The built-in strategies now sit behind an
@Experimental AlignmentStrategy interface, so a pluggable custom-strategy
loader is a clean follow-up FLIP later rather than hard-coded search logic
now. If we do ship that loader, it would follow the same ServiceLoader
plugin convention as FLIP-575 (Scaling Executor Plugin SPI) rather than a
new mechanism, and it is complementary to FLIP-575 since alignment is a
per-vertex step inside the parallelism computation while FLIP-575
intercepts the final decisions.
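
For orientation, a pluggable loader along those lines might look like the
sketch below. The method signature on AlignmentStrategy is a guess at its
shape, not the interface in the PR. The discover helper is hypothetical, and
the only library call used is the standard java.util.ServiceLoader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.ServiceLoader;

/** Illustrative only; not the interface or loader from the draft PR. */
final class AlignmentSpiSketch {

    /**
     * Assumed shape of the strategy contract (marked @Experimental in the
     * operator). An empty result means "no acceptable parallelism found".
     */
    interface AlignmentStrategy {
        OptionalInt align(int current, int target, int keyGroupsOrPartitions);
    }

    /**
     * Collects strategies registered through META-INF/services entries.
     * Returns an empty list when no provider is on the classpath.
     */
    static List<AlignmentStrategy> discover() {
        List<AlignmentStrategy> found = new ArrayList<>();
        for (AlignmentStrategy strategy : ServiceLoader.load(AlignmentStrategy.class)) {
            found.add(strategy);
        }
        return found;
    }
}
```

Because the interface has a single method, a built-in strategy can be a
lambda, while custom ones would be ordinary classes listed in a provider file.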

On the config key naming you raised on the discussion thread, Gyula, I
agree the current key is poor. Rather than key-group-alignment, I went one
step further to job.autoscaler.scaling.alignment.mode (with the advanced
primary and fallback keys under the same scaling.alignment prefix), keeping
the old key as a fallback so existing configs keep working. The reason is
that the feature aligns parallelism to key groups or source partitions
equally, so a neutral "alignment" name fits better than
"key-group-alignment", and it matches the new AlignmentMode and
AlignmentStrategy types.
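
The intended precedence between the two keys is sketched below with a plain
map, so that it runs without Flink on the classpath. In the operator this
would more likely be declared on the ConfigOption itself through
withDeprecatedKeys or withFallbackKeys. LEGACY_KEY is a placeholder, not the
real old key name, and resolveMode is a hypothetical helper.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative only; shows key precedence, not the operator's config code. */
final class AlignmentKeyResolution {

    static final String NEW_KEY = "job.autoscaler.scaling.alignment.mode";

    // Placeholder for the old key; substitute the actual deprecated key name.
    static final String LEGACY_KEY = "<old-alignment-mode-key>";

    /** The new key wins; the old key is consulted only when the new one is unset. */
    static Optional<String> resolveMode(Map<String, String> conf) {
        String value = conf.get(NEW_KEY);
        if (value == null) {
            value = conf.get(LEGACY_KEY);
        }
        return Optional.ofNullable(value);
    }
}
```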

I created a draft redesigned FLIP doc [1] and updated the draft PR [2]
reflecting all of the above. Does this direction address the concerns?

Best regards,
Dennis

[1]
https://docs.google.com/document/d/18nh9D1fYqErky12WHznzSufXzt6rkm3tGbNkcfgTTvE/edit?usp=sharing
[2] https://github.com/apache/flink-kubernetes-operator/pull/1088

On Fri, Jun 5, 2026 at 9:43 PM <[email protected]> wrote:

> Thanks Vivek,
>
> I am inclined to agree here that making the config complex this way
> doesn’t really serve most users. If we could create 3 reasonable modes that
> would cover most use cases that would be best.
>
> Cheers
> Gyula
>
> Sent from my iPhone
>
> > On 5 Jun 2026, at 16:06, Vivek Jhaver <[email protected]> wrote:
> >
> > Vivek
>
