dongjoon-hyun opened a new pull request, #55615:
URL: https://github.com/apache/spark/pull/55615
### What changes were proposed in this pull request?
This PR restricts `ExecutorResizePlugin` to activate only when the `direct`
pods allocator (`ExecutorPodsAllocator`) is configured via
`spark.kubernetes.allocation.pods.allocator`.
When the configured allocator is anything else (`statefulset`, `deployment`,
or a user-supplied class), `ExecutorResizeDriverPlugin.init()` now logs a
warning and returns early without creating a Kubernetes client or
scheduling the periodic memory check.
The check uses a literal string comparison against `"direct"`, matching the
existing precedent in `KubernetesClusterSchedulerBackend` and
`BasicExecutorFeatureStep`, both of which compare the same config to `"deployment"`.
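
For readers who want the shape of the guard without opening the diff, here is a minimal sketch, not the actual patch. It uses a plain `Map[String, String]` in place of `SparkConf` so it runs without Spark on the classpath. `GuardSketch`, `shouldActivate`, `initSchedulesWork`, and the warning text are hypothetical names chosen for illustration; only the config key and the literal `"direct"` come from this description.

```scala
object GuardSketch {
  // Config key named in the PR description.
  val AllocatorKey = "spark.kubernetes.allocation.pods.allocator"

  // Literal string comparison against "direct". An unset key falls back to
  // "direct", which the PR describes as the default allocator.
  def shouldActivate(conf: Map[String, String]): Boolean =
    conf.getOrElse(AllocatorKey, "direct") == "direct"

  // Stand-in for the driver plugin's init (hypothetical): returns true when it
  // would go on to create a client and schedule the periodic memory check,
  // false when it warns and bails out early.
  def initSchedulesWork(conf: Map[String, String]): Boolean =
    if (shouldActivate(conf)) {
      true
    } else {
      val allocator = conf.getOrElse(AllocatorKey, "direct")
      Console.err.println(
        s"WARN: executor resize requires the 'direct' allocator, found '$allocator'; skipping.")
      false
    }
}
```

Under this sketch, a user-supplied allocator class name is treated the same as `statefulset` or `deployment`: anything other than the literal `direct` disables the plugin.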
### Why are the changes needed?
`ExecutorResizePlugin` resizes individual executor pods in place via
`pods().withName(...).subresource("resize").patch(...)`. This works
correctly only when Spark itself owns each pod, i.e. with the `direct`
allocator.
For the other allocators, the higher-level controller's pod template is
the source of truth:
- `statefulset` → `StatefulSet.spec.template`
- `deployment` → `Deployment.spec.template`
Patching a single pod under those allocators has the following problems:
1. **Lost on recreation.** Whenever the controller recreates a pod
(rolling update, node failure, eviction, scaling event), the new pod
is materialized from the unchanged template, reverting the resize.
2. **Drift / reconciliation.** The live pod diverges from the controller's
template, and that divergence can be reconciled away on the next rollout.
3. **Bad interaction with dynamic allocation.** With the `deployment`
allocator, `pod-deletion-cost` is used to choose deletion targets;
asymmetric memory across pods can produce unintended churn.
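
The first failure mode can be illustrated with a toy model. This is only a sketch: `Pod`, `Controller`, and `resize` are hypothetical types that stand in for Kubernetes objects, and no Kubernetes API is called. The point it shows is that recreation starts from the template, so a per-pod resize does not survive it.

```scala
object RecreationSketch {
  // Toy model: a pod is just a name and a memory request in MiB.
  final case class Pod(name: String, memoryMiB: Int)

  // Toy controller that owns a pod template, as a StatefulSet or Deployment does.
  final case class Controller(templateMemoryMiB: Int) {
    def create(name: String): Pod = Pod(name, templateMemoryMiB)

    // Recreation always starts from the template, never from the old pod's live state.
    def recreate(old: Pod): Pod = create(old.name)
  }

  // In-place resize of a single pod; the controller's template is not touched.
  def resize(pod: Pod, newMemoryMiB: Int): Pod = pod.copy(memoryMiB = newMemoryMiB)
}
```

For example, with `Controller(4096)`, resizing `exec-1` to 6144 MiB and then calling `recreate` yields a pod that is back at 4096 MiB.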
Because none of these failure modes were guarded against, the plugin
previously appeared to work but produced silently incorrect behavior on
non-`direct` allocators. A clear early-return with a warning makes the
limitation explicit.
### Does this PR introduce _any_ user-facing change?
Yes. Behavior of `ExecutorResizePlugin` changes when
`spark.kubernetes.allocation.pods.allocator` is not `direct`:
- **Before:** the plugin started normally and attempted to patch executor
pods, with results that could be reverted by the StatefulSet/Deployment
controller.
- **After:** the plugin logs a warning at driver startup and does not
schedule any work. The driver continues to run normally.
When the allocator is `direct` (the default), behavior is unchanged.
### How was this patch tested?
- Existing tests in `ExecutorResizePluginSuite` continue to pass; they
exercise the resize logic via the private `checkAndIncreaseMemory`
method and are unaffected by the new guard.
- Two new tests cover the early-return path:
- `init returns early when pods allocator is 'statefulset'`
- `init returns early when pods allocator is 'deployment'`
Each builds a `SparkConf` with the corresponding allocator, calls
`init`, and asserts that the returned map is empty and the private
`kubernetesClient` field remains `null`.
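
The shape of those two tests can be sketched as follows. This is an illustration under stated assumptions, not the code in `ExecutorResizePluginSuite`: `StubDriverPlugin` is a hypothetical stand-in that takes a plain `Map` instead of a `SparkConf`, the placeholder object replaces a real Kubernetes client, and reading the private field through Java reflection is one possible approach that the real suite may not use.

```scala
object InitTestSketch {
  val AllocatorKey = "spark.kubernetes.allocation.pods.allocator"

  // Hypothetical stand-in for the driver plugin; the real class talks to Kubernetes.
  final class StubDriverPlugin {
    private var kubernetesClient: AnyRef = null

    def init(conf: Map[String, String]): Map[String, String] = {
      if (conf.getOrElse(AllocatorKey, "direct") == "direct") {
        kubernetesClient = new Object // placeholder for a real client
      }
      Map.empty
    }
  }

  // Read the private field reflectively, as a test might.
  def clientOf(plugin: StubDriverPlugin): AnyRef = {
    val field = classOf[StubDriverPlugin].getDeclaredField("kubernetesClient")
    field.setAccessible(true)
    field.get(plugin)
  }

  // One check per non-direct allocator: empty map returned, client never created.
  def initReturnsEarly(allocator: String): Boolean = {
    val plugin = new StubDriverPlugin
    val result = plugin.init(Map(AllocatorKey -> allocator))
    result.isEmpty && clientOf(plugin) == null
  }
}
```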
Run:
```
build/sbt -Pkubernetes 'kubernetes/testOnly *ExecutorResizePluginSuite'
```
Result: `Tests: succeeded 11, failed 0`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)