dongjoon-hyun opened a new pull request, #55642:
URL: https://github.com/apache/spark/pull/55642
### What changes were proposed in this pull request?
This PR aims to support a new `ExecutorPVCResizePlugin` that monitors executor PVC disk usage and grows each PVC's `spec.resources.requests.storage` when usage exceeds a threshold.
The executor side reports the maximum filesystem usage ratio across `DiskBlockManager.localDirs`. The driver side patches the executor pod's PVCs to `currentSize * (1 + factor)` when the reported ratio exceeds the threshold.
New configurations:
| Key | Default | Meaning |
|---|---|---|
| `spark.kubernetes.executor.pvc.resizeInterval` | `0min` | Resize check interval. `0` disables the check. |
| `spark.kubernetes.executor.pvc.resizeThreshold` | `0.5` | Usage ratio above which a resize is triggered. |
| `spark.kubernetes.executor.pvc.resizeFactor` | `1.0` | Growth factor. The new size is `currentSize * (1 + factor)`. |
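
The observe-and-patch rule described above can be sketched in a few lines. This is only an illustration in Python, not the plugin's Scala implementation: the function names `max_usage_ratio` and `target_size` are hypothetical, and `shutil.disk_usage` stands in for whatever filesystem call the executor side actually uses. The defaults mirror the table above.

```python
import shutil
from typing import Iterable, Optional


def max_usage_ratio(local_dirs: Iterable[str]) -> float:
    """Return the largest used/total ratio across the given directories.

    Hypothetical stand-in for the executor-side report over
    DiskBlockManager.localDirs.
    """
    ratios = []
    for d in local_dirs:
        usage = shutil.disk_usage(d)  # named tuple: total, used, free (bytes)
        if usage.total > 0:
            ratios.append(usage.used / usage.total)
    return max(ratios, default=0.0)


def target_size(current_bytes: int, ratio: float,
                threshold: float = 0.5, factor: float = 1.0) -> Optional[int]:
    """Return the new PVC size in bytes, or None if no resize is needed.

    Assumes "exceeds the threshold" means strictly greater than.
    """
    if ratio <= threshold:
        return None
    return int(current_bytes * (1 + factor))


if __name__ == "__main__":
    ratio = max_usage_ratio(["."])
    print(f"usage ratio: {ratio:.3f}")
    print("new size:", target_size(50 * 1024**3, ratio))
```

With the defaults, a 50Gi volume is doubled once its reported ratio goes above 0.5 and is left untouched otherwise.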
### Why are the changes needed?
PVC-backed `SPARK_LOCAL_DIRS` must be over-provisioned up front to avoid mid-job disk-full failures, which wastes storage cost. `ExecutorResizePlugin` already established the observe-and-patch pattern for memory; this PR extends it to PVC storage.
### Does this PR introduce _any_ user-facing change?
No. The plugin is opt-in: users must add it to `spark.plugins` explicitly and set `spark.kubernetes.executor.pvc.resizeInterval` to a non-zero value.
**SUBMIT**
```
bin/spark-submit \
--master k8s://$K8S_MASTER \
--deploy-mode cluster \
-c spark.executor.cores=4 \
-c spark.executor.memory=4g \
-c spark.kubernetes.container.image=docker.apple.com/d_hyun/spark:20260430 \
-c spark.kubernetes.authenticate.driver.serviceAccountName=spark \
-c spark.kubernetes.driver.pod.name=pi \
-c spark.kubernetes.executor.podNamePrefix=pi \
-c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data \
-c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
-c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand \
-c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=50Gi \
-c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp3 \
-c spark.kubernetes.driver.podTemplateFile=eks-root-pod.yml \
-c spark.kubernetes.executor.podTemplateFile=eks-root-pod.yml \
-c spark.plugins=org.apache.spark.scheduler.cluster.k8s.ExecutorPVCResizePlugin \
-c spark.kubernetes.executor.pvc.resizeInterval=1m \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples.jar 400000
```
**EXECUTOR SIZE REPORTING**
```
$ kubectl logs -f pi-exec-1 | grep Plugin
26/05/01 01:22:54 INFO ExecutorPVCResizeExecutorPlugin: Reporting max PVC disk usage ratio for executor 1: 0.6136656796630462
26/05/01 01:23:54 INFO ExecutorPVCResizeExecutorPlugin: Reporting max PVC disk usage ratio for executor 1: 0.30591566408202353
```
**RESIZED PVC**
```
$ kubectl get pvc
NAME              STATUS   VOLUME                                     CAPACITY       ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
pi-exec-1-pvc-0   Bound    pvc-d279a3da-ddfb-41c2-a32b-0f2bd83941c4   107374182400   RWOP           gp3            <unset>                 2m28s
pi-exec-2-pvc-0   Bound    pvc-79f092d3-4a8d-4981-946d-d745d4038fd6   50Gi           RWOP           gp3            <unset>                 2m28s
```
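
As a sanity check on the output above: executor 1 reported a ratio of about 0.61, which is above the default threshold of 0.5, and its PVC now shows a capacity of `107374182400` bytes, while executor 2's PVC is still at the original `50Gi`. The short Python sketch below confirms that the raw byte value is exactly twice 50Gi, as expected with the default factor of `1.0`. The helper `to_bytes` is hypothetical and handles only the binary suffixes needed here, not the full Kubernetes quantity grammar.

```python
# Binary suffixes used by Kubernetes quantities (subset, for illustration only).
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '50Gi' or a plain byte count to bytes."""
    for suffix, multiplier in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)


original = to_bytes("50Gi")          # size requested at submit time
resized = to_bytes("107374182400")   # capacity shown for pi-exec-1-pvc-0
print(resized / original)            # prints 2.0
```

The second log line is consistent with this: after the volume doubled, the reported ratio dropped to roughly half of the earlier value.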
### How was this patch tested?
Pass the CIs with a new `ExecutorPVCResizePluginSuite`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7 (1M context)