This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 4db378fae30 [SPARK-44745][DOCS][K8S] Document shuffle data recovery from the remounted K8s PVCs

4db378fae30 is described below

commit 4db378fae30733cbd2be41e95a3cd8ad2184e06f
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Aug 9 15:25:33 2023 -0700

[SPARK-44745][DOCS][K8S] Document shuffle data recovery from the remounted K8s PVCs

### What changes were proposed in this pull request?

This PR aims to document an example of shuffle data recovery configuration from the remounted K8s PVCs.

### Why are the changes needed?

This will help the users use this feature more easily.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review because this is a doc-only change.

![Screenshot 2023-08-09 at 1 39 48 PM](https://github.com/apache/spark/assets/9700541/8cc7240b-570d-4c2e-b90a-54795c18df0a)

```
$ kubectl logs -f xxx-exec-16 | grep Kube
...
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Try to recover shuffle data.
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Found 192 files
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Try to recover /data/spark-x/executor-x/blockmgr-41a810ea-9503-447b-afc7-1cb104cd03cf/11/shuffle_0_11160_0.data
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Try to recover /data/spark-x/executor-x/blockmgr-41a810ea-9503-447b-afc7-1cb104cd03cf/0e/shuffle_0_10063_0.data
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Try to recover /data/spark-x/executor-x/blockmgr-41a810ea-9503-447b-afc7-1cb104cd03cf/0e/shuffle_0_10283_0.data
23/08/09 21:09:21 INFO KubernetesLocalDiskShuffleExecutorComponents: Ignore a non-shuffle block file.
```

Closes #42417 from dongjoon-hyun/SPARK-44745.

Authored-by: Dongjoon Hyun <dongj...@apache.org>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 docs/running-on-kubernetes.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index d3953592c4e..707a76196f3 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -394,6 +394,13 @@ spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.
 spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
 ```
 
+To enable shuffle data recovery feature via the built-in `KubernetesLocalDiskShuffleDataIO` plugin, we need to have the followings. You may want to enable `spark.kubernetes.driver.waitToReusePersistentVolumeClaim` additionally.
+
+```
+spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data/spark-x/executor-x
+spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO
+```
+
 If no volume is set as local storage, Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `spark.local.dir` or the environment variable `SPARK_LOCAL_DIRS` . If no directories are explicitly specified then a default directory is created and configured [...]
 
 `emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.
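
For readers who assemble job settings programmatically, here is a minimal Python sketch that renders the two properties added in the diff (plus the `mount.readOnly=false` context line and the optional `spark.kubernetes.driver.waitToReusePersistentVolumeClaim`) as repeated `spark-submit --conf` arguments. The helper names `shuffle_recovery_confs` and `to_submit_args` are hypothetical illustrations, not Spark APIs; the volume name `spark-local-dir-1` and the mount path `/data/spark-x/executor-x` are taken from the documented example and are assumptions to adapt to your own PVC setup.

```python
from typing import Dict, List

# Volume name taken from the documented example. Spark treats volumes whose
# name starts with "spark-local-dir-" as local storage for shuffle/spill data.
VOLUME = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"
PLUGIN = "org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO"


def shuffle_recovery_confs(mount_path: str = "/data/spark-x/executor-x",
                           wait_to_reuse_pvc: bool = False) -> Dict[str, str]:
    """Hypothetical helper: return the properties from the documented example."""
    confs = {
        f"{VOLUME}.mount.path": mount_path,
        f"{VOLUME}.mount.readOnly": "false",
        # Built-in plugin that looks for shuffle files on the remounted PVC.
        "spark.shuffle.sort.io.plugin.class": PLUGIN,
    }
    if wait_to_reuse_pvc:
        # Optional setting suggested alongside the plugin in the doc change.
        confs["spark.kubernetes.driver.waitToReusePersistentVolumeClaim"] = "true"
    return confs


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Hypothetical helper: render properties as `--conf key=value` pairs."""
    args: List[str] = []
    for key, value in confs.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments so they can be pasted into a spark-submit command.
    print(" ".join(to_submit_args(shuffle_recovery_confs(wait_to_reuse_pvc=True))))
```

The same key/value pairs could instead be placed in `spark-defaults.conf`; passing them with `--conf` keeps the recovery setup scoped to a single application.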