[ https://issues.apache.org/jira/browse/SPARK-40298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600631#comment-17600631 ]

Dongjoon Hyun commented on SPARK-40298:
---------------------------------------

Thank you for trying this Apache Spark feature, [~todd5167], but as [~hyukjin.kwon] 
mentioned, this is more of a question than a bug report.

First, could you provide a reproducible test case? I want to help you.

Second, I assume that you verified the KubernetesLocalDiskShuffleExecutorComponents 
logs correctly. However, the following could be a partial observation:
{quote}It can be confirmed that the PVC has been reused by other pods, and the 
index and data files are present
{quote}
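For reference, here is one way to double-check the executor-side logs (a sketch 
only; `<executor-pod>` and `<ns>` are placeholders for your pod name and 
namespace):
{code}
# Sketch: scan an executor pod's log for shuffle-recovery activity.
kubectl logs <executor-pod> -n <ns> \
  | grep -i KubernetesLocalDiskShuffleExecutorComponents
{code}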
SPARK-35593 was designed to help recovery and to improve stability on a 
best-effort basis without any regressions, which means SPARK-35593 doesn't 
aim to block existing Spark features like re-computation or executor 
allocation with a new PVC. More specifically, there are two cases where 
Spark's processing is faster than the recovery.

Case 1. When the Spark executor termination is a little slow and the PVC is not 
cleanly available to the Spark driver from the K8s control plane for some 
reason, the Spark driver creates a new executor with a new PVC (driver-owned, of 
course). In this case, you can have more PVCs than executors. You can confirm 
this case with the `kubectl` commands below.
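For example (a sketch; `<ns>` is a placeholder for your namespace, and 
`spark-role=executor` is the label Spark sets on its executor pods):
{code}
# Sketch: compare the number of PVCs with the number of live executor pods.
kubectl get pvc -n <ns>
kubectl get pods -n <ns> -l spark-role=executor
{code}
If the PVC count is higher than the executor count, you are in this case.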

Case 2. When the Spark processing is faster than Spark's and K8s's executor 
allocation (pod creation + PVC assignment + Docker image downloading + ...), 
Spark recomputes the lineage with the running executors instead of waiting for 
the new executor allocation (or the recovery from it). This is Spark's original 
design, and it can always happen.
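If this is what happened in your run, the driver log should show the fetch 
failure and the stage resubmission. A sketch (the exact log wording varies 
across Spark versions, and `<driver-pod>`/`<ns>` are placeholders):
{code}
# Sketch: look for fetch failures and recomputation in the driver log.
kubectl logs <driver-pod> -n <ns> | grep -iE 'FetchFailed|Resubmit'
{code}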

> shuffle data recovery on the reused PVCs  no effect
> ---------------------------------------------------
>
>                 Key: SPARK-40298
>                 URL: https://issues.apache.org/jira/browse/SPARK-40298
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.2
>            Reporter: todd
>            Priority: Major
>         Attachments: 1662002808396.jpg, 1662002822097.jpg
>
>
> I used Spark 3.2.2 to test the [ Support shuffle data recovery on the reused 
> PVCs (SPARK-35593) ] feature. I found that when a shuffle read fails, data is 
> still read from the source.
> It can be confirmed that the PVC has been reused by other pods, and the 
> index and data files are present.
> *This is my Spark configuration:*
> --conf spark.driver.memory=5G 
> --conf spark.executor.memory=15G 
> --conf spark.executor.cores=1
> --conf spark.executor.instances=50
> --conf spark.sql.shuffle.partitions=50
> --conf spark.dynamicAllocation.enabled=false
> --conf spark.kubernetes.driver.reusePersistentVolumeClaim=true
> --conf spark.kubernetes.driver.ownPersistentVolumeClaim=true
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass=gp2
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit=100Gi
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/tmp/data
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=false
> --conf spark.executorEnv.SPARK_EXECUTOR_DIRS=/tmp/data
> --conf 
> spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO
> --conf spark.kubernetes.executor.missingPodDetectDelta=10s
> --conf spark.kubernetes.executor.apiPollingInterval=10s
> --conf spark.shuffle.io.retryWait=60s
> --conf spark.shuffle.io.maxRetries=5
>  
>  


