[ 
https://issues.apache.org/jira/browse/SPARK-47475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47475:
----------------------------------
    Summary: Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s 
Cluster Mode  (was: Jars Download from Driver Caused Executor Scalability Issue)

> Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
> -------------------------------------------------------------------------
>
>                 Key: SPARK-47475
>                 URL: https://issues.apache.org/jira/browse/SPARK-47475
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy, Kubernetes, Spark Core
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Jiale Tan
>            Assignee: Jiale Tan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> In K8s cluster deploy mode, all jars, including the primary resource jar and 
> jars from {{--jars}} or {{spark.jars}}, are downloaded to the driver's local 
> disk and then served to executors through the file server running on the 
> driver.
> When the jars are large and the application requests many executors, the 
> massive concurrent jar downloads from the driver saturate the network. The 
> executors' jar downloads then time out, causing the executors to be 
> terminated. From the user's point of view, the application is trapped in a 
> loop of massive executor loss and re-provisioning and never reaches the 
> requested number of live executors, which leads to a job SLA breach or 
> sometimes job failure.
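
To make the intent of {{spark.kubernetes.jars.avoidDownloadSchemes}} concrete, here is a minimal Python sketch of the idea, not Spark's actual implementation. It assumes the setting is a comma-separated list of URI schemes, with `*` meaning "all schemes"; the function name `split_jars` and the example URIs are hypothetical. Jars whose scheme is listed are left at their remote location for executors to fetch directly, and the rest go through the driver as described above.

```python
from urllib.parse import urlparse


def split_jars(jar_uris, avoid_schemes):
    """Partition jar URIs into (downloaded_to_driver, left_remote).

    avoid_schemes is a comma-separated string such as "s3a,hdfs";
    "*" is assumed to match every scheme. Illustration only.
    """
    schemes = {s.strip().lower() for s in avoid_schemes.split(",") if s.strip()}
    to_driver, left_remote = [], []
    for uri in jar_uris:
        # A bare path has no scheme; treat it as a local file.
        scheme = urlparse(uri).scheme.lower() or "file"
        if "*" in schemes or scheme in schemes:
            left_remote.append(uri)   # executors fetch this jar themselves
        else:
            to_driver.append(uri)     # driver downloads and serves this jar
    return to_driver, left_remote


jars = ["s3a://bucket/app.jar", "hdfs://nn/libs/dep.jar", "https://repo/x.jar"]
to_driver, left_remote = split_jars(jars, "s3a,hdfs")
```

With the hypothetical inputs above, only the `https` jar would still be served by the driver, so the executors' concurrent downloads of the two large remote jars no longer pass through the driver's file server.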



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
