[ https://issues.apache.org/jira/browse/SPARK-47475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-47475:
----------------------------------
    Summary: Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode  (was: Jars Download from Driver Caused Executor Scalability Issue)

> Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
> -------------------------------------------------------------------------
>
>                 Key: SPARK-47475
>                 URL: https://issues.apache.org/jira/browse/SPARK-47475
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy, Kubernetes, Spark Core
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Jiale Tan
>            Assignee: Jiale Tan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> In K8s cluster deploy mode, all jars, including the primary resource jar
> and jars from {{--jars}} or {{spark.jars}}, are downloaded to the driver's
> local disk and then served to executors through the file server running on
> the driver. When the jars are large and the application requests many
> executors, the massive number of concurrent jar downloads from the driver
> causes network saturation. In this case, the executors' jar downloads time
> out, causing the executors to be terminated. From the user's point of view,
> the application is trapped in a loop of massive executor loss and
> re-provisioning and never reaches the requested number of live executors,
> which leads to job SLA breaches or, sometimes, job failure.
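
The idea behind the setting named in the summary can be illustrated with a small sketch. This is not Spark's implementation: it assumes the value of `spark.kubernetes.jars.avoidDownloadSchemes` is a comma-separated list of URI schemes, with `*` meaning every scheme, and the helper names `parse_avoid_schemes` and `should_download_to_driver` as well as the example URIs are hypothetical. A jar whose scheme is listed would be left at its remote location for executors to fetch directly, instead of being downloaded to the driver and re-served from there.

```python
from urllib.parse import urlparse


def parse_avoid_schemes(conf_value: str) -> set[str]:
    """Split a comma-separated scheme list (assumed format) into a set."""
    return {s.strip().lower() for s in conf_value.split(",") if s.strip()}


def should_download_to_driver(jar_uri: str, avoid_schemes: set[str]) -> bool:
    """Return True if the driver should download this jar and serve it itself.

    Hypothetical helper: a jar is skipped when its URI scheme is listed,
    or when the list contains the wildcard "*".
    """
    if "*" in avoid_schemes:
        return False
    scheme = urlparse(jar_uri).scheme.lower()
    return scheme not in avoid_schemes


if __name__ == "__main__":
    avoid = parse_avoid_schemes("s3a, hdfs")
    for uri in ("s3a://bucket/app.jar", "https://example.com/dep.jar"):
        action = "download to driver" if should_download_to_driver(uri, avoid) else "leave remote"
        print(f"{uri}: {action}")
```

With a list such as `s3a, hdfs`, jars on those schemes would no longer pass through the driver's file server, which is the bottleneck the issue description identifies.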