[ https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326200#comment-17326200 ]
Keunhyun Oh commented on SPARK-35084:
-------------------------------------

*Spark 2.4.5*
[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]

{code:java}
    if (!isMesosCluster && !isStandAloneCluster) {
      // Resolve maven dependencies if there are any and add classpath to jars. Add them to py-files
      // too for packages that include Python code
      val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
        args.packagesExclusions, args.packages, args.repositories, args.ivyRepoPath,
        args.ivySettingsPath)

      if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
        args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
        if (args.isPython || isInternal(args.primaryResource)) {
          args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
        }
      }

      // install any R packages that may have been passed through --jars or --packages.
      // Spark Packages may contain R source code inside the jar.
      if (args.isR && !StringUtils.isBlank(args.jars)) {
        RPackageUtils.checkAndBuildRPackage(args.jars, printStream, args.verbose)
      }
    }
{code}

*Spark 3.0.2*
[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]

{code:java}
      if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
        // In K8s client mode, when in the driver, add resolved jars early as we might need
        // them at the submit time for artifact downloading.
        // For example we might use the dependencies for downloading
        // files from a Hadoop Compatible fs eg. S3. In this case the user might pass:
        // --packages com.amazonaws:aws-java-sdk:1.7.4:org.apache.hadoop:hadoop-aws:2.7.6
        if (isKubernetesClusterModeDriver) {
          val loader = getSubmitClassLoader(sparkConf)
          for (jar <- resolvedMavenCoordinates.split(",")) {
            addJarToClasspath(jar, loader)
          }
        } else if (isKubernetesCluster) {
          // We need this in K8s cluster mode so that we can upload local deps
          // via the k8s application, like in cluster mode driver
          childClasspath ++= resolvedMavenCoordinates.split(",")
        } else {
          args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
          if (args.isPython || isInternal(args.primaryResource)) {
            args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
          }
        }
      }
{code}

With a k8s master, Spark 2 merges the jars resolved from Maven into args.jars. In Spark 3, however, when isKubernetesCluster is true the resolved Maven coordinates are only appended to childClasspath and are never merged into args.jars. I assume this is why spark.jars.packages is not supported by spark-submit in k8s cluster mode the way I expected: the jars resolved from --packages are never added to the Spark context (a small driver-side check is sketched below).

> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35084
>                 URL: https://issues.apache.org/jira/browse/SPARK-35084
>             Project: Spark
>          Issue Type: Question
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.0.2, 3.1.1
>            Reporter: Keunhyun Oh
>            Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>
> In my environment, on Spark 3.x, jars listed in spark.jars and spark.jars.packages are not added to the sparkContext. After the driver process is launched, the jars are not propagated to the executors, so a NoClassDefFoundError is raised in the executors.
>
> In spark.properties, spark.jars contains only the main application jar. This is different from Spark 2.
>
> How can I solve this situation? Did any spark options change between Spark 2 and Spark 3?
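For anyone reproducing this, a minimal driver-side check (a sketch; the object name JarCheck and the app name are illustrative, not from the issue). SparkContext.jars lists the jar paths the context will distribute to executors, so it should show whether the --packages jars were merged in:

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: print the jars registered with the SparkContext. On Spark 2,
// jars resolved from --packages are merged into this list; per this issue,
// on Spark 3 in k8s cluster mode only the primary application jar appears.
object JarCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jar-check").getOrCreate()
    spark.sparkContext.jars.foreach(println)
    spark.stop()
  }
}
{code}

Submitting this with --master k8s://..., --deploy-mode cluster and a --packages coordinate should make the difference visible in the driver log.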