[ https://issues.apache.org/jira/browse/SPARK-34624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295650#comment-17295650 ]
Erik Krogen edited comment on SPARK-34624 at 3/5/21, 12:01 AM:
---------------------------------------------------------------

Thanks for reporting this [~shardulm]!

edit: moving question to PR


was (Author: xkrogen):
Thanks for reporting this [~shardulm]!

One question for you: Is the correct behavior to filter/ignore these non-JAR dependencies, or still add them to the classpath, but with the proper extensions?

> Filter non-jar dependencies from ivy/maven coordinates
> ------------------------------------------------------
>
>                 Key: SPARK-34624
>                 URL: https://issues.apache.org/jira/browse/SPARK-34624
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.1
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> Some maven artifacts define non-jar dependencies. One such example is
> {{hive-exec}}'s dependency on the {{pom}} of {{apache-curator}}
> https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.8/hive-exec-2.3.8.pom
> Today trying to depend on such an artifact using {{--packages}} will print an
> error but continue without including the non-jar dependency.
> {code}
> 1/03/04 09:46:49 ERROR SparkContext: Failed to add file:/Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar to Spark environment
> java.io.FileNotFoundException: Jar /Users/shardul/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar not found
>   at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1935)
>   at org.apache.spark.SparkContext.addJar(SparkContext.scala:1990)
>   at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:501)
>   at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:501)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
> Doing the same using {{spark.sql("ADD JAR ivy://org.apache.hive:hive-exec:2.3.8?exclude=org.pentaho:pentaho-aggdesigner-algorithm")}}
> will cause a failure
> {code}
> ADD JAR /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar
> /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar does not exist
> ======================
> END HIVE FAILURE OUTPUT
> ======================
> org.apache.spark.sql.execution.QueryExecutionException: /Users/smahadik/.ivy2/jars/org.apache.curator_apache-curator-2.7.1.jar does not exist
>   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:841)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:800)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:787)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:947)
>   at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1(HiveSessionStateBuilder.scala:130)
>   at org.apache.spark.sql.hive.HiveSessionResourceLoader.$anonfun$addJar$1$adapted(HiveSessionStateBuilder.scala:129)
>   at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
>   at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
>   at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3705)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3703)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
>   ... 47 elided
> {code}
> We should exclude these non-jar artifacts, as our current dependency
> resolution code assumes artifacts to be jars. For example:
> https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1215
> and
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L318
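
To make the proposed exclusion concrete, here is a minimal sketch of the filtering idea in plain Java. It is not Spark's or Ivy's code: `ResolvedArtifact`, its fields, and `keepJars` are hypothetical names introduced only for illustration, and treating only the type `jar` as acceptable is an assumption (the issue does not say which artifact types should be kept). The coordinates are the ones mentioned in the report above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NonJarFilter {

    /** Hypothetical stand-in for a resolved artifact; not a Spark or Ivy class. */
    static final class ResolvedArtifact {
        final String organization;
        final String name;
        final String version;
        final String type; // declared artifact type, e.g. "jar" or "pom"

        ResolvedArtifact(String organization, String name, String version, String type) {
            this.organization = organization;
            this.name = name;
            this.version = version;
            this.type = type;
        }

        /** Formats the artifact as organization:name:version. */
        String coordinate() {
            return organization + ":" + name + ":" + version;
        }
    }

    /**
     * Keeps only artifacts whose declared type is "jar" (case-insensitive).
     * Assumption: "jar" is the only type that should reach the classpath.
     */
    static List<ResolvedArtifact> keepJars(List<ResolvedArtifact> resolved) {
        List<ResolvedArtifact> jars = new ArrayList<>();
        for (ResolvedArtifact artifact : resolved) {
            if ("jar".equalsIgnoreCase(artifact.type)) {
                jars.add(artifact);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        List<ResolvedArtifact> resolved = Arrays.asList(
            new ResolvedArtifact("org.apache.hive", "hive-exec", "2.3.8", "jar"),
            // The pom-typed dependency from the report; there is no jar file behind it.
            new ResolvedArtifact("org.apache.curator", "apache-curator", "2.7.1", "pom"));

        // Print the coordinates that survive the filter.
        for (ResolvedArtifact artifact : keepJars(resolved)) {
            System.out.println(artifact.coordinate());
        }
    }
}
```

With a filter like this applied before file paths are built, the {{pom}}-typed {{apache-curator}} entry would never be turned into a nonexistent {{.jar}} path. Whether to filter or to add such artifacts with their proper extension is the open question raised in the comment above.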