[ https://issues.apache.org/jira/browse/SPARK-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156599#comment-14156599 ]
Takuya Ueshin commented on SPARK-3764:
--------------------------------------

Ah, I see that {{context.getTaskAttemptID}} at [ParquetTableOperations.scala:334|https://github.com/apache/spark/blob/6e27cb630de69fa5acb510b4e2f6b980742b1957/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala#L334] is breaking binary compatibility of Spark itself.

> Invalid dependencies of artifacts in Maven Central Repository.
> --------------------------------------------------------------
>
>                 Key: SPARK-3764
>                 URL: https://issues.apache.org/jira/browse/SPARK-3764
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.1.0
>            Reporter: Takuya Ueshin
>
> While testing my Spark applications locally using Spark artifacts downloaded
> from Maven Central, the following exception was thrown:
> {quote}
> ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread
> Thread[Executor task launch worker-2,5,main]
> java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>   at org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
>   at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   at org.apache.spark.scheduler.Task.run(Task.scala:54)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> This is because the hadoop type {{TaskAttemptContext}} is incompatible
> between hadoop-1 and hadoop-2: it is declared as a class in hadoop-1 but as
> an interface in hadoop-2.
> I guess the Spark artifacts in Maven Central were built against hadoop-2
> with Maven, but the hadoop version declared in {{pom.xml}} remains 1.0.4,
> so a hadoop version mismatch occurs at runtime.
> FYI: sbt seems to publish an 'effective pom'-like pom file, so the
> dependencies are correctly resolved there.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
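A sketch of the user-side workaround implied by the report, assuming a Maven application depending on the published Spark artifacts. The coordinates are real hadoop ones, but the version shown (2.4.0) is an assumption for illustration; it must match the hadoop build your Spark artifacts actually target. Declaring {{hadoop-client}} directly makes Maven's nearest-wins mediation override the stale 1.0.4 version pulled in transitively:

```xml
<!-- Hypothetical application-side workaround: pin hadoop-client directly in
     your own pom.xml so the incorrect 1.0.4 declared transitively by the
     Spark artifacts is overridden. The version 2.4.0 is an assumption;
     substitute the hadoop version your Spark build was compiled against. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>
</dependency>
```

A direct dependency sits at depth 0 in the tree, so it always wins Maven's nearest-wins resolution against the transitive 1.0.4 declaration.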
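The class-versus-interface split described above is exactly the kind of change reflection can paper over. As a minimal sketch (not Spark's actual fix; {{FakeTaskAttemptContext}} and the attempt-ID string are invented stand-ins for the demo), calling {{getTaskAttemptID}} reflectively keeps the call site free of any compile-time assumption about the declaring type:

```scala
// Hypothetical sketch, not Spark's actual fix: resolve getTaskAttemptID
// through reflection so the compiled call site does not hard-code whether
// TaskAttemptContext is a class (hadoop-1) or an interface (hadoop-2).
object ReflectiveCall {
  def taskAttemptId(context: AnyRef): AnyRef =
    context.getClass.getMethod("getTaskAttemptID").invoke(context)
}

// Invented stand-in for org.apache.hadoop.mapreduce.TaskAttemptContext,
// purely for illustration; the real type comes from the hadoop jars.
class FakeTaskAttemptContext {
  def getTaskAttemptID: String = "attempt_201410030000_0001_m_000000_0"
}
```

Because {{getMethod}}/{{invoke}} resolve the member at runtime, the same bytecode works whether the receiver's type was compiled as a class or as an interface, at the cost of reflection overhead on each call.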