[ https://issues.apache.org/jira/browse/SPARK-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156599#comment-14156599 ]
Takuya Ueshin commented on SPARK-3764:
--------------------------------------

Ah, I see that {{context.getTaskAttemptID}} at [ParquetTableOperations.scala:334|https://github.com/apache/spark/blob/6e27cb630de69fa5acb510b4e2f6b980742b1957/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala#L334] is breaking binary compatibility of Spark itself.

> Invalid dependencies of artifacts in Maven Central Repository.
> --------------------------------------------------------------
>
>                 Key: SPARK-3764
>                 URL: https://issues.apache.org/jira/browse/SPARK-3764
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.1.0
>            Reporter: Takuya Ueshin
>
> While testing my Spark applications locally using Spark artifacts downloaded
> from Maven Central, the following exception was thrown:
> {quote}
> ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread
> Thread[Executor task launch worker-2,5,main]
> java.lang.IncompatibleClassChangeError: Found class
> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>   at org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
>   at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
>   at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   at org.apache.spark.scheduler.Task.run(Task.scala:54)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> This is because the hadoop type {{TaskAttemptContext}} is incompatible
> between hadoop-1 and hadoop-2: it is declared as a class in hadoop-1 but as
> an interface in hadoop-2.
> I guess the Spark artifacts in Maven Central were built against hadoop-2
> with Maven, but the hadoop version declared in {{pom.xml}} remains 1.0.4,
> so a hadoop version mismatch occurs at runtime.
> FYI: sbt seems to publish an 'effective pom'-like pom file, so the
> dependencies are correctly resolved there.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
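A sketch of the user-side workaround implied by the report, assuming a Maven application depending on the published Spark artifacts. The coordinates are real hadoop ones, but the version shown (2.4.0) is an assumption for illustration; it must match the hadoop build your Spark artifacts actually target. Declaring {{hadoop-client}} directly makes Maven's nearest-wins mediation override the stale 1.0.4 version pulled in transitively:

```xml
<!-- Hypothetical application-side workaround: pin hadoop-client directly in
     your own pom.xml so the incorrect 1.0.4 declared transitively by the
     Spark artifacts is overridden. The version 2.4.0 is an assumption;
     substitute the hadoop version your Spark build was compiled against. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>
</dependency>
```

A direct dependency sits at depth 0 in the tree, so it always wins Maven's nearest-wins resolution against the transitive 1.0.4 declaration.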
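The class-versus-interface split described above is exactly the kind of change reflection can paper over. As a minimal sketch (not Spark's actual fix; {{FakeTaskAttemptContext}} and the attempt-ID string are invented stand-ins for the demo), calling {{getTaskAttemptID}} reflectively keeps the call site free of any compile-time assumption about the declaring type:

```scala
// Hypothetical sketch, not Spark's actual fix: resolve getTaskAttemptID
// through reflection so the compiled call site does not hard-code whether
// TaskAttemptContext is a class (hadoop-1) or an interface (hadoop-2).
object ReflectiveCall {
  def taskAttemptId(context: AnyRef): AnyRef =
    context.getClass.getMethod("getTaskAttemptID").invoke(context)
}

// Invented stand-in for org.apache.hadoop.mapreduce.TaskAttemptContext,
// purely for illustration; the real type comes from the hadoop jars.
class FakeTaskAttemptContext {
  def getTaskAttemptID: String = "attempt_201410030000_0001_m_000000_0"
}
```

Because {{getMethod}}/{{invoke}} resolve the member at runtime, the same bytecode works whether the receiver's type was compiled as a class or as an interface, at the cost of reflection overhead on each call.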