[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2015-02-02 Thread Markus Dale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301780#comment-14301780
 ] 

Markus Dale edited comment on SPARK-3039 at 2/2/15 8:40 PM:


For me, Spark 1.2.0, whether downloaded as spark-1.2.0-bin-hadoop2.4.tgz or
compiled from source with

{code}
mvn -Pyarn -Phadoop-2.4 -Phive-0.13.1 -DskipTests clean package
{code}

still had the same problem:

{noformat}
java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at 
org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:87)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)

{noformat}

After starting the build with a clean .m2/repository, the repository
contained:

* avro-mapred/1.7.5 (with the default jar - i.e. hadoop1)
* avro-mapred/1.7.6 with the avro-mapred-1.7.6-hadoop2.jar (the one we want). 

It seems that shading both of these dependencies into the spark-assembly jar
resulted in the error above, at least in the downloaded hadoop2.4 Spark binary
and in my own build.
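
One way to check which copy actually made it into the assembly is to compare
the contested class file against both source jars (a sketch only; the assembly
jar name and local repository paths are assumptions):

{code}
# extract the conflicting class from each jar and compare checksums
unzip -p spark-assembly-1.2.0-hadoop2.4.0.jar \
  org/apache/avro/mapreduce/AvroRecordReaderBase.class | md5sum
unzip -p ~/.m2/repository/org/apache/avro/avro-mapred/1.7.5/avro-mapred-1.7.5.jar \
  org/apache/avro/mapreduce/AvroRecordReaderBase.class | md5sum
unzip -p ~/.m2/repository/org/apache/avro/avro-mapred/1.7.6/avro-mapred-1.7.6-hadoop2.jar \
  org/apache/avro/mapreduce/AvroRecordReaderBase.class | md5sum
{code}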

Running the following (after an mvn install and a by-hand copy of all the
Spark artifacts into my local repo for spark-repl/yarn):

{code}
mvn -Pyarn -Phadoop-2.4 -Phive -DskipTests dependency:tree -Dincludes=org.apache.avro:avro-mapred
{code}

showed that the culprit was in the Hive project, namely
org.spark-project.hive:hive-exec's dependency on avro-mapred 1.7.5:

{noformat}
Building Spark Project Hive 1.2.0
[INFO] 
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]
{noformat}

Editing spark-1.2.0/sql/hive/pom.xml to exclude avro-mapred from hive-exec and
then recompiling fixed the problem; the resulting distribution works well
against Avro/Hadoop2 code:

{code:xml}
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.esotericsoftware.kryo</groupId>
      <artifactId>kryo</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}
   
Only the last exclusion was added. I will try to submit a pull request if
that's not already addressed in the latest code.
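
To double-check the fix, the dependency:tree command from above can be re-run;
with the exclusion in place, hive-exec should no longer pull in avro-mapred
1.7.5, leaving only the hadoop2 1.7.6 artifact:

{code}
mvn -Pyarn -Phadoop-2.4 -Phive -DskipTests dependency:tree -Dincludes=org.apache.avro:avro-mapred
{code}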


was (Author: medale):
For me, Spark 1.2.0, whether downloaded as spark-1.2.0-bin-hadoop2.4.tgz or
compiled from source with

{code}
mvn -Pyarn -Phadoop-2.4 -Phive -DskipTests clean package
{code}

still had the same problem:

{noformat}
java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at 
org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:87)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)

{noformat}

After starting the build with a clean .m2/repository, the repository
contained:

* avro-mapred/1.7.5 (with the default jar - i.e. hadoop1)
* avro-mapred/1.7.6 with the avro-mapred-1.7.6-hadoop2.jar (the one we want). 

It seems that shading both of these dependencies into the spark-assembly jar
resulted in the error above, at least in the downloaded hadoop2.4 Spark binary
and in my own build.

Running the following (after an mvn install and a by-hand copy of all the
Spark artifacts into my local repo for spark-repl/yarn):

{code}
mvn -Pyarn -Phadoop-2.4 -Phive -DskipTests dependency:tree -Dincludes=org.apache.avro:avro-mapred
{code}

showed that the culprit was in the Hive project, namely
org.spark-project.hive:hive-exec's dependency on avro-mapred 1.7.5:

{noformat}
Building Spark Project Hive 1.2.0
[INFO] 
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]
{noformat}

Editing spark-1.2.0/sql/hive/pom.xml to exclude avro-mapred from hive-exec and
then recompiling fixed the problem; the resulting distribution works well
against Avro/Hadoop2 code:

{code:xml}
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-09 Thread Clay Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240143#comment-14240143
 ] 

Clay Kim edited comment on SPARK-3039 at 12/9/14 10:05 PM:
---

I finally managed to get this to work, using the prebuilt
spark-1.1.1-bin-hadoop1 and avro-mapred 1.7.6 with classifier hadoop2:

{code}
"org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2",
{code}
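
For completeness, a minimal build.sbt sketch using this dependency (the
spark-core coordinates and Scala version are assumptions based on the prebuilt
spark-1.1.1 distribution mentioned above):

{code}
// build.sbt -- a sketch only; versions match the ones discussed in this comment
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Spark itself comes from the prebuilt distribution at runtime
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  // the hadoop2 classifier selects the new-API (org.apache.hadoop.mapreduce) build
  "org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2"
)
{code}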


was (Author: theclaymethod):
I finally managed to get this to work, using the prebuilt
spark-1.1.1-bin-hadoop1 and avro-mapred 1.7.6 with classifier hadoop2:
{code}
"org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2",
{code}

 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 
 1 API
 --

 Key: SPARK-3039
 URL: https://issues.apache.org/jira/browse/SPARK-3039
 Project: Spark
  Issue Type: Bug
  Components: Build, Input/Output, Spark Core
Affects Versions: 0.9.1, 1.0.0, 1.1.0
 Environment: hadoop2, hadoop-2.4.0, HDP-2.1
Reporter: Bertrand Bossy
Assignee: Bertrand Bossy
 Fix For: 1.2.0


 The spark assembly contains the artifact org.apache.avro:avro-mapred as a 
 dependency of org.spark-project.hive:hive-serde.
 The avro-mapred package provides a hadoop FileInputFormat to read and write 
 avro files. There are two versions of this package, distinguished by a 
 classifier. avro-mapred for the new Hadoop API uses the classifier hadoop2. 
 avro-mapred for the old Hadoop API uses no classifier.
 E.g. when reading avro files using 
 {code}
 sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
 {code}
 The following error occurs:
 {code}
 java.lang.IncompatibleClassChangeError: Found interface 
 org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
 at 
 org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
 at 
 org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
 at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
 at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.Task.run(Task.scala:51)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 This error is usually a hint that the old and the new Hadoop API were mixed 
 up. As a work-around, if avro-mapred for hadoop2 is forced to appear before 
 the version that is bundled with Spark, reading avro files works fine. 
 Also, if Spark is built using avro-mapred for hadoop2, it works fine as well.
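
 For illustration, a sketch of how a downstream pom could depend on the hadoop2 
 build directly (the version number here is an assumption; a direct dependency 
 typically lands ahead of the transitively bundled copy on the classpath):
 {code:xml}
 <!-- sketch: depend on the new-API (hadoop2) build of avro-mapred directly -->
 <dependency>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro-mapred</artifactId>
   <version>1.7.6</version>
   <classifier>hadoop2</classifier>
 </dependency>
 {code}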






[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 6:24 PM:
---

Spark 1.1.1/Hadoop 1.0.4

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4

{quote}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 6:27 PM:
---

Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 6:26 PM:
---

Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 6:27 PM:
---

Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

Here is the stack trace:

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 6:29 PM:
---

Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

The code is rather trivial:

{code}
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val tweets = sqlContext.jsonFile(path).cache()
tweets.saveAsParquetFile("tweets.parquet")
{code}

Here is the stack trace:

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

Here is the stack trace:

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 

[jira] [Comment Edited] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-08 Thread Derrick Burns (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238197#comment-14238197
 ] 

Derrick Burns edited comment on SPARK-3039 at 12/8/14 10:16 PM:


Running in local mode. Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

The code is rather trivial:

{code}
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val tweets = sqlContext.jsonFile(path).cache()
tweets.saveAsParquetFile("tweets.parquet")
{code}

Here is the stack trace:

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/12/08 10:21:06 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught 
exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at 
org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

{code}



was (Author: derrickburns):
Spark 1.1.1/Hadoop 1.0.4 built using maven with:

{code:xml}
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.1.1</version>
  </dependency>
{code}

The code is rather trivial:

{code}
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val tweets = sqlContext.jsonFile(path).cache()
tweets.saveAsParquetFile("tweets.parquet")
{code}

Here is the stack trace:

{code}
java.lang.IncompatibleClassChangeError: Found class 
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at 
org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
at 
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
at