Hi all,
I'm trying to build the spark-perf WIP code, but I'm hitting compilation
errors related to the Hadoop APIs. I presume the build is pinned to a
Hadoop version that lacks these classes, but I can't find where that
version is set.
The errors are as follows:
[info] Compiling 15 Scala sources and 2 Java sources to /home/ehiggs/src/spark-perf/spark-tests/target/scala-2.10/classes...
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:40: object task is not a member of package org.apache.hadoop.mapreduce
[error] import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[error] ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:132: not found: type TaskAttemptContextImpl
[error] val context = new TaskAttemptContextImpl(
[error] ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:37: object TTConfig is not a member of package org.apache.hadoop.mapreduce.server.tasktracker
[error] import org.apache.hadoop.mapreduce.server.tasktracker.TTConfig
[error] ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:91: not found: value TTConfig
[error] var slotsPerHost : Int = conf.getInt(TTConfig.TT_MAP_SLOTS, 4)
[error] ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:7: value run is not a member of org.apache.spark.examples.terasort.TeraGen
[error] tg.run(Array[String]("10M", "/tmp/terasort_in"))
[error] ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:9: value run is not a member of org.apache.spark.examples.terasort.TeraSort
[error] ts.run(Array[String]("/tmp/terasort_in", "/tmp/terasort_out"))
[error] ^
[error] 6 errors found
[error] (compile:compile) Compilation failed
[error] Total time: 13 s, completed 05-Jan-2015 12:21:47
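For what it's worth, my guess is that the wrong Hadoop is coming in
transitively via spark-core, since TaskAttemptContextImpl and TTConfig
only exist in the Hadoop 2.x MapReduce jars. I'd expect something like
the following (run from spark-tests/) to list the jars sbt actually
resolves and show which hadoop-* version ends up on the compile
classpath:

sbt "show compile:externalDependencyClasspath"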
I can build the same code when it lives in the Spark tree, using the
following command:
mvn -Dhadoop.version=2.5.0 -DskipTests=true install
Is there a way to convince spark-perf to build this code against the
appropriate Hadoop library version? I tried applying the following to
spark-tests/project/SparkTestsBuild.scala, but it didn't work as I
expected (a sketch of what I was aiming for follows the diff):
$ git diff project/SparkTestsBuild.scala
diff --git a/spark-tests/project/SparkTestsBuild.scala b/spark-tests/project/SparkTestsBuild.scala
index 4116326..4ed5f0c 100644
--- a/spark-tests/project/SparkTestsBuild.scala
+++ b/spark-tests/project/SparkTestsBuild.scala
@@ -16,7 +16,9 @@ object SparkTestsBuild extends Build {
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"com.google.guava" % "guava" % "14.0.1",
"org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
- "org.json4s" %% "json4s-native" % "3.2.9"
+ "org.json4s" %% "json4s-native" % "3.2.9",
+ "org.apache.hadoop" % "hadoop-common" % "2.5.0",
+ "org.apache.hadoop" % "hadoop-mapreduce" % "2.5.0"
),
test in assembly := {},
outputPath in assembly := file("target/spark-perf-tests-assembly.jar"),
@@ -36,4 +38,4 @@ object SparkTestsBuild extends Build {
case _ => MergeStrategy.first
}
))
-}
\ No newline at end of file
+}
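In case the diff itself is the problem: as far as I can tell,
hadoop-mapreduce is only published to Maven Central as a pom (the
classes I need live in hadoop-mapreduce-client-core), so perhaps the
resolution is silently failing on that. Here is a sketch of the
dependency block I would have guessed should work, depending on the
hadoop-client aggregator instead; hadoopVersion is just a name I've
picked here:

// Sketch of the dependency block in SparkTestsBuild.scala, assuming
// sbt dependency settings are honoured by spark-perf's build.
// hadoop-client exists on Maven Central and pulls in hadoop-common
// and the hadoop-mapreduce-client-* jars transitively.
val hadoopVersion = sys.props.getOrElse("hadoop.version", "2.5.0")

libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.google.guava" % "guava" % "14.0.1",
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.json4s" %% "json4s-native" % "3.2.9",
  "org.apache.hadoop" % "hadoop-client" % hadoopVersion
)

That would at least let the Hadoop version be overridden from the
command line with -Dhadoop.version=..., mirroring the mvn invocation
above, but I don't know whether spark-perf's build plumbing allows it.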
Yours,
Ewan