Hi, I'm working on getting "SparkPerf" (https://github.com/databricks/spark-perf) to run with Spark 2.0. I noticed a few pull requests that have not yet been accepted, so I'm concerned this project has been abandoned. It has proven very useful in the past for quality assurance, as we can easily exercise lots of Spark functions on a cluster (perhaps exposing problems that don't surface with the Spark unit tests).
I want to use Scala 2.11.8 and Spark 2.0.0, so I'm making my way through various files. I'm currently faced with a NoSuchMethodError:

    NoSuchMethodError: org/apache/spark/SparkContext.rddToPairRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;Lscala/math/Ordering;)Lorg/apache/spark/rdd/PairRDDFunctions;
        at spark.perf.AggregateByKey.runTest(KVDataTest.scala:137)

The code in question:

    class AggregateByKey(sc: SparkContext) extends KVDataTest(sc) {
      override def runTest(rdd: RDD[_], reduceTasks: Int) {
        rdd.asInstanceOf[RDD[(String, String)]]
          .map{case (k, v) => (k, v.toInt)}.reduceByKey(_ + _, reduceTasks).count()
      }
    }

Grepping shows:

    ./spark-tests/target/streams/compile/incCompileSetup/$global/streams/inc_compile_2.10:/home/aroberts/Desktop/spark-perf/spark-tests/src/main/scala/spark/perf/KVDataTest.scala -> rddToPairRDDFunctions

The scheduling-throughput tests complete fine, but the problem here is seen with agg-by-key (and there are likely other modules to fix owing to API changes between 1.x and 2.x, which I guess is the cause of the above problem).

Has anybody already made good progress here? I would like to work together and get this available for everyone; I'll be churning through it either way. I will be looking at HiBench also.

My next step is to use sbt -Dspark.version=2.0.0 (2.0.0-preview?) and work from there, although I figured the prep tests stage would do this for me (how else is it going to build?).

Cheers,
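
P.S. To make the sbt step above concrete, here is a rough sketch of a build definition pinned to the versions I mentioned. It is an assumption for illustration only, not spark-perf's actual build file: the `sparkVersion` value and the way it reads the `spark.version` system property are hypothetical, and the real project may wire this up differently.

```scala
// Hypothetical build.sbt fragment (illustrative only; not taken from spark-perf).

// Build against Scala 2.11.8, the version targeted above.
scalaVersion := "2.11.8"

// Assumed helper: read the Spark version from -Dspark.version, defaulting to 2.0.0.
val sparkVersion = sys.props.getOrElse("spark.version", "2.0.0")

// %% appends the Scala binary version, so this resolves spark-core_2.11.
// "provided" assumes the cluster supplies the Spark jars at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
```

With something like this in place, running `sbt -Dspark.version=2.0.0 clean package` should recompile the tests against the 2.0.0 artifacts rather than reusing classes left over in target/ from an earlier 2.10 build.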