I am able to run my application after compiling the Spark source in the following way:
./dev/change-scala-version.sh 2.11
./dev/make-distribution.sh --name spark-2.0.0-snapshot-bin-hadoop2.6 --tgz -Phadoop-2.6 -DskipTests

But while the application is running I get the following exception, which I was not getting with Spark 1.6.1. Any idea why this might be happening?

java.lang.IllegalArgumentException: requirement failed: chunks must be non-empty
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:41)
    at org.apache.spark.util.io.ChunkedByteBuffer.<init>(ChunkedByteBuffer.scala:52)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:580)
    at org.apache.spark.storage.BlockManager.getRemoteValues(BlockManager.scala:514)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:601)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:653)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:329)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:280)
    at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:100)
    at org.apache.spark.rdd.PartitionerAwareUnionRDD$$anonfun$compute$1.apply(PartitionerAwareUnionRDD.scala:99)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.SetBuilder.$plus$plus$eq(SetBuilder.scala:20)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:304)
    at scala.collection.AbstractIterator.toSet(Iterator.scala:1336)
    at org.daselab.sparkel.SparkELHDFSTestCopy$$anonfun$45.apply(SparkELHDFSTestCopy.scala:392)
    at org.daselab.sparkel.SparkELHDFSTestCopy$$anonfun$45.apply(SparkELHDFSTestCopy.scala:391)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
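For context on the exception: "requirement failed: ..." is a Scala require precondition failing in ChunkedByteBuffer's constructor, which suggests that BlockManager.getRemoteBytes came back with a zero-length buffer for a remote block. A minimal sketch of the failing pattern, as a hypothetical simplification rather than the actual Spark source:

import java.nio.ByteBuffer

// Hypothetical simplification of the precondition pattern; the real
// org.apache.spark.util.io.ChunkedByteBuffer is more involved.
class ChunkedBuffer(chunks: Array[ByteBuffer]) {
  // Scala's require throws IllegalArgumentException("requirement failed: <msg>")
  // when the predicate is false.
  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
}

object Demo extends App {
  // An empty buffer fetched for a remote block would trip the check:
  new ChunkedBuffer(Array(ByteBuffer.allocate(0)))
  // => java.lang.IllegalArgumentException: requirement failed: chunks must be non-empty
}

If that is what is happening here, the question becomes why a remote block fetch returns an empty buffer on 2.0.0-SNAPSHOT but not on 1.6.1.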
On Fri, May 13, 2016 at 6:33 AM, Raghava Mutharaju < m.vijayaragh...@gmail.com> wrote:

> Thank you for the response.
>
> I used the following command to build from source:
>
> build/mvn -Dhadoop.version=2.6.4 -Phadoop-2.6 -DskipTests clean package
>
> Would this put the required jars in .ivy2 during the build process? If
> so, how can I make the Spark distribution runnable, so that I can use it
> on other machines as well (make-distribution.sh no longer exists in the
> Spark root folder)?
>
> For compiling my application, I put the following lines in build.sbt:
>
> packAutoSettings
> val spark = "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT"
> val sparksql = "org.apache.spark" % "spark-sql_2.11" % "2.0.0-SNAPSHOT"
>
> lazy val root = (project in file(".")).
>   settings(
>     name := "sparkel",
>     version := "0.1.0",
>     scalaVersion := "2.11.8",
>     libraryDependencies += spark,
>     libraryDependencies += sparksql
>   )
>
> Regards,
> Raghava.
>
> On Fri, May 13, 2016 at 12:23 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
>> Spark has moved to building with Scala 2.11 by default in master/trunk.
>>
>> As for 2.0.0-SNAPSHOT, it is actually the version of master/trunk, and
>> you might be missing some modules/profiles in your build. What command
>> did you use to build?
>>
>> On Thu, May 12, 2016 at 9:01 PM, Raghava Mutharaju <
>> m.vijayaragh...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I built Spark from the source code available at
>>> https://github.com/apache/spark/. Although I haven't specified the
>>> "-Dscala-2.11" option (to build with Scala 2.11), from the build
>>> messages I see that it ended up using Scala 2.11. Now, what should the
>>> Spark version be in my application's sbt build? I tried the following
>>>
>>> val spark = "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT"
>>> val sparksql = "org.apache.spark" % "spark-sql_2.11" % "2.0.0-SNAPSHOT"
>>>
>>> and scalaVersion := "2.11.8"
>>>
>>> But this setting of the Spark version gives an sbt error:
>>>
>>> unresolved dependency: org.apache.spark#spark-core_2.11;2.0.0-SNAPSHOT
>>>
>>> I guess this is because the repository doesn't contain 2.0.0-SNAPSHOT.
>>> Does this mean the only option is to put all the required jars in the
>>> lib folder (unmanaged dependencies)?
>>>
>>> Regards,
>>> Raghava.
>>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>
>
> --
> Regards,
> Raghava
> http://raghavam.github.io


--
Regards,
Raghava
http://raghavam.github.io
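P.S. On the unresolved org.apache.spark#spark-core_2.11;2.0.0-SNAPSHOT dependency discussed above: since no public repository hosts that snapshot, one alternative to unmanaged jars in lib/ is to run the Maven build with "install" instead of "package" (e.g. build/mvn -Dhadoop.version=2.6.4 -Phadoop-2.6 -DskipTests clean install), which publishes the jars to ~/.m2/repository, and then let sbt resolve them from there. A build.sbt sketch under that assumption:

// build.sbt: a sketch assuming Spark 2.0.0-SNAPSHOT was published to the
// local Maven repository with "mvn ... clean install"
name := "sparkel"
version := "0.1.0"
scalaVersion := "2.11.8"

// Resolve the locally installed snapshot from ~/.m2/repository
resolvers += Resolver.mavenLocal

// %% appends the Scala binary version (_2.11) to both artifact names
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT",
  "org.apache.spark" %% "spark-sql"  % "2.0.0-SNAPSHOT"
)

Using %% for both artifacts keeps the Scala binary version consistent with scalaVersion, which avoids mixing _2.10 and _2.11 jars by accident.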