@Rui do you mean the spark-core jar in the Maven Central repo
is incompatible with the same version of the official pre-built Spark
binary? That's really weird. I thought they should have been built from the
same code.

Best Regards,
Shixiong Zhu

2014-12-18 17:22 GMT+08:00 Sean Owen <so...@cloudera.com>:
>
> Well, it's always a good idea to use matched binary versions. Here it
> is more acutely necessary. You can use a pre-built binary -- if you
> use it both to compile and to run. Why does it not make sense to publish
> artifacts?
>
> Not sure what you mean about core vs assembly, as the assembly
> contains all of the modules. You don't literally need the same jar
> file.
>
> On Thu, Dec 18, 2014 at 3:20 AM, Sun, Rui <rui....@intel.com> wrote:
> > Not using spark-submit. The app directly communicates with the Spark
> > cluster in standalone mode.
> >
> >
> >
> > If I mark the Spark dependency as 'provided', then a spark-core jar
> > elsewhere must be pointed to on the CLASSPATH. However, the pre-built Spark
> > binary only ships an assembly jar, not individual module jars. So you
> > don't have a chance to point to a module jar that is the same binary as
> > the one inside the pre-built Spark binary.
> >
> >
> >
> > Maybe the Spark distribution should contain not only the assembly jar but
> > also individual module jars. Any opinion?
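> >
> > In the meantime, one possible workaround sketch (the jar name and path
> > below are just assumptions, not part of any official layout) is to compile
> > against the assembly jar itself via sbt's unmanaged classpath, so that the
> > bytecode seen at compile time is exactly the binary the cluster runs:
> >
> > // build.sbt -- hypothetical sketch only
> > scalaVersion := "2.10.4"
> >
> > // Keep Spark out of libraryDependencies and the application jar, and
> > // compile against the very same assembly jar shipped in the pre-built
> > // Spark binary (copied into lib/ here; adjust the name to your download).
> > unmanagedJars in Compile +=
> >   Attributed.blank(file("lib/spark-assembly-1.1.1-hadoop1.0.4.jar"))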
> >
> >
> >
> > From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
> > Sent: Thursday, December 18, 2014 2:20 AM
> > To: Sean Owen
> > Cc: Sun, Rui; user@spark.apache.org
> > Subject: Re: weird bytecode incompatibility issue between spark-core jar
> > from mvn repo and official spark prebuilt binary
> >
> >
> >
> > Just to clarify, are you running the application using spark-submit after
> > packaging with sbt package? One thing that might help is to mark the Spark
> > dependency as 'provided', since then you shouldn't have the Spark classes
> > in your jar.
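> >
> > For example, a build.sbt along those lines might look roughly like this
> > (versions taken from the report above):
> >
> > // Spark is on the compile classpath but not bundled into the application
> > // jar, so the cluster's own binaries are the ones used at runtime.
> > scalaVersion := "2.10.4"
> >
> > libraryDependencies +=
> >   "org.apache.spark" %% "spark-core" % "1.1.1" % "provided"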
> >
> >
> >
> > Thanks
> >
> > Shivaram
> >
> >
> >
> > On Wed, Dec 17, 2014 at 4:39 AM, Sean Owen <so...@cloudera.com> wrote:
> >
> > You should use the same binaries everywhere. The problem here is that
> > anonymous functions can get compiled to different synthetic names when the
> > builds differ, so you end up with one function being called when another
> > function is meant.
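> >
> > As a rough, purely illustrative sketch of what that means (the synthetic
> > class names below are examples only; the exact names depend on the
> > compiler version and the surrounding source):
> >
> > object AnonFunNameDemo {
> >   def main(args: Array[String]): Unit = {
> >     // With Scala 2.10, each lambda becomes its own synthetic class whose
> >     // name ends in a counter, e.g. AnonFunNameDemo$$anonfun$main$1, $2, ...
> >     val inc = (x: Int) => x + 1
> >     val dbl = (x: Int) => x * 2
> >     println(inc.getClass.getName)
> >     println(dbl.getClass.getName)
> >     // If driver and executors are built from even slightly different
> >     // sources, those counters can shift, so a serialized reference such as
> >     // RDD$$anonfun$13 may resolve to a different lambda on the other side
> >     // -- which is how a Tuple2 ends up cast to an Iterator.
> >   }
> > }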
> >
> >
> > On Wed, Dec 17, 2014 at 12:07 PM, Sun, Rui <rui....@intel.com> wrote:
> >> Hi,
> >>
> >>
> >>
> >> I encountered a weird bytecode incompatibility issue between the spark-core
> >> jar from the mvn repo and the official Spark prebuilt binary.
> >>
> >>
> >>
> >> Steps to reproduce:
> >>
> >> 1. Download the official pre-built Spark binary 1.1.1 at
> >> http://d3kbcqa49mib13.cloudfront.net/spark-1.1.1-bin-hadoop1.tgz
> >>
> >> 2. Launch the Spark cluster in pseudo-cluster mode
> >>
> >> 3. Run a small Scala app which calls RDD.saveAsObjectFile():
> >>
> >> scalaVersion := "2.10.4"
> >>
> >> libraryDependencies ++= Seq(
> >>   "org.apache.spark" %% "spark-core" % "1.1.1"
> >> )
> >>
> >>
> >>
> >> import org.apache.spark.SparkContext
> >>
> >> val sc = new SparkContext(args(0), "test") // args(0) is the Spark master URI
> >> val rdd = sc.parallelize(List(1, 2, 3))
> >> rdd.saveAsObjectFile("/tmp/mysaoftmp")
> >> sc.stop()
> >>
> >>
> >>
> >> This throws an exception as follows:
> >>
> >> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
> >> stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost
> >> task 1.3 in stage 0.0 (TID 6, ray-desktop.sh.intel.com):
> >> java.lang.ClassCastException: scala.Tuple2 cannot be cast to
> >> scala.collection.Iterator
> >> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> >> [error]         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> >> [error]         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> >> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> [error]         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >> [error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >> [error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >> [error]         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> >> [error]         org.apache.spark.scheduler.Task.run(Task.scala:54)
> >> [error]         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> >> [error]         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >> [error]         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> [error]         java.lang.Thread.run(Thread.java:701)
> >>
> >>
> >>
> >> After investigation, I found that this is caused by a bytecode
> >> incompatibility between RDD.class in spark-core_2.10-1.1.1.jar and the one
> >> in the pre-built Spark assembly.
> >>
> >>
> >>
> >> This issue also happens with Spark 1.1.0.
> >>
> >>
> >>
> >> Is there anything wrong with my usage of Spark? Or is there anything wrong
> >> in the process of deploying the Spark module jars to the Maven repo?
> >>
> >>
> >