Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

Dean Wampler Wed, 25 Mar 2015 12:35:55 -0700

Weird. Are you running from the SBT console? It should have the spark-core jar
on the classpath. Similarly, spark-shell or spark-submit should work, but
be sure you're using the same version of Spark when running as when
compiling. Also, you might need to add spark-sql to your SBT dependencies,
but that shouldn't be the cause of this issue.
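
As a rough sketch of both points (an assumption about how the build could
look, not something taken from your project): defining the Spark version once
keeps spark-core and spark-sql in step with each other. The name sparkVersion
is only illustrative, and the value has to match whatever Spark version is
actually installed on the cluster you run against.

```scala
// build.sbt fragment (sketch). `sparkVersion` is a hypothetical helper name.
// Its value should match the Spark version running on the cluster.
val sparkVersion = "1.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion
)
```

To see which version the cluster is really running, evaluating sc.version in
spark-shell on the master prints it.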


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Wed, Mar 25, 2015 at 12:09 PM, roni <roni.epi...@gmail.com> wrote:

> Thanks Dean and Nick.
> So, I removed ADAM and H2O from my SBT build since I was not using them.
> I got the code to compile, only for it to fail at runtime with:
> SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/rdd/RDD$
>         at preDefKmerIntersection$.main(kmerIntersetion.scala:26)
>
> This line is where I do a "JOIN" operation (the last line of the snippet):
>     val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>     val filtered = hgPair.filter(kv => kv._2 == 1)
>     val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>     val joinRDD = bedPair.join(filtered)   // the failing line
> Any idea what's going on?
> I have data on the EC2 cluster, so I am avoiding creating a new cluster and
> am just upgrading and changing the code to use 1.3 and Spark SQL.
> Thanks
> Roni
>
>
>
> On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler <deanwamp...@gmail.com>
> wrote:
>
>> For the Spark SQL parts, 1.3 breaks backwards compatibility, because
>> before 1.3, Spark SQL was considered experimental, so API changes were
>> allowed.
>>
>> So, versions of H2O and ADAM that are compatible with 1.2.X might not
>> work with 1.3.
>>
>> dean
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Wed, Mar 25, 2015 at 9:39 AM, roni <roni.epi...@gmail.com> wrote:
>>
>>> Even if H2O and ADAM depend on 1.2.1, Spark should be backward
>>> compatible, right?
>>> So using 1.3 should not break them.
>>> And the code is not using the classes from those libs.
>>> I tried sbt clean compile; same error.
>>> Thanks
>>> _R
>>>
>>> On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath <
>>> nick.pentre...@gmail.com> wrote:
>>>
>>>> What version of Spark do the other dependencies (ADAM and H2O) rely on?
>>>> That could be it.
>>>>
>>>> Or try sbt clean compile
>>>>
>>>>
>>>> On Wed, Mar 25, 2015 at 5:58 PM, roni <roni.epi...@gmail.com> wrote:
>>>>
>>>>> I have an EC2 cluster created using Spark version 1.2.1,
>>>>> and I have an SBT project.
>>>>> Now I want to upgrade to Spark 1.3 and use the new features.
>>>>> Below are the issues I ran into.
>>>>> Sorry for the long post.
>>>>> Appreciate your help.
>>>>> Thanks
>>>>> -Roni
>>>>>
>>>>> Question - Do I have to create a new cluster using spark 1.3?
>>>>>
>>>>> Here is what I did -
>>>>>
>>>>> In my SBT file I changed the Spark dependency to:
>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>>
>>>>> But then I started getting a compilation error, along with these warnings:
>>>>> Here are some of the libraries that were evicted:
>>>>> [warn]  * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
>>>>> [warn]  * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0
>>>>> [warn] Run 'evicted' to see detailed eviction warnings
>>>>>
>>>>>  constructor cannot be instantiated to expected type;
>>>>> [error]  found   : (T1, T2)
>>>>> [error]  required: org.apache.spark.sql.catalyst.expressions.Row
>>>>> [error]                                 val ty = joinRDD.map{case(word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts,"xyz")}
>>>>> [error]                                                          ^
>>>>>
>>>>> Here is my SBT and code --
>>>>> SBT -
>>>>>
>>>>> version := "1.0"
>>>>>
>>>>> scalaVersion := "2.10.4"
>>>>>
>>>>> resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
>>>>> resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
>>>>> resolvers += "Maven Repo" at "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"
>>>>>
>>>>> /* Dependencies - %% appends Scala version to artifactId */
>>>>> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
>>>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
>>>>> libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
>>>>> libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"
>>>>>
>>>>>
>>>>> CODE --
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>> case class KmerIntesect(kmer: String, kCount: Int, fileName: String)
>>>>>
>>>>> object preDefKmerIntersection {
>>>>>   def main(args: Array[String]) {
>>>>>     val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
>>>>>     val sc = new SparkContext(sparkConf)
>>>>>     import sqlContext.createSchemaRDD
>>>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>>>     val bedFile = sc.textFile("s3n://a/b/c", 40)
>>>>>     val hgfasta = sc.textFile("hdfs://a/b/c", 40)
>>>>>     val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>     val filtered = hgPair.filter(kv => kv._2 == 1)
>>>>>     val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
>>>>>     val joinRDD = bedPair.join(filtered)
>>>>>     val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz") }
>>>>>     ty.registerTempTable("KmerIntesect")
>>>>>     ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
>>>>>   }
>>>>> }
>>>>>
>>>>>
>>>>
>>>
>>
>
