Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-26 Thread Steve Loughran

On 25 Mar 2015, at 21:54, roni roni.epi...@gmail.com wrote:

Is there any way that I can install the new one and remove the previous version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new one.
But when I start the spark-shell I get -
 java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native 
Method)

Is there no way to upgrade without creating a new cluster?
Thanks
Roni

This isn't a Spark version problem as such; it's a Hadoop version problem.

If you see this, it means that the Hadoop JARs shipping with Spark 1.3 are 
trying to bind to a native method implemented in libhadoop.so, but that method 
isn't there.

Possible fixes

1. Simplest: find out which version of Hadoop is running on the EC2 cluster, 
and get a Spark 1.3 build made against that version. If you can't find one, 
it's easy enough to check out the 1.3.0 release from github/ASF git and build 
it yourself (a build sketch follows at the end of this message).

2. Upgrade the underlying Hadoop cluster.

3. Find the location of libhadoop.so on your VMs and overwrite it with the 
libhadoop.so from the Hadoop version used in the Spark 1.3 build, relying on 
the Hadoop team's intent to keep updated native binaries backwards compatible 
across branch-2 releases (i.e. they only add functions, never remove or rename 
them).

#3 is an ugly hack that may work immediately, but once you get into the game of 
mixing artifacts from different Hadoop releases, you are on a slippery slope 
towards an unmaintainable Hadoop cluster.

I'd go for tactic #1 first.
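
For tactic #1, a minimal build sketch, assuming the cluster runs Hadoop 
2.5.0-cdh5.2.0 (the version the eviction warnings later in this thread 
suggest); substitute whatever "hadoop version" reports on your cluster:

git clone https://github.com/apache/spark.git
cd spark
git checkout v1.3.0
# Build a runnable distribution against the cluster's Hadoop version;
# -Phadoop-2.4 is the closest stock build profile for Hadoop 2.4+.
./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0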


upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
I have an EC2 cluster created using Spark 1.2.1, and an SBT project.
Now I want to upgrade to Spark 1.3 and use the new features.
Below are the issues. Sorry for the long post.
Appreciate your help.
Thanks
-Roni

Question - do I have to create a new cluster using Spark 1.3?

Here is what I did -

In my SBT file I changed the dependency to:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

But then I started getting compilation errors, along with eviction warnings.
Here are some of the libraries that were evicted:
[warn]  * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
[warn]  * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0
[warn] Run 'evicted' to see detailed eviction warnings

 constructor cannot be instantiated to expected type;
[error]  found   : (T1, T2)
[error]  required: org.apache.spark.sql.catalyst.expressions.Row
[error] val ty = joinRDD.map{case(word,
(file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz")}
[error]  ^

Here is my SBT and code --
SBT -

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
resolvers += "Maven Repo" at "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"

/* Dependencies - %% appends Scala version to artifactId */
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"

CODE --
import org.apache.spark.{SparkConf, SparkContext}

case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

object preDefKmerIntersection {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD  // 1.2.x implicit SchemaRDD conversion
    val bedFile = sc.textFile("s3n://a/b/c", 40)
    val hgfasta = sc.textFile("hdfs://a/b/c", 40)
    val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val filtered = hgPair.filter(kv => kv._2 == 1)
    val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val joinRDD = bedPair.join(filtered)
    val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) =>
      KmerIntesect(word, file1Counts, "xyz") }
    ty.registerTempTable("KmerIntesect")
    ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
  }
}


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
For the Spark SQL parts, 1.3 breaks backwards compatibility, because before
1.3 Spark SQL was considered experimental, so API changes were allowed.

So, H2O and ADAM builds compatible with 1.2.X might not work with 1.3.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
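
For reference, a minimal sketch of what the Spark SQL part of the code upthread
would look like against the 1.3 API, assuming the same case class and inputs;
in 1.3 the implicit SchemaRDD conversion is gone, and an RDD of case classes
has to be converted to a DataFrame explicitly with toDF():

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

object PreDefKmerIntersection13 {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("preDefKmer-intersect"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._  // brings toDF() into scope in 1.3

    val bedPair = sc.textFile("s3n://a/b/c", 40)
      .map(_.split(",")).map(a => (a(0), a(1).trim.toInt))
    val filtered = sc.textFile("hdfs://a/b/c", 40)
      .map(_.split(",")).map(a => (a(0), a(1).trim.toInt))
      .filter(_._2 == 1)

    // RDD[KmerIntesect] -> DataFrame: explicit in 1.3, implicit in 1.2.
    val ty = bedPair.join(filtered)
      .map { case (word, (file1Counts, _)) => KmerIntesect(word, file1Counts, "xyz") }
      .toDF()
    ty.registerTempTable("KmerIntesect")
    ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
  }
}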


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Nick Pentreath
What version of Spark do the other dependencies rely on (ADAM and H2O)? That 
could be it.

Or try: sbt clean compile





Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Even if H2O and ADAM are dependent on 1.2.1, they should be backward
compatible, right?
So using 1.3 should not break them.
And the code is not using the classes from those libs.
I tried sbt clean compile .. same error
Thanks
_R


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Nick Pentreath
Ah, I see now: you are trying to use a Spark 1.2 cluster. You will need to be 
running Spark 1.3 on your EC2 cluster in order to run programs built against 
Spark 1.3.

You will need to terminate and restart your cluster with Spark 1.3 (a 
spark-ec2 sketch follows).







Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Thanks Dean and Nick.
So, I removed the ADAM and H2O dependencies from my SBT file as I was not
using them.
I got the code to compile, only for it to fail at runtime with:

SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/rdd/RDD$
at preDefKmerIntersection$.main(kmerIntersetion.scala:26)

This line is where I do a JOIN operation:

val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
val filtered = hgPair.filter(kv => kv._2 == 1)
val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
val joinRDD = bedPair.join(filtered)

Any idea what's going on?
I have data on the EC2 cluster, so I am avoiding creating a new cluster and
instead just upgrading and changing the code to use 1.3 and Spark SQL.
Thanks
Roni




Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
Weird. Are you running it from the SBT console? That should have the
spark-core jar on the classpath. Similarly, spark-shell or spark-submit should
work, but be sure you're using the same version of Spark when running as when
compiling. Also, you might need to add spark-sql to your SBT dependencies
(a sketch follows), but that shouldn't be this issue.
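
A minimal sketch of the dependency lines with spark-sql added; the "provided"
scope is my assumption - it keeps the cluster's own Spark jars authoritative
at runtime when the job is launched through spark-submit:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "1.3.0" % "provided"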



Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
Yes, that's the problem. The RDD class exists in both binary jar files, but
the signatures probably don't match. The bottom line, as always for tools
like this, is that you can't mix versions.
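
In practice, keeping the two sides aligned looks something like the sketch
below; the jar name and master URL are placeholders, and it assumes the
project was packaged against the same 1.3.0 dependency the cluster runs:

# Compile and package against the cluster's Spark version...
sbt clean package
# ...then launch with that same cluster's own spark-submit:
/root/spark-1.3.0/bin/spark-submit \
  --class preDefKmerIntersection \
  --master spark://ec2-master-host:7077 \
  target/scala-2.10/myproject_2.10-1.0.jar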



Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
My cluster is still on Spark 1.2, and in SBT I am using 1.3.
So presumably it is compiling with 1.3 but running with 1.2?


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Is there any way that I can install the new one and remove the previous
version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new one.
But when I start the spark-shell I get -

 java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)

Is there no way to upgrade without creating a new cluster?
Thanks
Roni




Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
You could stop the running processes and run the same processes using the new
version, starting with the master and then the slaves (a sketch follows). You
would have to snoop around a bit to get the command-line arguments right, but
it's doable. Use ps -efw to find the command lines used. Be sure to rerun
them as the same user. Or look at what the EC2 scripts do.
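
A rough sketch of that in-place swap on a standalone cluster; the paths are
placeholders, and it assumes the stock sbin scripts, a conf/slaves file
listing the workers, and a 1.3 build that matches the cluster's Hadoop
version (see the UnsatisfiedLinkError discussion above):

# On the master, as the user that runs Spark:
/root/spark-1.2.1/sbin/stop-all.sh    # stops the master and all workers
# Unpack the 1.3.0 build, carry the old config over, and copy the new
# build to every slave at the same path (e.g. with rsync). Then:
tar xzf spark-1.3.0-bin-custom.tgz -C /root
cp /root/spark-1.2.1/conf/* /root/spark-1.3.0/conf/
/root/spark-1.3.0/sbin/start-all.sh   # starts the master, then the workers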

