Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-26 Thread Steve Loughran

On 25 Mar 2015, at 21:54, roni roni.epi...@gmail.com wrote:

Is there any way that I can install the new one and remove the previous version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new one.
But when I start the spark-shell I get -
 java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native 
Method)

Is there no way to upgrade without creating a new cluster?
Thanks
Roni

This isn't a Spark version problem itself, more one of Hadoop versions.

If you see this, it means that the Hadoop JARs shipping with Spark 1.3 are 
trying to bind to a native method implemented in libhadoop.so, but that method 
isn't there.

Possible fixes

1. Simplest: find out which version of Hadoop is running in the EC2 cluster, 
and get a version of Spark 1.3 built against that version. If you can't find 
one, it's easy enough to check out the 1.3.0 release from the ASF Git repo (or 
the GitHub mirror) and build it yourself.

2. Upgrade the underlying Hadoop cluster.

3. Find the location of libhadoop.so in your VMs, overwrite it with the 
version of libhadoop.so from the version of Hadoop used in the build of Spark 
1.3, and rely on the Hadoop team's intent to keep updated native binaries 
backwards compatible across branch-2 releases (i.e. they only add functions, 
not remove or rename them).

#3 is an ugly hack which may work immediately, but once you get into the game 
of mixing artifacts from different Hadoop releases, you are on a slippery 
slope towards an unmaintainable Hadoop cluster.

I'd go for tactic #1 first.
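A sketch of what tactic #1 looks like in practice. The Hadoop version, Maven profile, and paths below are illustrative placeholders, not values taken from this thread; check the Spark 1.3 "Building Spark" docs for the profile matching your Hadoop release.

```shell
# On the EC2 master: find out which Hadoop version the cluster runs
hadoop version

# Check out the Spark 1.3.0 release and build it against that Hadoop version.
# -Phadoop-2.4 and 2.5.0-cdh5.2.0 are example values only.
git clone https://github.com/apache/spark.git
cd spark
git checkout v1.3.0
mvn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.0 -DskipTests clean package
```

These are long-running build and download commands, so run them on a machine with network access and a JDK installed.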


upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
I have an EC2 cluster created using Spark 1.2.1,
and an SBT project.
Now I want to upgrade to Spark 1.3 and use the new features.
Below are the issues.
Sorry for the long post.
Appreciate your help.
Thanks
-Roni

Question - Do I have to create a new cluster using spark 1.3?

Here is what I did -

In my SBT file I changed to -
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

But then I started getting compilation errors, along with:
Here are some of the libraries that were evicted:
[warn]  * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
[warn]  * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0
[warn] Run 'evicted' to see detailed eviction warnings

 constructor cannot be instantiated to expected type;
[error]  found   : (T1, T2)
[error]  required: org.apache.spark.sql.catalyst.expressions.Row
[error] val ty = joinRDD.map{case(word,
(file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz")}
[error]  ^

Here is my SBT and code --
SBT -

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
resolvers += "Maven Repo" at "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"

/* Dependencies - %% appends Scala version to artifactId */
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"


CODE --
import org.apache.spark.{SparkConf, SparkContext}
case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

object preDefKmerIntersection {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD   // must come after sqlContext is defined
    val bedFile = sc.textFile("s3n://a/b/c", 40)
    val hgfasta = sc.textFile("hdfs://a/b/c", 40)
    val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val filtered = hgPair.filter(kv => kv._2 == 1)
    val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val joinRDD = bedPair.join(filtered)
    val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) =>
      KmerIntesect(word, file1Counts, "xyz") }
    ty.registerTempTable("KmerIntesect")
    ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
  }
}


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
For the Spark SQL parts, 1.3 breaks backwards compatibility, because before
1.3, Spark SQL was considered experimental, where API changes were allowed.

So, H2O and ADAM code compatible with 1.2.x might not work with 1.3.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
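To make the break concrete: in 1.3 the 1.2-era implicit import sqlContext.createSchemaRDD is gone, SchemaRDD is replaced by DataFrame, and a case-class RDD is converted with toDF(). A hedged sketch of what the tail of the posted program would look like against 1.3 (assuming spark-sql 1.3.0 is on the classpath; the names come from the posted code):

```scala
// Spark 1.3 style: DataFrame replaces SchemaRDD
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._   // replaces the 1.2-era createSchemaRDD import

val ty = joinRDD.map { case (word, (file1Counts, file2Counts)) =>
  KmerIntesect(word, file1Counts, "xyz")
}.toDF()                        // RDD[KmerIntesect] -> DataFrame

ty.registerTempTable("KmerIntesect")
ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
```

registerTempTable and saveAsParquetFile still exist on DataFrame in 1.3, so only the conversion step needs to change.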


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Nick Pentreath
What version of Spark do the other dependencies rely on (Adam and H2O?) - that 
could be it

Or try sbt clean compile

—
Sent from Mailbox


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Even if H2O and ADAM are dependent on 1.2.1, it should be backward
compatible, right?
So using 1.3 should not break them.
And the code is not using the classes from those libs.
I tried sbt clean compile - same error.
Thanks
_R


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Nick Pentreath
Ah, I see now you are trying to use a Spark 1.2 cluster - you will need to be 
running Spark 1.3 on your EC2 cluster in order to run programs built against 
Spark 1.3.

You will need to terminate and restart your cluster with Spark 1.3.

—
Sent from Mailbox
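With the spark-ec2 script shipped in the ec2/ directory of the Spark distribution, that looks roughly like the following (key pair, identity file, and cluster name are placeholders; note that destroying a cluster also loses its ephemeral HDFS data, so copy anything you need to S3 first):

```shell
# Tear down the old Spark 1.2.1 cluster
./spark-ec2 destroy my-cluster

# Launch a fresh cluster running Spark 1.3.0
./spark-ec2 -k my-keypair -i my-key.pem --spark-version=1.3.0 launch my-cluster
```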


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Thanks Dean and Nick.
So, I removed ADAM and H2O from my SBT file as I was not using them.
I got the code to compile - only for it to fail at runtime with:
SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/rdd/RDD$
at preDefKmerIntersection$.main(kmerIntersetion.scala:26)

This line is where I do a JOIN operation:
val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
val filtered = hgPair.filter(kv => kv._2 == 1)
val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
val joinRDD = bedPair.join(filtered)
Any idea what's going on?
I have data on EC2, so I am avoiding creating a new cluster - just upgrading
and changing the code to use 1.3 and Spark SQL.
Thanks
Roni




Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
Weird. Are you running using the SBT console? It should have the spark-core
jar on the classpath. Similarly, spark-shell or spark-submit should work, but
be sure you're using the same version of Spark when running as when
compiling. Also, you might need to add spark-sql to your SBT dependencies,
but that shouldn't be this issue.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
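A sketch of both suggestions. The paths, project jar name, and master URL are illustrative placeholders, not values taken from the thread:

```shell
# In build.sbt, add Spark SQL (it is a separate artifact from spark-core):
#   libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0"

# Then submit with the same Spark 1.3 distribution the code was built against:
/path/to/spark-1.3.0/bin/spark-submit \
  --class preDefKmerIntersection \
  --master spark://ec2-master-host:7077 \
  target/scala-2.10/my-project_2.10-1.0.jar
```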

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
Yes, that's the problem. The RDD class exists in both binary jar files, but
the signatures probably don't match. The bottom line, as always for tools
like this, is that you can't mix versions.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
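One common way to keep compile-time and runtime versions from drifting is to mark the Spark artifacts as "provided" in SBT, so the project compiles against your chosen version but the cluster's own jars are used at runtime. A sketch (it only helps once the cluster itself actually runs 1.3):

```scala
// build.sbt: compile against Spark 1.3, but take Spark's jars from the
// cluster at runtime instead of bundling a possibly mismatched copy
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "1.3.0" % "provided"
```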

On Wed, Mar 25, 2015 at 3:13 PM, roni roni.epi...@gmail.com wrote:

 My cluster is still on spark 1.2 and in SBT I am using 1.3.
 So probably it is compiling with 1.3 but running with 1.2 ?


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
My cluster is still on Spark 1.2, and in SBT I am using 1.3.
So probably it is compiling against 1.3 but running on 1.2?

On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler deanwamp...@gmail.com
wrote:

 Weird. Are you running using SBT console? It should have the spark-core
 jar on the classpath. Similarly, spark-shell or spark-submit should work,
 but be sure you're using the same version of Spark when running as when
 compiling. Also, you might need to add spark-sql to your SBT dependencies,
 but that shouldn't be this issue.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Wed, Mar 25, 2015 at 12:09 PM, roni roni.epi...@gmail.com wrote:

 Thanks Dean and Nick.
 So, I removed ADAM and H2O from my SBT file as I was not using them.
 I got the code to compile - only for it to fail at runtime with -
 SparkContext: Created broadcast 1 from textFile at
 kmerIntersetion.scala:21
 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/spark/rdd/RDD$
 at preDefKmerIntersection$.main(kmerIntersetion.scala:26)

 This line is where I do a JOIN operation.
 val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
 val filtered = hgPair.filter(kv => kv._2 == 1)
 val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
 val joinRDD = bedPair.join(filtered)
 Any idea what's going on?
 I have the data on EC2, so I am avoiding creating a new cluster, and am
 instead just upgrading and changing the code to use 1.3 and Spark SQL.
 Thanks
 Roni
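The NoClassDefFoundError in the message above is a version mismatch, not a logic problem; the join itself is straightforward. As a sanity check, the same parse-and-join logic can be exercised with plain Scala collections, since Spark's pair-RDD join behaves analogously. This is an illustrative sketch only: the object name and sample lines below are made up, not from the thread.

```scala
object JoinSketch {
  // Same parsing as the Spark code: split each "key,count" line on the
  // comma and key by the k-mer string.
  def parsePairs(lines: Seq[String]): Seq[(String, Int)] =
    lines.map(_.split(",")).map(a => (a(0), a(1).trim.toInt))

  // Mirror of hgPair.filter(...) followed by bedPair.join(filtered):
  // keep left-side entries with count == 1, then inner-join by key.
  def joinCounts(left: Seq[String], right: Seq[String]): Seq[(String, (Int, Int))] = {
    val filtered = parsePairs(left).toMap.filter { case (_, v) => v == 1 }
    parsePairs(right).collect {
      case (k, v) if filtered.contains(k) => (k, (v, filtered(k)))
    }
  }

  def main(args: Array[String]): Unit =
    joinCounts(Seq("AAAC,1", "AAAG,2"), Seq("AAAC,5", "CCCC,3")).foreach(println)
}
```

On RDDs the join additionally shuffles by key, but the per-record transformation is the same, so a mismatch here would indicate a logic bug rather than the classpath problem discussed in the thread.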



 On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler deanwamp...@gmail.com
 wrote:

 For the Spark SQL parts, 1.3 breaks backwards compatibility, because
 before 1.3, Spark SQL was considered experimental where API changes were
 allowed.

 So, H2O and ADAM versions compatible with 1.2.x might not work with 1.3.

 dean

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com

 On Wed, Mar 25, 2015 at 9:39 AM, roni roni.epi...@gmail.com wrote:

 Even if H2O and ADAM are dependent on 1.2.1, they should be backward
 compatible, right?
 So using 1.3 should not break them.
 And the code is not using the classes from those libs.
 I tried sbt clean compile .. same error
 Thanks
 _R

 On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath 
 nick.pentre...@gmail.com wrote:

 What version of Spark do the other dependencies rely on (Adam and
 H2O?) - that could be it

 Or try sbt clean compile

 —
 Sent from Mailbox https://www.dropbox.com/mailbox


 On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote:

 I have an EC2 cluster created using Spark version 1.2.1,
 and I have an SBT project.
 Now I want to upgrade to Spark 1.3 and use the new features.
 Below are the issues.
 Sorry for the long post.
 Appreciate your help.
 Thanks
 -Roni

 Question - Do I have to create a new cluster using Spark 1.3?

 Here is what I did -

 In my SBT file I changed to -
 libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

 But then I started getting compilation errors, along with:
 Here are some of the libraries that were evicted:
 [warn]  * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
 [warn]  * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) ->
 2.6.0
 [warn] Run 'evicted' to see detailed eviction warnings

  constructor cannot be instantiated to expected type;
 [error]  found   : (T1, T2)
 [error]  required: org.apache.spark.sql.catalyst.expressions.Row
 [error] val ty =
 joinRDD.map{case(word, (file1Counts, file2Counts)) => KmerIntesect(word,
 file1Counts, "xyz")}
 [error]  ^

 Here is my SBT and code --
 SBT -

 version := "1.0"

 scalaVersion := "2.10.4"

 resolvers += "Sonatype OSS Snapshots" at
 "https://oss.sonatype.org/content/repositories/snapshots"
 resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
 resolvers += "Maven Repo" at
 "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"

 /* Dependencies - %% appends Scala version to artifactId */
 libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
 libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
 libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
 libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"
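For reference, a build definition with the Spark modules aligned on one version might look like the sketch below. This is an illustrative assumption, not the poster's actual file: the project name, the "provided" scope, the spark-sql module, and the Hadoop version are guesses, and the hadoop-client version in particular has to match what the EC2 cluster actually runs.

```scala
// Hedged sketch of an aligned build.sbt - illustrative only.
name := "kmer-intersect"            // hypothetical project name

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Keep every Spark module on the same version to avoid evictions.
  "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.3.0" % "provided",
  // Assumption: this must match the Hadoop version on the cluster.
  "org.apache.hadoop" % "hadoop-client" % "2.6.0"
)
```

Marking Spark as "provided" keeps it out of an assembly jar, so at runtime the application picks up the Spark jars already installed on the cluster rather than shipping a possibly mismatched copy.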


 CODE --
 import org.apache.spark.{SparkConf, SparkContext}
 case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

 object preDefKmerIntersection {
   def main(args: Array[String]) {

     val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
     val sc = new SparkContext(sparkConf)
 import 

Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread roni
Is there any way that I can install the new one and remove the previous
version?
I installed Spark 1.3 on my EC2 master and set the Spark home to the new
one.
But when I start the spark-shell I get -
 java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)

Is there no way to upgrade without creating a new cluster?
Thanks
Roni



On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler deanwamp...@gmail.com wrote:

 Yes, that's the problem. The RDD class exists in both binary jar files,
 but the signatures probably don't match. The bottom line, as always for
 tools like this, is that you can't mix versions.

 Dean Wampler, Ph.D.
 Author: Programming Scala, 2nd Edition
 http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
 Typesafe http://typesafe.com
 @deanwampler http://twitter.com/deanwampler
 http://polyglotprogramming.com


Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems

2015-03-25 Thread Dean Wampler
You could stop the running processes and run the same processes using
the new version, starting with the master and then the slaves. You would
have to snoop around a bit to get the command-line arguments right, but
it's doable. Use `ps -efw` to find the command lines used. Be sure to rerun
them as the same user. Or look at what the EC2 scripts do.

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com


Re: Upgrade to Spark 1.2.1 using Guava

2015-03-02 Thread Pat Ferrel
Marcelo’s workaround works. So if you are using the itemsimilarity stuff, the 
CLI has a way to solve the class-not-found error, and I can point out how to do 
the equivalent if you are using the library API. Ping me if you care.


On Feb 28, 2015, at 2:27 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote:

Yes. I ran into this problem with the Mahout snapshot and Spark 1.2.0, not 
really trying to figure out why that was a problem, since there were already too 
many moving parts in my app. Obviously there is a classpath issue somewhere.

/Erlend

On 27 Feb 2015 22:30, Pat Ferrel p...@occamsmachete.com 
mailto:p...@occamsmachete.com wrote:
@Erlend hah, we were trying to merge your PR and ran into this—small world. You 
actually compile the JavaSerializer source in your project?

@Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, 
that didn’t seem to work?

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net 
mailto:erl...@hamnaberg.net wrote:

Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my own 
project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E


Re: Upgrade to Spark 1.2.1 using Guava

2015-02-28 Thread Pat Ferrel
Maybe, but any time the workaround is to use spark-submit --conf 
spark.executor.extraClassPath=/guava.jar blah, that means that standalone apps 
must have hard-coded paths that are honored on every worker. And as you know, a 
lib is pretty much blocked from use of this version of Spark, hence the blocker 
severity.

I could easily be wrong, but userClassPathFirst doesn’t seem to be the issue. 
There is no class conflict.

On Feb 27, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote:

This seems like a job for userClassPathFirst. Or could be. It's
definitely an issue of visibility between where the serializer is and
where the user class is.

At the top you said Pat that you didn't try this, but why not?

On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I’ll try to find a Jira for it. I hope a fix is in 1.3
 
 
 On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 Thanks! that worked.
 
 On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 I don’t use spark-submit I have a standalone app.
 
 So I guess you want me to add that key/value to the conf in my code and make 
 sure it exists on workers.
 
 
 On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:
 
 On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.
 
 Sorry, I'm still confused about what config you're changing. I'm
 suggesting using:
 
 spark-submit --conf spark.executor.extraClassPath=/guava.jar blah
 
 
 --
 Marcelo
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 
 



Re: Upgrade to Spark 1.2.1 using Guava

2015-02-28 Thread Erlend Hamnaberg
Yes. I ran into this problem with the Mahout snapshot and Spark 1.2.0, not
really trying to figure out why that was a problem, since there were
already too many moving parts in my app. Obviously there is a classpath
issue somewhere.

/Erlend

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Sean Owen
This seems like a job for userClassPathFirst. Or could be. It's
definitely an issue of visibility between where the serializer is and
where the user class is.

At the top you said Pat that you didn't try this, but why not?




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I understand that I need to supply Guava to Spark. The HashBiMap is created in 
the client and broadcast to the workers. So it is needed in both. To achieve 
this there is a deps.jar with Guava (and Scopt but that is only for the 
client). Scopt is found so I know the jar is fine for the client. 

I pass in the deps.jar to the context creation code. I’ve checked the content 
of the jar and have verified that it is used at context creation time. 

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName +
      "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
      new JavaSerializer())
  }
}
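Kryo's JavaSerializer delegates to standard java.io serialization, so the registration above only changes how the bytes are produced; the class still has to be visible to whichever classloader does the reading. The delegation amounts to a plain JDK round trip like the sketch below (plain Scala with no Spark, Kryo, or Guava dependencies; a java.util.HashMap stands in for the HashBiMap, and the object name is made up):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object JavaSerRoundTrip {
  // Serialize any Serializable value to bytes, as Kryo's JavaSerializer
  // does internally via ObjectOutputStream.
  def serialize(value: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(value)
    oos.close()
    bos.toByteArray
  }

  // Deserialization is where ClassNotFoundException surfaces if the class
  // (e.g. Guava's HashBiMap on the executors) is missing from the classpath.
  def deserialize[T](bytes: Array[Byte]): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  def main(args: Array[String]): Unit = {
    val m = new java.util.HashMap[String, Integer]()
    m.put("alpha", Integer.valueOf(1))
    println(deserialize[java.util.HashMap[String, Integer]](serialize(m)))
  }
}
```

This matches the failure mode in the thread: serialization on the driver succeeds, but the matching readObject on an executor throws ClassNotFoundException because the jar carrying the class is not on the executor classpath, which is exactly what the extraClassPath workaround addresses.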

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
which is where I get the following error. 

Have I missed a step for deserialization of broadcast values? Odd that 
serialization happened but deserialization failed. I’m running on a standalone 
localhost-only cluster.


15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 
(TID 9, 192.168.0.2): java.io.IOException: 
com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
deserialization.
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
... 19 more

==== root error ====
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
...

On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked through the public API.)

If your app needs Guava, it needs to package Guava with it (e.g. by
using maven-shade-plugin, or using --jars if only executors use
Guava).

On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote:
 The root Spark pom has guava set at a certain version number. It’s very hard
 to read the shading xml. Someone suggested that I try using
 userClassPathFirst but that sounds too heavy handed since I don’t really
 care which version of guava I get, not picky.
 
 When I set my project to use the same version as Spark I get a missing
 classdef, which usually means a version conflict.
 
 At this point I am quite confused about what is 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
Ah, I see. That makes a lot of sense now.

You might be running into some weird class loader visibility issue.
I've seen some bugs in jira about this in the past, maybe you're
hitting one of them.

Until I have some time to investigate (or if you're curious, feel free
to scavenge jira), a workaround could be to manually copy the guava
jar to your executor nodes, and add them to the executor's class path
manually (spark.executor.extraClassPath). That will place your guava
in the Spark classloader (vs. your app's class loader when using
--jars), and things should work.


On Fri, Feb 27, 2015 at 11:52 AM, Pat Ferrel p...@occamsmachete.com wrote:
 I understand that I need to supply Guava to Spark. The HashBiMap is created 
 in the client and broadcast to the workers. So it is needed in both. To 
 achieve this there is a deps.jar with Guava (and Scopt but that is only for 
 the client). Scopt is found so I know the jar is fine for the client.

 I pass in the deps.jar to the context creation code. I’ve checked the content 
 of the jar and have verified that it is used at context creation time.

 I register the serializer as follows:

 class MyKryoRegistrator extends KryoRegistrator {

   override def registerClasses(kryo: Kryo) = {

 val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
 //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
 log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName +
 "\n\n\n") // just to be sure this does indeed get logged
 kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], 
 new JavaSerializer())
   }
 }

 The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
 which is where I get the following error.

 Have I missed a step for deserialization of broadcast values? Odd that 
 serialization happened but deserialization failed. I’m running on a 
 standalone localhost-only cluster.


 On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Guava is not in Spark. (Well, long version: it's in Spark but it's
 relocated to a 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Erlend Hamnaberg
Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my
own project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I understand that I need to supply Guava to Spark. The HashBiMap is
 created in the client and broadcast to the workers. So it is needed in
 both. To achieve this there is a deps.jar with Guava (and Scopt but that is
 only for the client). Scopt is found so I know the jar is fine for the
 client.

 I pass in the deps.jar to the context creation code. I’ve checked the
 content of the jar and have verified that it is used at context creation
 time.

 I register the serializer as follows:

 class MyKryoRegistrator extends KryoRegistrator {

   override def registerClasses(kryo: Kryo) = {

 val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
 //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
 log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName +
 "\n\n\n") // just to be sure this does indeed get logged
 kryo.register(classOf[com.google.common.collect.HashBiMap[String,
 Int]], new JavaSerializer())
   }
 }

 The job proceeds up until the broadcast value, a HashBiMap, is
 deserialized, which is where I get the following error.

 Have I missed a step for deserialization of broadcast values? Odd that
 serialization happened but deserialization failed. I’m running on a
 standalone localhost-only cluster.


 On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Guava is not in Spark. (Well, long version: it's in Spark but it's
 relocated to a different package except for some special classes
 leaked through the public API.)

 If your app needs Guava, it needs to package Guava with it (e.g. by
 using maven-shade-plugin, or using --jars if only executors use
 Guava).

 On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote:
  The root Spark pom has guava set at a certain version number. It’s very
 hard
  to read the 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
 @Marcelo do you mean by modifying spark.executor.extraClassPath on all
 workers, that didn’t seem to work?

That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration it will definitely not
work.

-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I don’t use spark-submit I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.
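For a standalone app that builds its own context, Marcelo's suggestion can be sketched like this. Note that the jar path below is a hypothetical example: `spark.executor.extraClassPath` is resolved on each worker's local filesystem, so the jar must already exist at that path on every node (Spark does not ship it for you):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: /opt/jars/guava.jar is an assumed path. The file must be
// present at this exact path on every worker, because extraClassPath
// entries are loaded locally by each executor rather than distributed.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.executor.extraClassPath", "/opt/jars/guava.jar")

val sc = new SparkContext(conf)
```

Placing the jar on the executor class path this way loads it in the Spark classloader rather than the app classloader, which is what works around the visibility issue described above.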


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
@Erlend hah, we were trying to merge your PR and ran into this—small world. You 
actually compile the JavaSerializer source in your project?

@Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, 
that didn’t seem to work?

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote:

Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my own 
project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com 
mailto:p...@occamsmachete.com wrote:
I understand that I need to supply Guava to Spark. The HashBiMap is created in 
the client and broadcast to the workers. So it is needed in both. To achieve 
this there is a deps.jar with Guava (and Scopt but that is only for the 
client). Scopt is found so I know the jar is fine for the client.

I pass in the deps.jar to the context creation code. I’ve checked the content 
of the jar and have verified that it is used at context creation time.

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
//kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName +
"\n\n\n") // just to be sure this does indeed get logged
kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], 
new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
which is where I get the following error.

Have I missed a step for deserialization of broadcast values? Odd that 
serialization happened but deserialization failed. I’m running on a standalone 
localhost-only cluster.


On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com 
mailto:van...@cloudera.com wrote:

Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo


Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
Thanks! that worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:

I don’t use spark-submit I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo


Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I’ll try to find a Jira for it. I hope a fix is in 1.3


On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:

Thanks! that worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:

I don’t use spark-submit I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo


Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I changed in the spark master conf, which is also the only worker. I added a 
path to the jar that has guava in it. Still can’t find the class.

Trying Erlend’s idea next.

On Feb 27, 2015, at 1:35 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
 @Marcelo do you mean by modifying spark.executor.extraClassPath on all
 workers, that didn’t seem to work?

That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration it will definitely not
work.

-- 
Marcelo


upgrade to Spark 1.2.1

2015-02-25 Thread Pat Ferrel
Getting an error that confuses me. Running a largish app on a standalone 
cluster on my laptop. The app uses a guava HashBiMap as a broadcast value. With 
Spark 1.1.0 I simply registered the class and its serializer with kryo like 
this:

   kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new 
JavaSerializer())

And all was well. I’ve also tried addSerializer instead of register. Now I get 
a class not found during deserialization. I checked the jar list used to create 
the context and found the jar that contains HashBiMap but get this error. Any 
ideas:

15/02/25 14:46:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 
(TID 8, 192.168.0.2): java.io.IOException: 
com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
at 
org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
deserialization.
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
... 19 more

==== root error ====
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap


at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:40)
... 24 more




Re: upgrade to Spark 1.2.1

2015-02-25 Thread Ted Yu
Could this be caused by Spark using the shaded Guava jar?

Cheers

On Wed, Feb 25, 2015 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote:

 Getting an error that confuses me. Running a largish app on a standalone
 cluster on my laptop. The app uses a guava HashBiMap as a broadcast value.
 With Spark 1.1.0 I simply registered the class and its serializer with kryo
 like this:

kryo.register(classOf[com.google.common.collect.HashBiMap[String,
 Int]], new JavaSerializer())

 And all was well. I’ve also tried addSerializer instead of register. Now I
 get a class not found during deserialization. I checked the jar list used
 to create the context and found the jar that contains HashBiMap but get
 this error. Any ideas:
