Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
On 25 Mar 2015, at 21:54, roni <roni.epi...@gmail.com> wrote:

Is there any way that I can install the new one and remove the previous version? I installed Spark 1.3 on my EC2 master and set the Spark home to the new one. But when I start the spark-shell I get:

    java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
        at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method)

Is there no way to upgrade without creating a new cluster?
Thanks
Roni

This isn't a Spark version problem itself, more one of Hadoop versions. If you see this, it means that the Hadoop JARs shipping with Spark 1.3 are trying to bind to a native method implemented in libhadoop.so, but that method isn't there.

Possible fixes:

1. Simplest: find out which version of Hadoop is running in the EC2 cluster, and get a build of Spark 1.3 compiled against that version. If you can't find one, it's easy enough to check out the 1.3.0 release from the GitHub/ASF git repository and build it yourself.

2. Upgrade the underlying Hadoop cluster.

3. Find the location of libhadoop.so in your VMs and overwrite it with the version from the Hadoop release used in the build of Spark 1.3, relying on the Hadoop team's intent to keep updated native binaries backwards compatible across branch-2 releases (i.e. they only add functions, never remove or rename them).

#3 is an ugly hack which may work immediately, but once you get into the game of mixing artifacts from different Hadoop releases, it's a slippery slope towards an unmaintainable Hadoop cluster. I'd go for tactic #1 first.
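For tactic #1, building Spark 1.3 against the cluster's Hadoop would look roughly like the sketch below. The Hadoop version shown (2.5.0-cdh5.2.0, taken from the eviction warnings elsewhere in this thread) and the exact profile are assumptions; check them against your cluster before running. The script only prints the commands so it can be reviewed first.

```shell
# Sketch: build Spark 1.3.0 against a specific Hadoop version.
# HADOOP_VERSION is an assumption -- replace with whatever your cluster actually runs.
HADOOP_VERSION="2.5.0-cdh5.2.0"
SPARK_TAG="v1.3.0"

# The commands are echoed rather than executed, so this sketch is safe to run as-is.
echo "git clone https://github.com/apache/spark.git && cd spark && git checkout ${SPARK_TAG}"
echo "./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=${HADOOP_VERSION} -DskipTests"
```

`make-distribution.sh` produces a deployable tarball; for a CDH Hadoop version you may also need the Cloudera Maven repository on the build path.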
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
For the Spark SQL parts, 1.3 breaks backwards compatibility, because before 1.3 Spark SQL was considered experimental, where API changes were allowed. So H2O and ADAM builds compatible with 1.2.X might not work with 1.3.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly) http://shop.oreilly.com/product/0636920033073.do
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Mar 25, 2015 at 9:39 AM, roni <roni.epi...@gmail.com> wrote:

Even if H2O and ADAM are dependent on 1.2.1, it should be backward compatible, right? So using 1.3 should not break them. And the code is not using the classes from those libs. I tried sbt clean compile, same error.
Thanks
_R

On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

What version of Spark do the other dependencies rely on (Adam and H2O?) - that could be it. Or try sbt clean compile.

— Sent from Mailbox https://www.dropbox.com/mailbox

On Wed, Mar 25, 2015 at 5:58 PM, roni <roni.epi...@gmail.com> wrote:

I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. Below are the issues. Sorry for the long post, and I appreciate your help.
Thanks
-Roni

Question - Do I have to create a new cluster using Spark 1.3?

Here is what I did - in my SBT file I changed to:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

But then I started getting compilation errors, along with "Here are some of the libraries that were evicted":

    [warn] * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0
    [warn] * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0
    [warn] Run 'evicted' to see detailed eviction warnings

    constructor cannot be instantiated to expected type;
    [error] found   : (T1, T2)
    [error] required: org.apache.spark.sql.catalyst.expressions.Row
    [error] val ty = joinRDD.map{ case (word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz") }
    [error]                       ^

Here is my SBT and code --

SBT -

    version := "1.0"
    scalaVersion := "2.10.4"
    resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
    resolvers += "Maven Repo1" at "https://repo1.maven.org/maven2"
    resolvers += "Maven Repo" at "https://s3.amazonaws.com/h2o-release/h2o-dev/master/1056/maven/repo/"
    /* Dependencies - %% appends Scala version to artifactId */
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
    libraryDependencies += "org.bdgenomics.adam" % "adam-core" % "0.16.0"
    libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "0.2.10"

CODE --

    import org.apache.spark.{SparkConf, SparkContext}

    case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

    object preDefKmerIntersection {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("preDefKmer-intersect")
        val sc = new SparkContext(sparkConf)
        import sqlContext.createSchemaRDD
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        val bedFile = sc.textFile("s3n://a/b/c", 40)
        val hgfasta = sc.textFile("hdfs://a/b/c", 40)
        val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
        val filtered = hgPair.filter(kv => kv._2 == 1)
        val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
        val joinRDD = bedPair.join(filtered)
        val ty = joinRDD.map{ case (word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz") }
        ty.registerTempTable("KmerIntesect")
        ty.saveAsParquetFile("hdfs://x/y/z/kmerIntersect.parquet")
      }
    }
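The join in the quoted code pairs each k-mer with its counts from the two input files. As a minimal local sketch of that logic, here is the same shape with plain Scala collections instead of RDDs, so it runs without a cluster. The input lines and the `"xyz"` file name are made up for illustration, as in the original:

```scala
// Local simulation of the RDD join in the quoted code, using plain collections.
case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

object JoinSketch {
  // Parse a "kmer,count" line into a (kmer, count) pair -- same shape as hgPair/bedPair.
  def parseLine(line: String): (String, Int) = {
    val a = line.split(",")
    (a(0), a(1).trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    val hgfasta = Seq("AAC,1", "GGT,2")  // stands in for the HDFS file
    val bedFile = Seq("AAC,5", "TTT,7")  // stands in for the S3 file

    val filtered = hgfasta.map(parseLine).filter(_._2 == 1).toMap
    val bedPair  = bedFile.map(parseLine)

    // Inner join: keep only k-mers present in both inputs, as RDD.join does.
    val joined = bedPair.collect {
      case (word, count) if filtered.contains(word) => (word, (count, filtered(word)))
    }

    val ty = joined.map { case (word, (file1Counts, _)) => KmerIntesect(word, file1Counts, "xyz") }
    println(ty)  // List(KmerIntesect(AAC,5,xyz))
  }
}
```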
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
What version of Spark do the other dependencies rely on (Adam and H2O?) - that could be it. Or try sbt clean compile.

— Sent from Mailbox

On Wed, Mar 25, 2015 at 5:58 PM, roni <roni.epi...@gmail.com> wrote:

I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. [...]
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Even if H2O and ADAM are dependent on 1.2.1, it should be backward compatible, right? So using 1.3 should not break them. And the code is not using the classes from those libs. I tried sbt clean compile, same error.
Thanks
_R

On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

What version of Spark do the other dependencies rely on (Adam and H2O?) - that could be it. Or try sbt clean compile.

— Sent from Mailbox https://www.dropbox.com/mailbox

On Wed, Mar 25, 2015 at 5:58 PM, roni <roni.epi...@gmail.com> wrote:

I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. [...]
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Ah, I see now you are trying to use a Spark 1.2 cluster - you will need to be running Spark 1.3 on your EC2 cluster in order to run programs built against Spark 1.3. You will need to terminate and restart your cluster with Spark 1.3.

— Sent from Mailbox

On Wed, Mar 25, 2015 at 6:39 PM, roni <roni.epi...@gmail.com> wrote:

Even if H2O and ADAM are dependent on 1.2.1, it should be backward compatible, right? So using 1.3 should not break them. And the code is not using the classes from those libs. I tried sbt clean compile, same error. [...]
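Terminating and relaunching the cluster with the spark-ec2 script would look roughly like the following. The key pair, identity file, and cluster name are placeholders, and the commands are only printed here so nothing is destroyed by accident. Note that destroying the cluster loses anything on its ephemeral HDFS, so copy important data to S3 first.

```shell
# Sketch: relaunch an EC2 cluster on Spark 1.3.0 with the spark-ec2 script.
# KEY_PAIR, IDENTITY_FILE, and CLUSTER are placeholders -- substitute your own.
KEY_PAIR="my-keypair"
IDENTITY_FILE="my-key.pem"
CLUSTER="my-spark-cluster"

# Echoed rather than executed, so this sketch is safe to run as-is.
echo "./spark-ec2 -k ${KEY_PAIR} -i ${IDENTITY_FILE} destroy ${CLUSTER}"
echo "./spark-ec2 -k ${KEY_PAIR} -i ${IDENTITY_FILE} --spark-version=1.3.0 launch ${CLUSTER}"
```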
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Thanks Dean and Nick.
So, I removed ADAM and H2O from my SBT as I was not using them. I got the code to compile - only for it to fail while running with:

    SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$
        at preDefKmerIntersection$.main(kmerIntersetion.scala:26)

This line is where I do a JOIN operation:

    val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val filtered = hgPair.filter(kv => kv._2 == 1)
    val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt))
    val joinRDD = bedPair.join(filtered)   // <-- fails here

Any idea what's going on? I have data on the EC2, so I am avoiding creating a new cluster, and instead just upgrading and changing the code to use 1.3 and Spark SQL.
Thanks
Roni

On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler <deanwamp...@gmail.com> wrote:

For the Spark SQL parts, 1.3 breaks backwards compatibility, because before 1.3 Spark SQL was considered experimental, where API changes were allowed. So H2O and ADAM builds compatible with 1.2.X might not work with 1.3. [...]
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Weird. Are you running using the SBT console? It should have the spark-core jar on the classpath. Similarly, spark-shell or spark-submit should work, but be sure you're using the same version of Spark when running as when compiling. Also, you might need to add spark-sql to your SBT dependencies, but that shouldn't be this issue.

Dean Wampler, Ph.D.

On Wed, Mar 25, 2015 at 12:09 PM, roni <roni.epi...@gmail.com> wrote:

Thanks Dean and Nick. So, I removed ADAM and H2O from my SBT as I was not using them. I got the code to compile - only for it to fail while running with:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$
        at preDefKmerIntersection$.main(kmerIntersetion.scala:26)

[...]
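One quick way to catch the compile-vs-run mismatch Dean describes is to compare the Spark version declared in build.sbt with what the cluster's spark-submit reports. A rough sketch, where the build.sbt contents and the grep pattern are assumptions about the project layout:

```shell
# Sketch: extract the Spark version a project compiles against from its build file,
# to compare with what `spark-submit --version` reports on the cluster.
cat > build.sbt.example <<'EOF'
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
EOF

# Pull the three-component version number off the spark-core dependency line.
SBT_SPARK_VERSION=$(grep 'spark-core' build.sbt.example | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')
echo "build.sbt compiles against Spark ${SBT_SPARK_VERSION}"
# On the cluster, compare with: $SPARK_HOME/bin/spark-submit --version
```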
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Yes, that's the problem. The RDD class exists in both binary jar files, but the signatures probably don't match. The bottom line, as always for tools like this, is that you can't mix versions.

Dean Wampler, Ph.D.

On Wed, Mar 25, 2015 at 3:13 PM, roni <roni.epi...@gmail.com> wrote:

My cluster is still on Spark 1.2, and in SBT I am using 1.3. So probably it is compiling with 1.3 but running with 1.2? [...]
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
My cluster is still on Spark 1.2, and in SBT I am using 1.3. So probably it is compiling with 1.3 but running with 1.2?

On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler <deanwamp...@gmail.com> wrote:

Weird. Are you running using the SBT console? It should have the spark-core jar on the classpath. Similarly, spark-shell or spark-submit should work, but be sure you're using the same version of Spark when running as when compiling. Also, you might need to add spark-sql to your SBT dependencies, but that shouldn't be this issue. [...]
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
Is there any way that I can install the new one and remove the previous version? I installed Spark 1.3 on my EC2 master and set the Spark home to the new one. But when I start the spark-shell I get - java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) Is there no way to upgrade without creating a new cluster? Thanks Roni On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler deanwamp...@gmail.com wrote: Yes, that's the problem. The RDD class exists in both binary jar files, but the signatures probably don't match. The bottom line, as always for tools like this, is that you can't mix versions. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 3:13 PM, roni roni.epi...@gmail.com wrote: My cluster is still on Spark 1.2, and in SBT I am using 1.3. So probably it is compiling with 1.3 but running with 1.2? On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler deanwamp...@gmail.com wrote: Weird. Are you running using SBT console? It should have the spark-core jar on the classpath. Similarly, spark-shell or spark-submit should work, but be sure you're using the same version of Spark when running as when compiling. Also, you might need to add spark-sql to your SBT dependencies, but that shouldn't be this issue. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 12:09 PM, roni roni.epi...@gmail.com wrote: Thanks Dean and Nick. So, I removed the ADAM and H2O from my SBT as I was not using them. 
I got the code to compile - only for it to fail at runtime with - SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$ at preDefKmerIntersection$.main(kmerIntersetion.scala:26) This line is where I do a JOIN operation. val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt)) val filtered = hgPair.filter(kv => kv._2 == 1) val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt)) * val joinRDD = bedPair.join(filtered) * Any idea what's going on? I have data on the EC2, so I am avoiding creating a new cluster, but just upgrading and changing the code to use 1.3 and Spark SQL. Thanks Roni On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler deanwamp...@gmail.com wrote: For the Spark SQL parts, 1.3 breaks backwards compatibility, because before 1.3, Spark SQL was considered experimental where API changes were allowed. So, H2O and ADAM compatible with 1.2.X might not work with 1.3. dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 9:39 AM, roni roni.epi...@gmail.com wrote: Even if H2O and ADAM are dependent on 1.2.1, it should be backward compatible, right? So using 1.3 should not break them. And the code is not using the classes from those libs. I tried sbt clean compile .. same error. Thanks _R On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath nick.pentre...@gmail.com wrote: What version of Spark do the other dependencies rely on (ADAM and H2O)? That could be it. Or try sbt clean compile. — Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote: I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. 
Below are the issues. Sorry for the long post. Appreciate your help. Thanks -Roni Question - Do I have to create a new cluster using Spark 1.3? Here is what I did - In my SBT file I changed to - libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" But then I started getting compilation errors. along with: Here are some of the libraries that were evicted: [warn] * org.apache.spark:spark-core_2.10:1.2.0 -> 1.3.0 [warn] * org.apache.hadoop:hadoop-client:(2.5.0-cdh5.2.0, 2.2.0) -> 2.6.0 [warn] Run 'evicted' to see detailed eviction warnings constructor cannot be instantiated to expected type; [error] found : (T1, T2) [error] required: org.apache.spark.sql.catalyst.expressions.Row [error] val ty = joinRDD.map{ case (word, (file1Counts, file2Counts)) => KmerIntesect(word, file1Counts, "xyz") } [error]
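The tuple pattern in that compiler error is valid once the `=>` arrow and string quotes are restored; the mismatch itself came from mixing Spark versions, not the pattern. A minimal sketch on a plain Scala collection (no Spark needed; `KmerIntesect` mirrors the case class from the thread, the sample data is invented):

```scala
// Same { case (word, (a, b)) => ... } shape as in the error message,
// applied to an ordinary Seq instead of an RDD.
case class KmerIntesect(kmer: String, kCount: Int, fileName: String)

val joinPairs = Seq(("ACGT", (3, 5)), ("TTAA", (1, 2)))

val ty = joinPairs.map { case (word, (file1Counts, file2Counts)) =>
  KmerIntesect(word, file1Counts, "xyz")
}
```

If this shape fails to compile against Spark, the classpath is almost certainly resolving a different Spark version than the one declared in SBT.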
Re: upgrade from spark 1.2.1 to 1.3 on EC2 cluster and problems
You could stop the running processes and restart the same processes using the new version, starting with the master and then the slaves. You would have to snoop around a bit to get the command-line arguments right, but it's doable. Use `ps -efw` to find the command lines used. Be sure to rerun them as the same user. Or look at what the EC2 scripts do. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 4:54 PM, roni roni.epi...@gmail.com wrote: Is there any way that I can install the new one and remove the previous version? I installed Spark 1.3 on my EC2 master and set the Spark home to the new one. But when I start the spark-shell I get - java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) Is there no way to upgrade without creating a new cluster? Thanks Roni On Wed, Mar 25, 2015 at 1:18 PM, Dean Wampler deanwamp...@gmail.com wrote: Yes, that's the problem. The RDD class exists in both binary jar files, but the signatures probably don't match. The bottom line, as always for tools like this, is that you can't mix versions. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 3:13 PM, roni roni.epi...@gmail.com wrote: My cluster is still on Spark 1.2, and in SBT I am using 1.3. So probably it is compiling with 1.3 but running with 1.2? On Wed, Mar 25, 2015 at 12:34 PM, Dean Wampler deanwamp...@gmail.com wrote: Weird. Are you running using SBT console? It should have the spark-core jar on the classpath. 
Similarly, spark-shell or spark-submit should work, but be sure you're using the same version of Spark when running as when compiling. Also, you might need to add spark-sql to your SBT dependencies, but that shouldn't be this issue. Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 12:09 PM, roni roni.epi...@gmail.com wrote: Thanks Dean and Nick. So, I removed the ADAM and H2O from my SBT as I was not using them. I got the code to compile - only for it to fail at runtime with - SparkContext: Created broadcast 1 from textFile at kmerIntersetion.scala:21 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$ at preDefKmerIntersection$.main(kmerIntersetion.scala:26) This line is where I do a JOIN operation. val hgPair = hgfasta.map(_.split(",")).map(a => (a(0), a(1).trim().toInt)) val filtered = hgPair.filter(kv => kv._2 == 1) val bedPair = bedFile.map(_.split(",")).map(a => (a(0), a(1).trim().toInt)) * val joinRDD = bedPair.join(filtered) * Any idea what's going on? I have data on the EC2, so I am avoiding creating a new cluster, but just upgrading and changing the code to use 1.3 and Spark SQL. Thanks Roni On Wed, Mar 25, 2015 at 9:50 AM, Dean Wampler deanwamp...@gmail.com wrote: For the Spark SQL parts, 1.3 breaks backwards compatibility, because before 1.3, Spark SQL was considered experimental where API changes were allowed. So, H2O and ADAM compatible with 1.2.X might not work with 1.3. dean Dean Wampler, Ph.D. 
Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Wed, Mar 25, 2015 at 9:39 AM, roni roni.epi...@gmail.com wrote: Even if H2O and ADAM are dependent on 1.2.1, it should be backward compatible, right? So using 1.3 should not break them. And the code is not using the classes from those libs. I tried sbt clean compile .. same error. Thanks _R On Wed, Mar 25, 2015 at 9:26 AM, Nick Pentreath nick.pentre...@gmail.com wrote: What version of Spark do the other dependencies rely on (ADAM and H2O)? That could be it. Or try sbt clean compile. — Sent from Mailbox https://www.dropbox.com/mailbox On Wed, Mar 25, 2015 at 5:58 PM, roni roni.epi...@gmail.com wrote: I have an EC2 cluster created using Spark version 1.2.1, and I have an SBT project. Now I want to upgrade to Spark 1.3 and use the new features. Below are the issues. Sorry for the long post. Appreciate your help. Thanks -Roni Question - Do I have to create a new cluster using Spark 1.3? Here is what I did - In my SBT file I changed to - libraryDependencies += "org.apache.spark" %%
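The JOIN that triggers the NoClassDefFoundError above is ordinary pair-RDD logic; its shape can be sketched with plain Scala collections, which runs fine without any Spark on the classpath (names mirror the thread's snippet, the sample data and keys are invented):

```scala
// Plain-Scala sketch of the pair join from the failing snippet.
// hgPair/bedPair mirror the thread's names; data is made up.
val hgPair = Seq("chr1,1", "chr2,2").map(_.split(",")).map(a => (a(0), a(1).trim.toInt))
val filtered = hgPair.filter(kv => kv._2 == 1)
val bedPair = Seq("chr1,5", "chr3,7").map(_.split(",")).map(a => (a(0), a(1).trim.toInt))

// Equivalent of bedPair.join(filtered): keep only keys present on both sides.
val filteredMap = filtered.toMap
val joinRDDLike = for ((k, v) <- bedPair; w <- filteredMap.get(k)) yield (k, (v, w))
```

Since this logic is sound on its own, the runtime failure points at the environment: the job compiled against spark-core 1.3.0 but executed on a 1.2 cluster, exactly the version mix Dean warns about.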
Re: Upgrade to Spark 1.2.1 using Guava
Marcelo’s work-around works. So if you are using the itemsimilarity stuff, the CLI has a way to solve the class-not-found, and I can point out how to do the equivalent if you are using the library API. Ping me if you care. On Feb 28, 2015, at 2:27 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote: Yes. I ran into this problem with a Mahout snapshot and Spark 1.2.0. I didn't really try to figure out why that was a problem, since there were already too many moving parts in my app. Obviously there is a classpath issue somewhere. /Erlend On 27 Feb 2015 22:30, Pat Ferrel p...@occamsmachete.com wrote: @Erlend hah, we were trying to merge your PR and ran into this - small world. You actually compile the JavaSerializer source in your project? @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers? That didn’t seem to work. On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote: Hi. I have had a similar issue. I had to pull the JavaSerializer source into my own project, just so I got the classloading of this class under control. This must be a class loader issue with Spark. -E On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote: I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass in the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. 
I register the serializer as follows: class MyKryoRegistrator extends KryoRegistrator { override def registerClasses(kryo: Kryo) = { val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]() //kryo.addDefaultSerializer(h.getClass, new JavaSerializer()) log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer()) } } The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster. 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216) at
Re: Upgrade to Spark 1.2.1 using Guava
Maybe, but any time the work-around is to use spark-submit --conf spark.executor.extraClassPath=/guava.jar blah, that means that standalone apps must have hard-coded paths that are honored on every worker. And as you know, a lib is pretty much blocked from use of this version of Spark - hence the blocker severity. I could easily be wrong, but userClassPathFirst doesn’t seem to be the issue. There is no class conflict. On Feb 27, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote: This seems like a job for userClassPathFirst. Or could be. It's definitely an issue of visibility between where the serializer is and where the user class is. At the top you said, Pat, that you didn't try this, but why not? On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote: I’ll try to find a Jira for it. I hope a fix is in 1.3. On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote: Thanks! That worked. On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote: I don’t use spark-submit; I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers. On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. 
I'm suggesting using: spark-submit --conf spark.executor.extraClassPath=/guava.jar blah -- Marcelo - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
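For a standalone app that never goes through spark-submit, the same setting can live in the cluster's configuration instead of on a command line. A minimal sketch, assuming the default conf directory layout; the jar path here mirrors the thread's placeholder and is not a real location:

```
# conf/spark-defaults.conf on each worker node (path is illustrative)
spark.executor.extraClassPath  /guava.jar
```

This puts the jar in the Spark classloader on every executor, which is the same effect Marcelo's --conf flag achieves per submission.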
Re: Upgrade to Spark 1.2.1 using Guava
Yes. I ran into this problem with a Mahout snapshot and Spark 1.2.0. I didn't really try to figure out why that was a problem, since there were already too many moving parts in my app. Obviously there is a classpath issue somewhere. /Erlend On 27 Feb 2015 22:30, Pat Ferrel p...@occamsmachete.com wrote: @Erlend hah, we were trying to merge your PR and ran into this - small world. You actually compile the JavaSerializer source in your project? @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers? That didn’t seem to work. On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote: Hi. I have had a similar issue. I had to pull the JavaSerializer source into my own project, just so I got the classloading of this class under control. This must be a class loader issue with Spark. -E On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote: I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass in the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. I register the serializer as follows: class MyKryoRegistrator extends KryoRegistrator { override def registerClasses(kryo: Kryo) = { val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]() //kryo.addDefaultSerializer(h.getClass, new JavaSerializer()) log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer()) } } The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. 
Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster. 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. 
at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) ... 19 more root error == Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
Re: Upgrade to Spark 1.2.1 using Guava
This seems like a job for userClassPathFirst. Or could be. It's definitely an issue of visibility between where the serializer is and where the user class is. At the top you said, Pat, that you didn't try this, but why not? On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote: I’ll try to find a Jira for it. I hope a fix is in 1.3. On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote: Thanks! That worked. On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote: I don’t use spark-submit; I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers. On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using: spark-submit --conf spark.executor.extraClassPath=/guava.jar blah -- Marcelo
Re: Upgrade to Spark 1.2.1 using Guava
I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass in the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. I register the serializer as follows: class MyKryoRegistrator extends KryoRegistrator { override def registerClasses(kryo: Kryo) = { val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]() //kryo.addDefaultSerializer(h.getClass, new JavaSerializer()) log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer()) } } The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster. 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. 
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. 
at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) ... 19 more root error == Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ... On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote: Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a different package except for some special classes leaked through the public API.) If your app needs Guava, it needs to package Guava with it (e.g. by using maven-shade-plugin, or using --jars if only executors use Guava). On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote: The root Spark pom has Guava set at a certain version number. It’s very hard to read the shading XML. Someone suggested that I try using userClassPathFirst, but that sounds too heavy-handed since I don’t really care which version of Guava I get, not picky. When I set my project to use the same version as Spark I get a missing classdef, which usually means a version conflict. At this point I am quite confused about what is
Re: Upgrade to Spark 1.2.1 using Guava
Ah, I see. That makes a lot of sense now. You might be running into some weird class loader visibility issue. I've seen some bugs in jira about this in the past; maybe you're hitting one of them. Until I have some time to investigate (or if you're curious, feel free to scavenge jira), a workaround could be to manually copy the guava jar to your executor nodes, and add it to the executor's class path manually (spark.executor.extraClassPath). That will place your guava in the Spark classloader (vs. your app's class loader when using --jars), and things should work. On Fri, Feb 27, 2015 at 11:52 AM, Pat Ferrel p...@occamsmachete.com wrote: I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass in the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. I register the serializer as follows: class MyKryoRegistrator extends KryoRegistrator { override def registerClasses(kryo: Kryo) = { val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]() //kryo.addDefaultSerializer(h.getClass, new JavaSerializer()) log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer()) } } The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster. 
15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95) at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization. 
at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144) at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) ... 19 more root error == Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ... On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote: Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a
Re: Upgrade to Spark 1.2.1 using Guava
Hi. I have had a similar issue. I had to pull the JavaSerializer source into my own project, just so I got the classloading of this class under control. This must be a class loader issue with Spark. -E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote: I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) = {
    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster.

15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
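For reference, a registrator like the one above only takes effect if Spark is told about it through configuration; Spark instantiates it by class name on the driver and on every executor, which is why the class (and Guava itself) must be on the executor classpath too. A minimal sketch of the wiring, assuming the registrator lives in the default package (the app name is a placeholder, the config keys are standard Spark settings):

```scala
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.JavaSerializer
import com.google.common.collect.HashBiMap
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator

// Same registration as in the message above: fall back to Java
// serialization for HashBiMap, which Kryo cannot handle by default.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[HashBiMap[String, Int]], new JavaSerializer())
  }
}

// The registrator is activated via configuration; Spark constructs it
// reflectively on each executor, so it must be serializer-agnostic and
// reachable on every node's classpath.
val conf = new SparkConf()
  .setAppName("hashbimap-broadcast") // placeholder app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyKryoRegistrator") // use the fully qualified name in a real app
val sc = new SparkContext(conf)
```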
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
    at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
    at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
    ... 19 more
== root error
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    ...

On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote: Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a different package except for some special classes leaked through the public API.) If your app needs Guava, it needs to package Guava with it (e.g. by using maven-shade-plugin, or using --jars if only executors use Guava).

On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote: The root Spark pom has guava set at a certain version number. It’s very hard to read the
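Pat already does something close to the --jars option Marcelo mentions ("I pass in the deps.jar to the context creation code"). The programmatic counterpart of --jars is SparkConf.setJars, a real Spark API; the path and app name below are placeholders, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// setJars ships the listed jars to every executor at context creation,
// the programmatic equivalent of `spark-submit --jars`.
val conf = new SparkConf()
  .setAppName("guava-deps-example")  // placeholder name
  .setJars(Seq("/path/to/deps.jar")) // placeholder path; jar would bundle Guava
val sc = new SparkContext(conf)
```

Note that jars shipped this way are loaded by a child classloader on the executor rather than the JVM's launch classpath, which may explain why, later in this thread, spark.executor.extraClassPath (prepended to the executor's own classpath) behaved differently.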
Re: Upgrade to Spark 1.2.1 using Guava
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote: @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, that didn’t seem to work? That's an app configuration, not a worker configuration, so if you're trying to set it on the worker configuration it will definitely not work. -- Marcelo - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Upgrade to Spark 1.2.1 using Guava
I don’t use spark-submit; I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers.

On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah

-- Marcelo
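What Pat describes, setting the same key/value in code for a standalone app, would look roughly like this; the jar path and app name are placeholders, and the caveat in the comment is the "make sure it exists on workers" part:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent of `spark-submit --conf spark.executor.extraClassPath=...`
// for a standalone driver program: set the key on the SparkConf directly.
val conf = new SparkConf()
  .setAppName("standalone-app") // placeholder name
  // extraClassPath does not ship the file; it only prepends this path to
  // the executor classpath, so the jar must already exist at this path on
  // every worker node.
  .set("spark.executor.extraClassPath", "/path/to/guava.jar")
val sc = new SparkContext(conf)
```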
Re: Upgrade to Spark 1.2.1 using Guava
@Erlend hah, we were trying to merge your PR and ran into this—small world. You actually compile the JavaSerializer source in your project? @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, that didn’t seem to work?

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote: Hi. I have had a similar issue. I had to pull the JavaSerializer source into my own project, just so I got the classloading of this class under control. This must be a class loader issue with Spark. -E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote: I understand that I need to supply Guava to Spark. The HashBiMap is created in the client and broadcast to the workers, so it is needed in both. To achieve this there is a deps.jar with Guava (and Scopt, but that is only for the client). Scopt is found, so I know the jar is fine for the client. I pass the deps.jar to the context creation code. I’ve checked the content of the jar and have verified that it is used at context creation time. I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) = {
    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, which is where I get the following error. Have I missed a step for deserialization of broadcast values? Odd that serialization happened but deserialization failed. I’m running on a standalone localhost-only cluster.
15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
    ...
== root error
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
    ...

On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote: Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a different package except for some special classes leaked
Re: Upgrade to Spark 1.2.1 using Guava
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah

-- Marcelo
Re: Upgrade to Spark 1.2.1 using Guava
Thanks! That worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote: I don’t use spark-submit; I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers.

On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah

-- Marcelo
Re: Upgrade to Spark 1.2.1 using Guava
I’ll try to find a Jira for it. I hope a fix is in 1.3.

On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote: Thanks! That worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote: I don’t use spark-submit; I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers.

On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah

-- Marcelo
Re: Upgrade to Spark 1.2.1 using Guava
I changed it in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Trying Erlend’s idea next.

On Feb 27, 2015, at 1:35 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote: @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, that didn’t seem to work? That's an app configuration, not a worker configuration, so if you're trying to set it on the worker configuration it will definitely not work. -- Marcelo
Re: upgrade to Spark 1.2.1
Could this be caused by Spark using shaded Guava jar? Cheers

On Wed, Feb 25, 2015 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote: Getting an error that confuses me. Running a largish app on a standalone cluster on my laptop. The app uses a guava HashBiMap as a broadcast value. With Spark 1.1.0 I simply registered the class and its serializer with kryo like this:

kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())

And all was well. I’ve also tried addSerializer instead of register. Now I get a class not found during deserialization. I checked the jar list used to create the context and found the jar that contains HashBiMap but get this error. Any ideas:

15/02/25 14:46:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 8, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
    at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
    at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
    ... 19 more
== root error
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:40)
    ... 24 more