Re: error: bad symbolic reference. A signature in SparkContext.class refers to term io in package org.apache.hadoop which is not available

2014-07-23 Thread Sean Owen
The issue is that you don't have Hadoop classes in your compiler
classpath. In the first example, you are getting Hadoop classes from
the Spark assembly, which packages everything together.

In the second example, you are referencing Spark .jars as deployed in
a Hadoop cluster. They no longer contain a copy of Hadoop classes. So
you would also need to add the Hadoop .jars in the cluster to your
classpath.
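
For instance, on CDH 5.1 that could look something like the following (the
Hadoop jar names and parcel paths here are illustrative guesses; the exact
layout varies by install):

```shell
scalac -classpath /opt/cloudera/parcels/CDH/lib/spark/core/lib/spark-core_2.10-1.0.0-cdh5.1.0.jar:/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar JaccardScore.scala
```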

It may be much easier to manage this as a project with SBT or Maven
and let it sort out dependencies.
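
With sbt, a minimal build.sbt along these lines would pull in matching Spark
and Hadoop classes (a sketch only -- the Scala version, the Cloudera repository
URL, and the artifact version are assumptions to verify against the cluster):

```scala
// build.sbt -- minimal sketch for compiling against CDH 5.1's Spark.
name := "approxstrmatch"

version := "1.0"

scalaVersion := "2.10.4"

// Cloudera's artifact repository (URL is an assumption; confirm before use).
resolvers += "cloudera-repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

// "provided" keeps Spark and its Hadoop dependencies out of the packaged jar;
// the cluster supplies them at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-cdh5.1.0" % "provided"
```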

On Wed, Jul 23, 2014 at 6:01 PM, Sameer Tilak ssti...@live.com wrote:
 Hi everyone,
 I was using Spark 1.0 from the Apache site and was able to compile my code
 successfully using:

 scalac -classpath
 /apps/software/secondstring/secondstring/dist/lib/secondstring-20140630.jar:/apps/software/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar:/apps/software/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:spark-assembly-1.0.0-hadoop1.0.4.jar/datanucleus-core-3.2.2.jar
 ComputeScores.scala

 Last week I moved to CDH 5.1 and I am trying to compile the same code as
 follows. However, I am getting the errors below. Any help with this would
 be great!


 scalac -classpath
 /apps/software/secondstring/secondstring/dist/lib/secondstring-20140723.jar:/opt/cloudera/parcels/CDH/lib/spark/core/lib/spark-core_2.10-1.0.0-cdh5.1.0.jar:/opt/cloudera/parcels/CDH/lib/spark/lib/kryo-2.21.jar:/opt/cloudera/parcels/CDH/lib/hadoop/lib/commons-io-2.4.jar
 JaccardScore.scala


 JaccardScore.scala:37: error: bad symbolic reference. A signature in
 SparkContext.class refers to term io
 in package org.apache.hadoop which is not available.
 It may be completely missing from the current classpath, or the version on
 the classpath might be incompatible with the version used when compiling
 SparkContext.class.
   val mjc = new Jaccard() with Serializable
 ^
 JaccardScore.scala:39: error: bad symbolic reference. A signature in
 SparkContext.class refers to term io
 in package org.apache.hadoop which is not available.
 It may be completely missing from the current classpath, or the version on
 the classpath might be incompatible with the version used when compiling
 SparkContext.class.
   val conf = new SparkConf().setMaster("spark://pzxnvm2021:7077").setAppName("ApproxStrMatch")
 JaccardScore.scala:51: error: bad symbolic reference. A signature in
 SparkContext.class refers to term io
 in package org.apache.hadoop which is not available.
 It may be completely missing from the current classpath, or the version on
 the classpath might be incompatible with the version used when compiling
 SparkContext.class.
  var scorevector = destrdd.map(x => jc_.score(str1, new BasicStringWrapper(x)))


RE: error: bad symbolic reference. A signature in SparkContext.class refers to term io in package org.apache.hadoop which is not available

2014-07-23 Thread Sameer Tilak
Hi Sean, thanks for the quick reply. I moved to an sbt-based build and was
able to build the project successfully.
In my /apps/sameert/software/approxstrmatch I see the following:
jar -tf target/scala-2.10/approxstrmatch_2.10-1.0.jar
META-INF/MANIFEST.MF
approxstrmatch/
approxstrmatch/MyRegistrator.class
approxstrmatch/JaccardScore$$anonfun$calculateJaccardScore$1.class
approxstrmatch/JaccardScore$$anonfun$calculateAnotatedJaccardScore$1.class
approxstrmatch/JaccardScore$$anonfun$calculateSortedJaccardScore$1$$anonfun$4.class
approxstrmatch/JaccardScore$$anon$1.class
approxstrmatch/JaccardScore$$anonfun$calculateSortedJaccardScore$1$$anonfun$3.class
approxstrmatch/JaccardScore$$anonfun$calculateSortedJaccardScore$1.class
approxstrmatch/JaccardScore$$anonfun$calculateAnotatedJaccardScore$1$$anonfun$2.class
approxstrmatch/JaccardScore.class
approxstrmatch/JaccardScore$$anonfun$calculateSortedJaccardScore$1$$anonfun$5.class
approxstrmatch/JaccardScore$$anonfun$calculateJaccardScore$1$$anonfun$1.class

However, when I start my spark shell:
spark-shell --jars /apps/sameert/software/secondstring/secondstring/dist/lib/secondstring-20140723.jar /apps/sameert/software/approxstrmatch/target/scala-2.10/approxstrmatch_2.10-1.0.jar
and type the following interactively, I get an error; I am not sure what I am
missing now. This used to work before.

val srcFile = sc.textFile("hdfs://ipaddr:8020/user/sameert/approxstrmatch/target-sentences.csv")
val distFile = sc.textFile("hdfs://ipaddr:8020/user/sameert/approxstrmatch/sameer_sentence_filter.tsv")
val score = new approxstrmatch.JaccardScore()

error: not found: value approxstrmatch

 From: so...@cloudera.com
 Date: Wed, 23 Jul 2014 18:11:34 +0100
 Subject: Re: error: bad symbolic reference. A signature in SparkContext.class 
 refers to term io in package org.apache.hadoop which is not available
 To: user@spark.apache.org
 

RE: error: bad symbolic reference. A signature in SparkContext.class refers to term io in package org.apache.hadoop which is not available

2014-07-23 Thread Sameer Tilak
I was able to resolve this. In my spark-shell command I had forgotten to add a
comma between the two jar files.
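
Presumably the working invocation separates the jars passed to --jars with a
comma rather than whitespace, along these lines:

```shell
spark-shell --jars /apps/sameert/software/secondstring/secondstring/dist/lib/secondstring-20140723.jar,/apps/sameert/software/approxstrmatch/target/scala-2.10/approxstrmatch_2.10-1.0.jar
```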

From: ssti...@live.com
To: user@spark.apache.org
Subject: RE: error: bad symbolic reference. A signature in SparkContext.class 
refers to term io in package org.apache.hadoop which is not available
Date: Wed, 23 Jul 2014 11:29:03 -0700



