Re: Guava 11 dependency issue in Spark 1.2.0
I have recently encountered a similar problem: a Guava version collision with Hadoop. Isn't it more correct to upgrade Hadoop to use the latest Guava? Does anyone know why they are staying on version 11?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi Sean,

I removed the hadoop dependencies from the app and ran it on the cluster. It gives a java.io.EOFException:

15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with curMem=0, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.0 KB, free 1911.2 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with curMem=177166, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at AvroRelation.scala:45
15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
15/01/07 11:19:29 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:84)
15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84), which has no missing parents
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with curMem=202668, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with curMem=207532, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.100.5.109): java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
        at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
        at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
        at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
        at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
        at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
        at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
        at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        ...
Re: Guava 11 dependency issue in Spark 1.2.0
Actually, there is already someone on Hadoop-Common-Dev taking care of removing the old Guava dependency:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Mon, Jan 19, 2015 at 4:03 PM, Romi Kuntsman r...@totango.com wrote:
[snip -- quotes the previous message in this thread verbatim, including Niranda Perera's EOFException log; see above]
Re: Guava 11 dependency issue in Spark 1.2.0
Please see this thread:
http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgradesubj=Re+Time+to+address+the+Guava+version+problem

On Jan 19, 2015, at 6:03 AM, Romi Kuntsman r...@totango.com wrote:
[snip -- quotes Romi Kuntsman's Jan 19 message and Niranda Perera's EOFException log verbatim; see above]
Re: Guava 11 dependency issue in Spark 1.2.0
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a discussion about, indeed, what to do about potential Guava version conflicts. Spark uses Guava, but so does Hadoop, and so do user programs. Spark uses 14.0.1 in fact:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of a conflict between Spark's Guava 14 and Hadoop's Guava 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well.

Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a lot of these problems are solved. As we've seen though, this one is tricky.

What's your Spark version, and what are you executing? What mode -- standalone, YARN? What Hadoop version?

On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com wrote:

Hi,

I have been running a simple Spark app on a local spark cluster and I came across this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
        at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
        at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
        at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
        at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
        at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
        at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
        at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
        at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
        at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
        at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
        at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
        at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
        at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
        at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
        at com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
        at com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
        at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)

While looking into this I found out that Guava was downgraded to version 11 in this PR:
https://github.com/apache/spark/pull/1610

In this PR, the hashInt call at OpenHashSet.scala:261 was changed to hashLong. But when I actually run my app, the java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt error occurs, which is understandable because hashInt is not available before Guava 12.

So, I'm wondering why this occurs?

Cheers
--
Niranda Perera
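A quick way to see which Guava actually wins on the classpath is to ask the JVM directly. The following is a minimal sketch (the class name GuavaClasspathProbe is made up for illustration, and only standard JDK and Guava calls are used); HashFunction.hashInt(int) only exists from Guava 12 onward, so the last line reproduces the NoSuchMethodError whenever Guava 11 is the one being loaded:

import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

public class GuavaClasspathProbe {
    public static void main(String[] args) {
        // Print the jar that HashFunction was actually loaded from --
        // what is on the runtime classpath, not what the pom declares.
        System.out.println(HashFunction.class.getProtectionDomain()
                .getCodeSource().getLocation());

        // hashInt(int) was added in Guava 12; under Guava 11 this call
        // fails at runtime with java.lang.NoSuchMethodError.
        HashCode code = Hashing.murmur3_32().hashInt(42);
        System.out.println(code);
    }
}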
Re: Guava 11 dependency issue in Spark 1.2.0
Hi Sean,

My mistake, the Guava 11 dependency came from hadoop-commons indeed.

I'm running the following simple app in a spark 1.2.0 standalone local cluster (2 workers) with Hadoop 1.2.1:

public class AvroSparkTest {

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") // (local[2])
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);

        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");

        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();
        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the hadoop dependency is added; it runs without a problem when the hadoop dependency is removed and the master is set to local[].

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:
[snip -- quotes Sean Owen's reply and the original NoSuchMethodError stack trace verbatim; see above]
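For tracking down where an unwanted Guava sneaks in, the standard maven-dependency-plugin goal is usually enough. A sketch, run from the app's own module (the filter pattern targets the Guava coordinates):

    mvn dependency:tree -Dincludes=com.google.guava:guava

This prints every dependency path that pulls Guava in, so a transitive Guava 11 coming from a Hadoop artifact shows up immediately.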
Re: Guava 11 dependency issue in Spark 1.2.0
Oh, are you actually bundling Hadoop in your app? That may be the problem. If you're using stand-alone mode, why include Hadoop? In any event, Spark and Hadoop are intended to be 'provided' dependencies in the app you send to spark-submit.

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera niranda.per...@gmail.com wrote:
[snip -- quotes Niranda Perera's message with the AvroSparkTest code and the earlier replies verbatim; see above]
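In Maven terms, 'provided' scope looks like the sketch below. The coordinates and versions are illustrative assumptions (spark-core_2.10 1.2.0 and hadoop-client 1.2.1, to match the versions mentioned in this thread); the point is that neither jar gets bundled into the application's own artifact, so the cluster's Spark (with its shaded Guava) supplies them at runtime:

<!-- Illustrative pom.xml fragment: mark Spark and Hadoop as provided
     so the app does not bundle them (coordinates are assumptions). -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>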