Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
I have recently encountered a similar problem with Guava version collision
with Hadoop.

Wouldn't it be more correct to upgrade Hadoop to use the latest Guava? Does
anyone know why they are staying on version 11?

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi Sean,

 I removed the hadoop dependencies from the app and ran it on the cluster.
 It gives a java.io.EOFException

 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as
 bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in
 memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile
 at AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at
 map at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as
 bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in
 memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at
 DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage
 0 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
 10.100.5.109): java.io.EOFException
 at
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 at
 

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
Actually, there is already someone on Hadoop-Common-Dev taking care of
removing the old Guava dependency:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com

On Mon, Jan 19, 2015 at 4:03 PM, Romi Kuntsman r...@totango.com wrote:

 I have recently encountered a similar problem with Guava version collision
 with Hadoop.

 Wouldn't it be more correct to upgrade Hadoop to use the latest Guava? Does
 anyone know why they are staying on version 11?

 *Romi Kuntsman*, *Big Data Engineer*
  http://www.totango.com

 On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi Sean,

 I removed the hadoop dependencies from the app and ran it on the cluster.
 It gives a java.io.EOFException

 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as
 bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in
 memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile
 at AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at
 map at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as
 bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in
 memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast
 at DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from
 Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0
 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0
 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
 10.100.5.109): java.io.EOFException
 at
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at
 

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Ted Yu
Please see this thread:

http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgradesubj=Re+Time+to+address+the+Guava+version+problem


 On Jan 19, 2015, at 6:03 AM, Romi Kuntsman r...@totango.com wrote:
 
 I have recently encountered a similar problem with Guava version collision 
 with Hadoop.
 
 Wouldn't it be more correct to upgrade Hadoop to use the latest Guava? Does
 anyone know why they are staying on version 11?
 
 Romi Kuntsman, Big Data Engineer
 http://www.totango.com
 
 On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera niranda.per...@gmail.com 
 wrote:
 Hi Sean, 
 
 I removed the hadoop dependencies from the app and ran it on the cluster. It 
 gives a java.io.EOFException 
 
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with 
 curMem=0, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 173.0 KB, free 1911.2 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with 
 curMem=177166, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 24.9 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block 
 broadcast_0_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at 
 AvroRelation.scala:45
 15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
 15/01/07 11:19:29 INFO SparkContext: Starting job: collect at 
 SparkPlan.scala:84
 15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at 
 SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
 15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at 
 SparkPlan.scala:84)
 15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
 15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
 15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map 
 at SparkPlan.scala:84), which has no missing parents
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with 
 curMem=202668, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in 
 memory (estimated size 4.8 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with 
 curMem=207532, maxMem=2004174766
 15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
 in memory (estimated size 3.4 KB, free 1911.1 MB)
 15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
 on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
 15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block 
 broadcast_1_piece0
 15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at 
 DAGScheduler.scala:838
 15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
 (MappedRDD[6] at map at SparkPlan.scala:84)
 15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 
 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 
 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
 15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 
 10.100.5.109): java.io.EOFException
 at 
 java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
 at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
 at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
 at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
 at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
 at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
 at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
 at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
 at 
 org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
 at 
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
 at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
 at 

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a
discussion about, indeed, what to do about potential Guava version
conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

Spark uses 14.0.1 in fact:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well.

Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
lot of these problems are solved. As we've seen though, this one is tricky.
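
For illustration only -- this thread never shows the actual build files -- the same
relocation idea can be sketched on the application side with the maven-shade-plugin,
so the app's own Guava cannot clash with whatever Guava Spark or Hadoop put on the
classpath. The package prefix myapp.shaded below is just a placeholder:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite the app's own Guava packages into a private namespace so
               they cannot collide with the Guava provided by Spark or Hadoop. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>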

What's your Spark version? And what are you executing? What mode --
standalone or YARN? What Hadoop version?


On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi,

 I have been running a simple Spark app on a local Spark cluster and I came
 across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
 at
 com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
 at
 org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
 at
 org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
 at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
 at
 org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)


 While looking into this, I found out that Guava was downgraded to version
 11 in this PR:
 https://github.com/apache/spark/pull/1610

 In this PR, the hashInt call at OpenHashSet.scala:261 has been changed to
 hashLong.
 But when I actually run my app, a java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt error occurs,
 which is understandable because hashInt is not available before Guava 12.

 So I'm wondering why this occurs.

 Cheers
 --
 Niranda Perera




Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi Sean,

My mistake, the Guava 11 dependency indeed came from hadoop-common.

I'm running the following simple app on a Spark 1.2.0 standalone local
cluster (2 workers) with Hadoop 1.2.1:

public class AvroSparkTest {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077")
                // ("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");
        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the Hadoop dependency is added.
It runs without a problem when the Hadoop dependency is removed and the
master is set to local[].
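
A less drastic alternative -- sketched here only as an assumption about a Maven
build, which this thread never shows -- is to keep the Hadoop dependency but
exclude its Guava, so a single Guava version is resolved for the application
(mvn dependency:tree -Dincludes=com.google.guava shows which version actually
wins). Whether this is enough also depends on what the cluster's own classpath
provides:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <!-- Drop whatever Guava this artifact pulls in transitively.
         hadoop-client is an assumption; adjust to the artifact actually used. -->
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>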

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

 -dev

 Guava was not downgraded to 11. That PR was not merged. It was part of a
 discussion about, indeed, what to do about potential Guava version
 conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

 Spark uses 14.0.1 in fact:
 https://github.com/apache/spark/blob/master/pom.xml#L330

 This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
 well.

 Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
 lot of these problems are solved. As we've seen though, this one is tricky.

 What's your Spark version? And what are you executing? What mode --
 standalone or YARN? What Hadoop version?


 On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I have been running a simple Spark app on a local Spark cluster and I
 came across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
 at
 com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
 at
 org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at
 

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
Oh, are you actually bundling Hadoop in your app? That may be the problem.
If you're using stand-alone mode, why include Hadoop? In any event, Spark
and Hadoop are intended to be 'provided' dependencies in the app you send
to spark-submit.
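
To make that concrete, a minimal pom.xml sketch (the real build file is not shown
in this thread, and the artifact IDs and versions below are assumptions) marks both
as provided so they are not bundled into the jar passed to spark-submit:

<!-- Spark and Hadoop are supplied by the cluster at runtime,
     so they are not packaged into the application jar. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.2.1</version>
  <scope>provided</scope>
</dependency>

With both in provided scope, the application compiles against these APIs but picks
up whatever versions the cluster actually runs.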

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi Sean,

 My mistake, the Guava 11 dependency indeed came from hadoop-common.

 I'm running the following simple app on a Spark 1.2.0 standalone local
 cluster (2 workers) with Hadoop 1.2.1:

 public class AvroSparkTest {
     public static void main(String[] args) throws Exception {
         SparkConf sparkConf = new SparkConf()
                 .setMaster("spark://niranda-ThinkPad-T540p:7077")
                 // ("local[2]")
                 .setAppName("avro-spark-test");

         JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
         JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
         JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                 "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
         episodes.printSchema();
         episodes.registerTempTable("avroTable");
         List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

         for (Row row : result) {
             System.out.println(row.toString());
         }
     }
 }

 As you pointed out, this error occurs when the Hadoop dependency is added.
 It runs without a problem when the Hadoop dependency is removed and the
 master is set to local[].

 Cheers

 On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

 -dev

 Guava was not downgraded to 11. That PR was not merged. It was part of a
 discussion about, indeed, what to do about potential Guava version
 conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

 Spark uses 14.0.1 in fact:
 https://github.com/apache/spark/blob/master/pom.xml#L330

 This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
 well.

 Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
 lot of these problems are solved. As we've seen though, this one is tricky.

 What's your Spark version? And what are you executing? What mode --
 standalone or YARN? What Hadoop version?


 On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I have been running a simple Spark app on a local Spark cluster and I
 came across this error.

 Exception in thread main java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at
 org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at