Re: Guava 11 dependency issue in Spark 1.2.0
Please see this thread:
http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgrade&subj=Re+Time+to+address+the+Guava+version+problem

> On Jan 19, 2015, at 6:03 AM, Romi Kuntsman wrote:
>
> I have recently encountered a similar problem with Guava version collision
> with Hadoop.
>
> Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they
> staying on version 11, does anyone know?
>
> Romi Kuntsman, Big Data Engineer
> http://www.totango.com
Re: Guava 11 dependency issue in Spark 1.2.0
Actually, there is already someone on hadoop-common-dev taking care of removing the old Guava dependency:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

Romi Kuntsman, Big Data Engineer
http://www.totango.com

On Mon, Jan 19, 2015 at 4:03 PM, Romi Kuntsman wrote:

> I have recently encountered a similar problem with Guava version collision
> with Hadoop.
>
> Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are
> they staying on version 11, does anyone know?
Re: Guava 11 dependency issue in Spark 1.2.0
I have recently encountered a similar problem with Guava version collision with Hadoop.

Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they staying on version 11, does anyone know?

Romi Kuntsman, Big Data Engineer
http://www.totango.com

On Wed, Jan 7, 2015 at 7:59 AM, Niranda Perera wrote:

> Hi Sean,
>
> I removed the Hadoop dependencies from the app and ran it on the cluster.
> It gives a java.io.EOFException.
Re: Guava 11 dependency issue in Spark 1.2.0
Hi Sean,

I removed the Hadoop dependencies from the app and ran it on the cluster. It gives a java.io.EOFException:

15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with curMem=0, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 173.0 KB, free 1911.2 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with curMem=177166, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at AvroRelation.scala:45
15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
15/01/07 11:19:29 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:84)
15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84), which has no missing parents
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with curMem=202668, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with curMem=207532, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[6] at map at SparkPlan.scala:84)
15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 10.100.5.109): java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
    at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
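The EOFException above is thrown while Java deserialization reads a Hadoop FileSplit inside SerializableWritable: the bytes were written by one version of the Hadoop classes and read by another, so the reader runs past the end of the stream. A minimal, JDK-only sketch (not from the thread) that reproduces the same failure mode by truncating an object stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class EofDemo {
    // Returns true if deserializing a truncated object stream fails with
    // EOFException -- the same exception class the executor reported.
    static boolean truncatedStreamThrowsEof() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject("payload written by the 'other' version");
        }
        byte[] bytes = bos.toByteArray();
        // Drop the tail, as if writer and reader disagreed on the wire format.
        byte[] truncated = Arrays.copyOf(bytes, bytes.length - 8);
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(truncated))) {
            ois.readObject();
            return false;
        } catch (EOFException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("EOFException reproduced: " + truncatedStreamThrowsEof());
    }
}
```

The point of the sketch: mixing Hadoop versions between the driver's bundled jars and the cluster's jars produces exactly this class of error, which is why removing the bundled Hadoop dependency matters.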
Re: Guava 11 dependency issue in Spark 1.2.0
Oh, are you actually bundling Hadoop in your app? That may be the problem. If you're using standalone mode, why include Hadoop? In any event, Spark and Hadoop are intended to be 'provided' dependencies in the app you send to spark-submit.

On Tue, Jan 6, 2015 at 10:15 AM, Niranda Perera wrote:

> Hi Sean,
>
> My mistake, the Guava 11 dependency came from hadoop-common indeed.
>
> I'm running the following simple app on a Spark 1.2.0 standalone local
> cluster (2 workers) with Hadoop 1.2.1.
>
> As you pointed out, this error occurs when the Hadoop dependency is added;
> it runs without a problem when the Hadoop dependency is removed and the
> master is set to local[].
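Sean's suggestion to treat Spark and Hadoop as 'provided' can be sketched as a Maven fragment. This is illustrative only (artifact names and versions below are assumptions based on the versions mentioned in the thread, Spark 1.2.0 on Scala 2.10 and Hadoop 1.2.1; adjust to your build):

```xml
<!-- Sketch: mark cluster-supplied jars as provided so the application jar
     submitted via spark-submit does not bundle its own copies of them. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With 'provided' scope the classes are available at compile time but are taken from the cluster at run time, which avoids shipping a second, possibly conflicting, copy of Hadoop (and its Guava) inside the app jar.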
Re: Guava 11 dependency issue in Spark 1.2.0
Hi Sean,

My mistake, the Guava 11 dependency came from hadoop-common indeed.

I'm running the following simple app on a Spark 1.2.0 standalone local cluster (2 workers) with Hadoop 1.2.1:

public class AvroSparkTest {

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") //("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");
        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the Hadoop dependency is added; it runs without a problem when the Hadoop dependency is removed and the master is set to local[].

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen wrote:

> What's your Spark version? And what are you executing? What mode --
> standalone, YARN? What Hadoop version?
Re: Guava 11 dependency issue in Spark 1.2.0
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a discussion about, indeed, what to do about potential Guava version conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

Spark uses 14.0.1 in fact:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well.

Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a lot of these problems are solved. As we've seen though, this one is tricky.

What's your Spark version? And what are you executing? What mode -- standalone, YARN? What Hadoop version?

On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera wrote:

> Hi,
>
> I have been running a simple Spark app on a local Spark cluster and I came
> across this error:
>
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>     at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>     at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>     at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>     at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>     at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>     at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>     at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>     at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>     at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
>     at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
>     at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>     at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
>     at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
>     at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>     at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>     at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
>     at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
>     at com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
>     at com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
>     at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>     at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
>     at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
>
> While looking into this I found out that Guava was downgraded to version
> 11 in this PR:
> https://github.com/apache/spark/pull/1610
>
> In this PR, the hashInt call at OpenHashSet.scala:261 has been changed to
> hashLong. But when I actually run my app, a "java.lang.NoSuchMethodError:
> com.google.common.hash.HashFunction.hashInt" error occurs, which is
> understandable because hashInt is not available before Guava 12.
>
> So, I'm wondering why this occurs?
>
> Cheers
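When two Guava versions are on the classpath, as Sean describes, a NoSuchMethodError tells you a method is missing but not which jar the class was resolved from. A hedged, JDK-only diagnostic sketch (the class names used here are examples, not from the thread) that reports a class's origin:

```java
import java.security.CodeSource;

// Sketch: print which jar (or directory) a class was actually loaded from.
// Pointing this at com.google.common.hash.HashFunction.class in the failing
// app would show whether Guava 11 or Guava 14 won the classpath race.
public class WhichJar {
    public static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Bootstrap/JDK classes may have no code source.
        return src == null ? "bootstrap/JDK" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locate(String.class));
        System.out.println(locate(WhichJar.class));
    }
}
```

Running this inside the driver (or a task) is often the quickest way to confirm a shading or dependency-scope fix actually changed which copy of a library is being used.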