Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Ted Yu
Please see this thread:

http://search-hadoop.com/m/LgpTk2aVYgr/Hadoop+guava+upgrade&subj=Re+Time+to+address+the+Guava+version+problem


> On Jan 19, 2015, at 6:03 AM, Romi Kuntsman wrote:
> 
> I have recently encountered a similar problem with Guava version collision 
> with Hadoop.
> 
> Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are they 
> staying on version 11, does anyone know?
> 
> Romi Kuntsman, Big Data Engineer
> http://www.totango.com

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
Actually, there is already someone on hadoop-common-dev taking care of
removing the old Guava dependency:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201501.mbox/browser
https://issues.apache.org/jira/browse/HADOOP-11470

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com


Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-19 Thread Romi Kuntsman
I have recently encountered a similar problem with Guava version collision
with Hadoop.

Isn't it more correct to upgrade Hadoop to use the latest Guava? Why are
they staying on version 11, does anyone know?

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com


Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi Sean,

I removed the Hadoop dependencies from the app and ran it on the cluster.
It gives a java.io.EOFException:

15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(177166) called with
curMem=0, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 173.0 KB, free 1911.2 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(25502) called with
curMem=177166, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 24.9 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on 10.100.5.109:43924 (size: 24.9 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 0 from hadoopFile at
AvroRelation.scala:45
15/01/07 11:19:29 INFO FileInputFormat: Total input paths to process : 1
15/01/07 11:19:29 INFO SparkContext: Starting job: collect at
SparkPlan.scala:84
15/01/07 11:19:29 INFO DAGScheduler: Got job 0 (collect at
SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/01/07 11:19:29 INFO DAGScheduler: Final stage: Stage 0(collect at
SparkPlan.scala:84)
15/01/07 11:19:29 INFO DAGScheduler: Parents of final stage: List()
15/01/07 11:19:29 INFO DAGScheduler: Missing parents: List()
15/01/07 11:19:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[6] at
map at SparkPlan.scala:84), which has no missing parents
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(4864) called with
curMem=202668, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 4.8 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO MemoryStore: ensureFreeSpace(3481) called with
curMem=207532, maxMem=2004174766
15/01/07 11:19:29 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 3.4 KB, free 1911.1 MB)
15/01/07 11:19:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on 10.100.5.109:43924 (size: 3.4 KB, free: 1911.3 MB)
15/01/07 11:19:29 INFO BlockManagerMaster: Updated info of block
broadcast_1_piece0
15/01/07 11:19:29 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:838
15/01/07 11:19:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage
0 (MappedRDD[6] at map at SparkPlan.scala:84)
15/01/07 11:19:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/07 11:19:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1, 10.100.5.109, PROCESS_LOCAL, 1340 bytes)
15/01/07 11:19:29 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
10.100.5.109): java.io.EOFException
at
java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2722)
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1009)
at
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
at
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
at
org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
at
org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
at
org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)

Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
Oh, are you actually bundling Hadoop in your app? That may be the problem.
If you're using standalone mode, why include Hadoop? In any event, Spark
and Hadoop are intended to be 'provided' dependencies in the app you send
to spark-submit.
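
For illustration, this is roughly what that looks like in a Maven pom; the
artifact ids and versions here are indicative only and should match your
cluster:

  <!-- 'provided' scope: on the compile classpath, but not bundled into the
       application jar, so the versions already on the cluster are used. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.2.1</version>
    <scope>provided</scope>
  </dependency>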


Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi Sean,

My mistake, Guava 11 dependency came from the hadoop-commons indeed.

I'm running the following simple app on a Spark 1.2.0 standalone local
cluster (2 workers) with Hadoop 1.2.1:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;

import com.databricks.spark.avro.AvroUtils;

public class AvroSparkTest {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") // ("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);

        // Load the Avro file as a SchemaRDD and query it through Spark SQL.
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");
        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the Hadoop dependency is added;
it runs without a problem when the Hadoop dependency is removed and the
master is set to local[].
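
A quick way to check which Guava copy actually wins on the classpath is a
tiny diagnostic along these lines (an illustrative snippet; the class name
is made up):

import com.google.common.hash.HashFunction;

public class GuavaLocator {
    public static void main(String[] args) {
        // Prints the jar that HashFunction was loaded from. If it points at
        // a Hadoop-bundled Guava 11, HashFunction.hashInt(int) is missing.
        System.out.println(HashFunction.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}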

Cheers


Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Sean Owen
-dev

Guava was not downgraded to 11. That PR was not merged. It was part of a
discussion about, indeed, what to do about potential Guava version
conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

Spark uses 14.0.1 in fact:
https://github.com/apache/spark/blob/master/pom.xml#L330

This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as well.
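
To see exactly which Guava versions a build pulls in, the standard Maven
diagnostic is dependency:tree, filtered to the Guava group id, e.g.:

  mvn dependency:tree -Dincludes=com.google.guava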

Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
lot of these problems are solved. As we've seen though, this one is tricky.
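
"Shaded" here means the bundled Guava classes are relocated to a different
package at build time -- roughly this kind of maven-shade-plugin relocation.
This is an illustrative sketch only, not Spark's actual build configuration,
and the shaded package name below is made up:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
      <relocations>
        <!-- Rewrite Guava's packages inside the assembly jar so the bundled
             copy cannot collide with Hadoop's or the user's own Guava. -->
        <relocation>
          <pattern>com.google.common</pattern>
          <shadedPattern>my.shaded.com.google.common</shadedPattern>
        </relocation>
      </relocations>
    </configuration>
  </plugin>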

What's your Spark version? And what are you executing? What mode --
standalone, YARN? What Hadoop version?


On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera wrote:

> Hi,
>
> I have been running a simple Spark app on a local spark cluster and I came
> across this error.
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
> at
> org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
> at
> org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
> at
> org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
> at
> org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> at
> org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
> at
> org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
> at
> org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
> at
> org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
> at
> org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
> at
> org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
> at
> org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
> at
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
> at
> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
> at
> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
> at
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
> at
> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
> at
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
> at
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
> at
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
> at
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
> at
> com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
> at
> com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
> at
> org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
> at
> org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
>
>
> While looking into this I found out that Guava was downgraded to version
> 11 in this PR.
> https://github.com/apache/spark/pull/1610
>
> In this PR, the hashInt call at OpenHashSet.scala:261 was changed to
> hashLong. But when I actually run my app, a "java.lang.NoSuchMethodError:
> com.google.common.hash.HashFunction.hashInt" error occurs, which is
> understandable because hashInt was not available before Guava 12.
>
> So, I'm wondering why this occurs?
>
> Cheers
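
The failure described above is easy to reproduce in isolation. A minimal
probe (illustrative; compile against Guava 14, then run with Guava 11 first
on the classpath):

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

public class HashIntProbe {
    public static void main(String[] args) {
        // HashFunction.hashInt(int) only exists from Guava 12 onwards, so
        // with Guava 11 on the runtime classpath this line throws the same
        // NoSuchMethodError quoted above.
        HashCode code = Hashing.murmur3_32().hashInt(42);
        System.out.println(code);
    }
}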