Re: Upgrade to Spark 1.2.1 using Guava

2015-03-02 Thread Pat Ferrel
Marcelo’s workaround works. So if you are using the itemsimilarity stuff, the
CLI has a way to solve the class-not-found error, and I can point out how to do the
equivalent if you are using the library API (a rough sketch is below). Ping me if you care.
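Roughly, it comes down to setting the same property on the SparkConf before the context is
created. The paths below are hypothetical placeholders; the guava jar has to exist at that
location on every worker, which is what Marcelo’s workaround relies on:

import org.apache.spark.{SparkConf, SparkContext}

object GuavaClassPathSketch {
  def main(args: Array[String]): Unit = {
    // library-API equivalent of: spark-submit --conf spark.executor.extraClassPath=/guava.jar
    val conf = new SparkConf()
      .setAppName("guava-classpath-sketch")
      // the jar must already sit at this path on every worker (hypothetical path)
      .set("spark.executor.extraClassPath", "/opt/spark-deps/guava.jar")
      // ship the app's own dependency jar to the executors as well (hypothetical path)
      .setJars(Seq("/opt/spark-deps/deps.jar"))
    val sc = new SparkContext(conf)
    // ... run the job ...
    sc.stop()
  }
}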


On Feb 28, 2015, at 2:27 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote:

Yes. I ran into this problem with a Mahout snapshot and Spark 1.2.0, not really 
trying to figure out why that was a problem, since there were already too many 
moving parts in my app. Obviously there is a classpath issue somewhere.

/Erlend

On 27 Feb 2015 22:30, Pat Ferrel p...@occamsmachete.com wrote:
@Erlend hah, we were trying to merge your PR and ran into this—small world. You 
actually compile the JavaSerializer source in your project?

@Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, 
that didn’t seem to work?

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote:

Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my own 
project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote:
I understand that I need to supply Guava to Spark. The HashBiMap is created in 
the client and broadcast to the workers. So it is needed in both. To achieve 
this there is a deps.jar with Guava (and Scopt but that is only for the 
client). Scopt is found so I know the jar is fine for the client.

I pass in the deps.jar to the context creation code. I’ve checked the content 
of the jar and have verified that it is used at context creation time.

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
      new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
which is where I get the following error.

Have I missed a step for deserialization of broadcast values? Odd that 
serialization happened but deserialization failed. I’m running on a standalone 
localhost-only cluster.


15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 
(TID 9, 192.168.0.2): java.io.IOException: 
com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
deserialization.
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-28 Thread Pat Ferrel
Maybe, but any time the workaround is to use “spark-submit --conf 
spark.executor.extraClassPath=/guava.jar blah”, that means standalone apps 
must have hard-coded paths that are honored on every worker. And as you know, a 
lib is pretty much blocked from using this version of Spark—hence the blocker 
severity.

I could easily be wrong, but userClassPathFirst doesn’t seem to be the issue. 
There is no class conflict.

On Feb 27, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote:

This seems like a job for userClassPathFirst. Or could be. It's
definitely an issue of visibility between where the serializer is and
where the user class is.

At the top you said, Pat, that you didn't try this, but why not?

On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I’ll try to find a Jira for it. I hope a fix is in 1.3
 
 
 On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 Thanks! that worked.
 
 On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 I don’t use spark-submit I have a standalone app.
 
 So I guess you want me to add that key/value to the conf in my code and make 
 sure it exists on workers.
 
 
 On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:
 
 On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.
 
 Sorry, I'm still confused about what config you're changing. I'm
 suggesting using:
 
 spark-submit --conf spark.executor.extraClassPath=/guava.jar blah
 
 
 --
 Marcelo
 



Re: Upgrade to Spark 1.2.1 using Guava

2015-02-28 Thread Erlend Hamnaberg
Yes. I ran into this problem with a Mahout snapshot and Spark 1.2.0, not
really trying to figure out why that was a problem, since there were
already too many moving parts in my app. Obviously there is a classpath
issue somewhere.

/Erlend
On 27 Feb 2015 22:30, Pat Ferrel p...@occamsmachete.com wrote:

 @Erlend hah, we were trying to merge your PR and ran into this—small
 world. You actually compile the JavaSerializer source in your project?

 @Marcelo do you mean by modifying spark.executor.extraClassPath on all
 workers, that didn’t seem to work?

 On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net
 wrote:

 Hi.

 I have had a similar issue. I had to pull the JavaSerializer source into
 my own project, just so I got the classloading of this class under control.

 This must be a class loader issue with spark.

 -E

 On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I understand that I need to supply Guava to Spark. The HashBiMap is
 created in the client and broadcast to the workers. So it is needed in
 both. To achieve this there is a deps.jar with Guava (and Scopt but that is
 only for the client). Scopt is found so I know the jar is fine for the
 client.

 I pass in the deps.jar to the context creation code. I’ve checked the
 content of the jar and have verified that it is used at context creation
 time.

 I register the serializer as follows:

 class MyKryoRegistrator extends KryoRegistrator {

   override def registerClasses(kryo: Kryo) = {

 val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
 //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
 log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
 kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
   }
 }

 The job proceeds up until the broadcast value, a HashBiMap, is
 deserialized, which is where I get the following error.

 Have I missed a step for deserialization of broadcast values? Odd that
 serialization happened but deserialization failed. I’m running on a
 standalone localhost-only cluster.


 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage
 4.0 (TID 9, 192.168.0.2): java.io.IOException:
 com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
 at
 org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
 at
 org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
 at
 org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
 at
 org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
 at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
 at
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
 at
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at
 org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
 at
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
 at
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: com.esotericsoftware.kryo.KryoException: Error during Java
 deserialization.
 at
 com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
 at
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
 at
 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
 at
 org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
 ... 19 more

   root error ==
 Caused by: java.lang.ClassNotFoundException:
 com.google.common.collect.HashBiMap
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Sean Owen
This seems like a job for userClassPathFirst. Or could be. It's
definitely an issue of visibility between where the serializer is and
where the user class is.

At the top you said, Pat, that you didn't try this, but why not?

On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I’ll try to find a Jira for it. I hope a fix is in 1.3


 On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:

 Thanks! that worked.

 On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I don’t use spark-submit I have a standalone app.

 So I guess you want me to add that key/value to the conf in my code and make 
 sure it exists on workers.


 On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

 On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

 Sorry, I'm still confused about what config you're changing. I'm
 suggesting using:

 spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


 --
 Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I understand that I need to supply Guava to Spark. The HashBiMap is created in 
the client and broadcast to the workers. So it is needed in both. To achieve 
this there is a deps.jar with Guava (and Scopt but that is only for the 
client). Scopt is found so I know the jar is fine for the client. 

I pass in the deps.jar to the context creation code. I’ve checked the content 
of the jar and have verified that it is used at context creation time. 

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
      new JavaSerializer())
  }
}
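
For context, the registrator and the deps.jar are handed to the context creation code roughly 
like this (a sketch, not the exact code; the app name and path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("item-similarity-driver")   // placeholder app name
  // use Kryo and point it at the registrator above
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "my.MyKryoRegistrator")
  // deps.jar carries Guava (and Scopt, which only the client needs); placeholder path
  .setJars(Seq("/path/to/deps.jar"))
val sc = new SparkContext(conf)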

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
which is where I get the following error. 

Have I missed a step for deserialization of broadcast values? Odd that 
serialization happened but deserialization failed. I’m running on a standalone 
localhost-only cluster.


15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 
(TID 9, 192.168.0.2): java.io.IOException: 
com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
deserialization.
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
... 19 more

 root error ==
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
...








On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked through the public API.)

If your app needs Guava, it needs to package Guava with it (e.g. by
using maven-shade-plugin, or using --jars if only executors use
Guava).
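
If the app is built with sbt rather than Maven, the same relocation idea can be sketched with 
sbt-assembly (this swaps sbt-assembly in for the maven-shade-plugin named above; the relocated 
package name is arbitrary and the sbt-assembly plugin is assumed to be available in the build):

// build.sbt — relocate Guava into a private package so it cannot collide with Spark's copy
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "myshaded.guava.@1").inAll
)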

On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote:
 The root Spark pom has guava set at a certain version number. It’s very hard
 to read the shading xml. Someone suggested that I try using
 userClassPathFirst but that sounds too heavy handed since I don’t really
 care which version of guava I get, not picky.
 
 When I set my project to use the same version as Spark I get a missing
 classdef, which usually means a version conflict.
 
 At this point I am quite confused about what is 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
Ah, I see. That makes a lot of sense now.

You might be running into some weird class loader visibility issue.
I've seen some bugs in jira about this in the past, maybe you're
hitting one of them.

Until I have some time to investigate (or, if you're curious, feel free
to scavenge jira), a workaround could be to copy the guava jar to your
executor nodes manually and add it to the executor's class path
(spark.executor.extraClassPath). That will place your guava
in the Spark classloader (vs. your app's class loader when using
--jars), and things should work.


On Fri, Feb 27, 2015 at 11:52 AM, Pat Ferrel p...@occamsmachete.com wrote:
 I understand that I need to supply Guava to Spark. The HashBiMap is created 
 in the client and broadcast to the workers. So it is needed in both. To 
 achieve this there is a deps.jar with Guava (and Scopt but that is only for 
 the client). Scopt is found so I know the jar is fine for the client.

 I pass in the deps.jar to the context creation code. I’ve checked the content 
 of the jar and have verified that it is used at context creation time.

 I register the serializer as follows:

 class MyKryoRegistrator extends KryoRegistrator {

   override def registerClasses(kryo: Kryo) = {

 val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
 //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
 log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
 kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
   }
 }

 The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
 which is where I get the following error.

 Have I missed a step for deserialization of broadcast values? Odd that 
 serialization happened but deserialization failed. I’m running on a 
 standalone localhost-only cluster.


 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 
 (TID 9, 192.168.0.2): java.io.IOException: 
 com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
 at 
 org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
 at 
 org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
 at 
 org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
 at 
 org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
 at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
 at 
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
 at 
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at 
 org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
 at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
 at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
 deserialization.
 at 
 com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
 at 
 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
 at 
 org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
 ... 19 more

  root error ==
 Caused by: java.lang.ClassNotFoundException: 
 com.google.common.collect.HashBiMap
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 ...








 On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Guava is not in Spark. (Well, long version: it's in Spark but it's
 relocated to a 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Erlend Hamnaberg
Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my
own project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I understand that I need to supply Guava to Spark. The HashBiMap is
 created in the client and broadcast to the workers. So it is needed in
 both. To achieve this there is a deps.jar with Guava (and Scopt but that is
 only for the client). Scopt is found so I know the jar is fine for the
 client.

 I pass in the deps.jar to the context creation code. I’ve checked the
 content of the jar and have verified that it is used at context creation
 time.

 I register the serializer as follows:

 class MyKryoRegistrator extends KryoRegistrator {

   override def registerClasses(kryo: Kryo) = {

 val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
 //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
 log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
 kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]], new JavaSerializer())
   }
 }

 The job proceeds up until the broadcast value, a HashBiMap, is
 deserialized, which is where I get the following error.

 Have I missed a step for deserialization of broadcast values? Odd that
 serialization happened but deserialization failed. I’m running on a
 standalone localhost-only cluster.


 15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage
 4.0 (TID 9, 192.168.0.2): java.io.IOException:
 com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
 at
 org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
 at
 org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
 at
 org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
 at
 org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
 at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
 at
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
 at
 my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at
 org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
 at
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
 at
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:56)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: com.esotericsoftware.kryo.KryoException: Error during Java
 deserialization.
 at
 com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
 at
 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
 at
 org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
 ... 19 more

  root error ==
 Caused by: java.lang.ClassNotFoundException:
 com.google.common.collect.HashBiMap
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 ...








 On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Guava is not in Spark. (Well, long version: it's in Spark but it's
 relocated to a different package except for some special classes
 leaked through the public API.)

 If your app needs Guava, it needs to package Guava with it (e.g. by
 using maven-shade-plugin, or using --jars if only executors use
 Guava).

 On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel p...@occamsmachete.com wrote:
  The root Spark pom has guava set at a certain version number. It’s very
 hard
  to read the 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
 @Marcelo do you mean by modifying spark.executor.extraClassPath on all
 workers, that didn’t seem to work?

That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration it will definitely not
work.

-- 
Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I don’t use spark-submit; I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
@Erlend hah, we were trying to merge your PR and ran into this—small world. You 
actually compile the JavaSerializer source in your project?

@Marcelo, do you mean modifying spark.executor.extraClassPath on all workers? 
That didn’t seem to work.

On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg erl...@hamnaberg.net wrote:

Hi.

I have had a similar issue. I had to pull the JavaSerializer source into my own 
project, just so I got the classloading of this class under control.

This must be a class loader issue with spark.

-E

On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel p...@occamsmachete.com wrote:
I understand that I need to supply Guava to Spark. The HashBiMap is created in 
the client and broadcast to the workers. So it is needed in both. To achieve 
this there is a deps.jar with Guava (and Scopt but that is only for the 
client). Scopt is found so I know the jar is fine for the client.

I pass in the deps.jar to the context creation code. I’ve checked the content 
of the jar and have verified that it is used at context creation time.

I register the serializer as follows:

class MyKryoRegistrator extends KryoRegistrator {

  override def registerClasses(kryo: Kryo) = {

    val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
    //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
    log.info("\n\n\nRegister Serializer for " + h.getClass.getCanonicalName + "\n\n\n") // just to be sure this does indeed get logged
    kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
      new JavaSerializer())
  }
}

The job proceeds up until the broadcast value, a HashBiMap, is deserialized, 
which is where I get the following error.

Have I missed a step for deserialization of broadcast values? Odd that 
serialization happened but deserialization failed. I’m running on a standalone 
localhost-only cluster.


15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 
(TID 9, 192.168.0.2): java.io.IOException: 
com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
at 
my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at 
org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.esotericsoftware.kryo.KryoException: Error during Java 
deserialization.
at 
com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
... 19 more

 root error ==
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
...








On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:

Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked 

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
Thanks! that worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:

I don’t use spark-submit I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I’ll try to find a Jira for it. I hope a fix is in 1.3


On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:

Thanks! that worked.

On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:

I don’t use spark-submit I have a standalone app.

So I guess you want me to add that key/value to the conf in my code and make 
sure it exists on workers.


On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.

Sorry, I'm still confused about what config you're changing. I'm
suggesting using:

spark-submit --conf spark.executor.extraClassPath=/guava.jar blah


-- 
Marcelo




Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Pat Ferrel
I changed it in the Spark master conf, which is also the only worker. I added a 
path to the jar that has guava in it. Still can’t find the class.

Trying Erlend’s idea next.

On Feb 27, 2015, at 1:35 PM, Marcelo Vanzin van...@cloudera.com wrote:

On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
 @Marcelo do you mean by modifying spark.executor.extraClassPath on all
 workers, that didn’t seem to work?

That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration it will definitely not
work.

-- 
Marcelo
