Yes. I ran into this problem with a Mahout snapshot and Spark 1.2.0. I never
really tried to figure out why it was a problem, since there were already too
many moving parts in my app. Obviously there is a classpath issue somewhere.
/Erlend

On 27 Feb 2015 22:30, "Pat Ferrel" <p...@occamsmachete.com> wrote:

> @Erlend hah, we were trying to merge your PR and ran into this. Small
> world. You actually compile the JavaSerializer source in your project?
>
> @Marcelo do you mean by modifying spark.executor.extraClassPath on all
> workers? That didn't seem to work.
>
> On Feb 27, 2015, at 1:23 PM, Erlend Hamnaberg <erl...@hamnaberg.net> wrote:
>
> Hi.
>
> I have had a similar issue. I had to pull the JavaSerializer source into
> my own project, just so I got the classloading of this class under
> control.
>
> This must be a class loader issue with Spark.
>
> -E
>
> On Fri, Feb 27, 2015 at 8:52 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> I understand that I need to supply Guava to Spark. The HashBiMap is
>> created in the client and broadcast to the workers, so it is needed in
>> both. To achieve this there is a deps.jar with Guava (and Scopt, but
>> that is only for the client). Scopt is found, so I know the jar is
>> fine for the client.
>>
>> I pass the deps.jar in to the context creation code. I've checked the
>> contents of the jar and have verified that it is used at context
>> creation time.
>>
>> I register the serializer as follows:
>>
>>     class MyKryoRegistrator extends KryoRegistrator {
>>
>>       override def registerClasses(kryo: Kryo) = {
>>         val h: HashBiMap[String, Int] = HashBiMap.create[String, Int]()
>>         //kryo.addDefaultSerializer(h.getClass, new JavaSerializer())
>>         // just to be sure this does indeed get logged
>>         log.info("\n\n\nRegister Serializer for " +
>>           h.getClass.getCanonicalName + "\n\n\n")
>>         kryo.register(classOf[com.google.common.collect.HashBiMap[String, Int]],
>>           new JavaSerializer())
>>       }
>>     }
>>
>> The job proceeds up until the broadcast value, a HashBiMap, is
>> deserialized, which is where I get the following error.
>>
>> Have I missed a step for deserialization of broadcast values? Odd that
>> serialization succeeded but deserialization failed.
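For context, a registrator like the one above has to be wired in through the
SparkConf along with the dependency jar. The following is only a sketch of
that wiring under assumptions from this thread: the app name, master URL, and
jar path are illustrative placeholders, and `my.MyKryoRegistrator` is an
assumed fully qualified name (the `my` package is taken from the stack trace
below).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: values below are placeholders, not taken from the thread.
val conf = new SparkConf()
  .setAppName("hashbimap-broadcast")                 // placeholder app name
  .setMaster("spark://localhost:7077")               // standalone master
  // Use Kryo and point Spark at the registrator shown above.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "my.MyKryoRegistrator")
  // Ship the dependency jar (Guava, Scopt) to the executors.
  .setJars(Seq("/path/to/deps.jar"))                 // placeholder path

val sc = new SparkContext(conf)
```

Note that `spark.kryo.registrator` takes a fully qualified class name, and
`setJars` only adds the jar to the executors' task classloader; as this
thread shows, broadcast deserialization may still run in a classloader that
does not see it.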
>> I'm running on a standalone localhost-only cluster.
>>
>>     15/02/27 11:40:34 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, 192.168.0.2): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1093)
>>         at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
>>         at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
>>         at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
>>         at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
>>         at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>>         at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:95)
>>         at my.TDIndexedDatasetReader$$anonfun$5.apply(TextDelimitedReaderWriter.scala:94)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
>>         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
>>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>     Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>>         at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
>>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
>>         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
>>         at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
>>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
>>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
>>         ... 19 more
>>
>>     ======== root error ==========
>>     Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         ...
>>
>> On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Guava is not in Spark. (Well, long version: it's in Spark, but it's
>> relocated to a different package, except for some special classes
>> leaked through the public API.)
>>
>> If your app needs Guava, it needs to package Guava with it (e.g. by
>> using maven-shade-plugin, or by using "--jars" if only executors use
>> Guava).
>>
>> On Wed, Feb 25, 2015 at 5:17 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> The root Spark pom has Guava set at a certain version number. It's
>>> very hard to read the shading XML. Someone suggested that I try
>>> userClassPathFirst, but that sounds too heavy-handed, since I don't
>>> really care which version of Guava I get; I'm not picky.
>>>
>>> When I set my project to use the same version as Spark, I get a
>>> missing classdef, which usually means a version conflict.
>>>
>>> At this point I am quite confused about what is actually in Spark as
>>> far as Guava goes, and about how to coexist with it happily.
>>>
>>> Let me rephrase my question: does anyone know how, or has anyone
>>> used, Guava in a project? Is there a recommended way to use it in a
>>> job?
>>
>> --
>> Marcelo
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
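Marcelo's maven-shade-plugin suggestion above would look roughly like the
following pom.xml fragment, which bundles Guava into the application jar and
relocates it to a private package so it cannot collide with Spark's copy.
This is a hedged sketch, not from the thread: the `myapp.shaded` prefix and
the plugin version are illustrative placeholders.

```xml
<!-- Sketch: goes under build/plugins in the application's pom.xml. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version> <!-- placeholder version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrites com.google.common.* references in the app's own
               classes to a private prefix and bundles the rewritten Guava
               classes, so no classloader ever has two Guava versions. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, the broadcast HashBiMap and its class travel in the same
application jar, which sidesteps the version question entirely.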