[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951772#comment-14951772 ]

Sean Owen commented on SPARK-11016:
-----------------------------------

Yes, I get that RoaringBitmap has a particular serialization mechanism that 
Kryo has to be taught to use. I think the answer to my dumb question was: yes, 
Spark still uses RoaringBitmap, so it has to ensure Kryo knows how to serialize 
it, including registering serializers. So yes, you're doing the right thing.

These classes implement Externalizable but not Serializable; if they were 
Serializable, the KryoJavaSerializer could be registered to delegate 
serialization to the correct, custom Java serialization these classes define. 
But they're not. I wonder if we can build a KryoJavaExternalizableSerializer 
to do something similar automatically? That would be tidy. Then Spark would 
just need to register the RoaringBitmap classes to use it.
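A sketch of what such a serializer's core delegation could look like, using only the JDK: the `Counts` class and the `ExternalizableBridge` helper are hypothetical stand-ins I made up for illustration; in a real Kryo serializer the `ByteArrayOutputStream`/`ByteArrayInputStream` would instead be the streams obtained from Kryo's `Output`/`Input`. The key observation is that `ObjectOutputStream` implements `ObjectOutput`, so it can be handed straight to `writeExternal`.

```java
import java.io.*;

// Hypothetical stand-in for an Externalizable class like the RoaringBitmap internals.
class Counts implements Externalizable {
    int[] values = new int[0];
    public Counts() {}                      // Externalizable requires a no-arg constructor
    Counts(int[] v) { values = v; }
    @Override public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(values.length);
        for (int v : values) out.writeInt(v);
    }
    @Override public void readExternal(ObjectInput in) throws IOException {
        values = new int[in.readInt()];
        for (int i = 0; i < values.length; i++) values[i] = in.readInt();
    }
}

public class ExternalizableBridge {
    // Delegate to the object's own writeExternal; ObjectOutputStream implements
    // ObjectOutput, so no custom adapter class is needed for writing.
    static byte[] write(Externalizable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        obj.writeExternal(oos);
        oos.flush();                        // ObjectOutputStream buffers block data
        return bytes.toByteArray();
    }

    // Reconstruct via the mandatory no-arg constructor, then delegate to readExternal.
    static <T extends Externalizable> T read(byte[] data, Class<T> type) throws Exception {
        T obj = type.getDeclaredConstructor().newInstance();
        obj.readExternal(new ObjectInputStream(new ByteArrayInputStream(data)));
        return obj;
    }

    public static void main(String[] args) throws Exception {
        Counts original = new Counts(new int[]{1, 2, 3});
        Counts copy = read(write(original), Counts.class);
        System.out.println(java.util.Arrays.toString(copy.values)); // [1, 2, 3]
    }
}
```

A real Kryo `Serializer<T>` subclass would wrap this same write/read pair in its `write`/`read` methods and be registered once per Externalizable class.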

Otherwise, if it has to be bridged by hand: since DataOutputStream is a 
DataOutput that can wrap an OutputStream, and you can get an OutputStream from 
the Kryo Output, it seems possible.
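The by-hand bridge could look like the following sketch. `ToyBitmap` is a hypothetical stand-in for RoaringBitmap's own serialize(DataOutput)/deserialize(DataInput) mechanism, and the `ByteArrayOutputStream` stands in for the OutputStream one would obtain from the Kryo Output:

```java
import java.io.*;

// Hypothetical stand-in for RoaringBitmap, which defines its custom
// serialization against DataOutput/DataInput rather than Object streams.
class ToyBitmap {
    long bits;
    void serialize(DataOutput out) throws IOException { out.writeLong(bits); }
    void deserialize(DataInput in) throws IOException { bits = in.readLong(); }
}

public class DataOutputBridge {
    public static void main(String[] args) throws IOException {
        ToyBitmap original = new ToyBitmap();
        original.bits = 0b1011L;

        // Writing: wrap the OutputStream (the one from the Kryo Output, in the
        // real case) in a DataOutputStream, which is a DataOutput.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        original.serialize(new DataOutputStream(sink));

        // Reading: the mirror image, DataInputStream over the InputStream.
        ToyBitmap copy = new ToyBitmap();
        copy.deserialize(new DataInputStream(new ByteArrayInputStream(sink.toByteArray())));
        System.out.println(copy.bits == original.bits); // true
    }
}
```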

> Spark fails when running with a task that requires a more recent version of 
> RoaringBitmaps
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11016
>                 URL: https://issues.apache.org/jira/browse/SPARK-11016
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Charles Allen
>
> The following error appears during Kryo init whenever a job requires a more 
> recent version (>0.5.0) of RoaringBitmap; 
> org/roaringbitmap/RoaringArray$Element was removed in 0.5.0:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/roaringbitmap/RoaringArray$Element
> java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
>       at org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
>       at org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
>       at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
>       at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
>       at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
>       at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
>       at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
>       at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
>       at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
>       at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>       at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
>       at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
>       at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
>       at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>       at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>       at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
>       at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
>       at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>       at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>       at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
> {code}
> See https://issues.apache.org/jira/browse/SPARK-5949 for related info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
