Hi,

I am getting the serialization exception below, and I am not sure what "Graph
is unexpectedly null when DStream is being serialized" means:

15/04/20 06:12:38 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Task not serializable)
Exception in thread "Driver" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
        at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:438)
        [...]
Caused by: java.io.NotSerializableException: Graph is unexpectedly null when DStream is being serialized.
        at org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply$mcV$sp(DStream.scala:420)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
        at org.apache.spark.streaming.dstream.DStream.writeObject(DStream.scala:403)

The operation comes down to something like this:

dstream.map { tuple =>
  val w = StreamState.fetch[K, W](state.prefixKey, tuple._1)
  (tuple._1, (tuple._2, w))
}

StreamState is a very simple standalone object:

import scala.reflect.ClassTag

object StreamState {
  def fetch[K: ClassTag: Ordering, V: ClassTag](prefixKey: String, key: K): Option[V] = None
}

However, if I remove the context bounds on K in fetch, i.e. drop ClassTag and
Ordering, then everything works fine.
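
My working guess is that the context bounds matter because they desugar into
an extra implicit parameter list, so the fetch call inside the map closure has
to resolve ClassTag and Ordering instances from the enclosing scope, and that
extra capture could be what drags the DStream graph into the serialized
closure. A sketch of roughly what I believe the compiler generates (parameter
names are mine):

import scala.reflect.ClassTag

object StreamStateDesugared {
  // Roughly the desugaring of fetch[K: ClassTag: Ordering, V: ClassTag]:
  // each context bound becomes an implicit parameter that the compiler
  // fills in at the call site from whatever instances are in scope there.
  def fetch[K, V](prefixKey: String, key: K)(implicit
      kTag: ClassTag[K],
      kOrd: Ordering[K],
      vTag: ClassTag[V]): Option[V] = None
}

If that is right, the call inside the closure expands to something like
StreamState.fetch(state.prefixKey, tuple._1)(kTag, kOrd, wTag), with those
implicit instances pulled in from the surrounding class.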

If anyone has some pointers, I'd really appreciate it.

Thanks,
