Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-22 Thread Tathagata Das
Vaguely makes sense. :) Wow that's an interesting corner case.


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-22 Thread Jean-Pascal Billaud
I have now a fair understanding of the situation after looking at javap
output. So as a reminder:

dstream.map(tuple => {
  val w = StreamState.fetch[K,W](state.prefixKey, tuple._1)
  (tuple._1, (tuple._2, w))
})

And StreamState is a very simple standalone object:

object StreamState {
  def fetch[K : ClassTag : Ordering, V : ClassTag](prefixKey: String, key: K): Option[V] = None
}

Basically the serialization failed because the ClassTag[K] came from the
enclosing class in which the dstream.map() code runs, e.g.:

class A[K : ClassTag](val dstream: DStream[K]) {
  [...]
  def fun =
    dstream.map(tuple => {
      val w = StreamState.fetch[K,W](state.prefixKey, tuple._1)
      (tuple._1, (tuple._2, w))
    })
}
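
Spelled out, the context bounds on fetch desugar into implicit
parameters (a sketch of the desugaring, not code from the application):

import scala.reflect.ClassTag

object StreamState {
  // Each context bound becomes an implicit evidence parameter. Inside
  // class A, the only ClassTag[K] the compiler can find is A's own
  // constructor evidence, i.e. a field of `this`, which is why the
  // closure ends up capturing the enclosing A instance.
  def fetch[K, V](prefixKey: String, key: K)
                 (implicit kTag: ClassTag[K], kOrd: Ordering[K],
                  vTag: ClassTag[V]): Option[V] = None
}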

Therefore the instance of class A is being serialized, and it fails when
the dstream field's writeObject() runs and checks the graph field...
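
One way to avoid capturing `this` is to hoist the implicit evidence into
local vals before building the closure. A minimal sketch, assuming the
stream is really a DStream[(K, V)] as the tuple access suggests, that an
Ordering[K] is available on A, and that W is a concrete type with a
ClassTag:

import scala.reflect.ClassTag

class A[K : ClassTag : Ordering, V](val dstream: DStream[(K, V)]) {
  [...]
  def fun = {
    // Locals only: the lambda captures these vals instead of `this`.
    val kTag   = implicitly[ClassTag[K]]
    val kOrd   = implicitly[Ordering[K]]
    val wTag   = implicitly[ClassTag[W]]
    val prefix = state.prefixKey
    dstream.map(tuple => {
      val w = StreamState.fetch[K, W](prefix, tuple._1)(kTag, kOrd, wTag)
      (tuple._1, (tuple._2, w))
    })
  }
}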

The fact that graph is not set might be expected given that I have not
started the context yet...
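
For reference, the guard in DStream.writeObject that throws this looked
roughly like the following in Spark 1.x (a paraphrase reconstructed from
the stack trace and the source of that era; a sketch, not the verbatim
implementation):

private def writeObject(oos: ObjectOutputStream): Unit = Utils.tryOrIOException {
  if (graph != null) {
    graph.synchronized {
      if (graph.checkpointInProgress) {
        oos.defaultWriteObject()  // only legal while checkpointing
      } else {
        throw new NotSerializableException(
          "DStream is being serialized, possibly as part of a task closure...")
      }
    }
  } else {
    // The DStream was never attached to a DStreamGraph, e.g. because no
    // output operation has registered it and the context has not started.
    throw new NotSerializableException(
      "Graph is unexpectedly null when DStream is being serialized.")
  }
}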

Cheers,



Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Tathagata Das
It is kind of unexpected; I can't imagine a real scenario under which it
should trigger. But obviously I am missing something :)

TD


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Jean-Pascal Billaud
Sure. But in general, I am assuming this "Graph is unexpectedly null
when DStream is being serialized" message must mean something. Under
which circumstances would such an exception trigger?


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Tathagata Das
Yeah, I am not sure what is going on. The only way to figure it out is
to take a look at the disassembled bytecode using javap.
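
For example, something like this (a hypothetical invocation; the actual
name of the compiled closure class will differ) lists the fields the
closure captures, including any reference to the enclosing class:

javap -p -c 'A$$anonfun$fun$1'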

TD


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-21 Thread Jean-Pascal Billaud
At this point I am assuming that nobody has an idea... I am still going to
give it a last shot just in case it was missed by some people :)

Thanks,


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-20 Thread Jean-Pascal Billaud
Hey, so I start the context at the very end, once all the piping is
done. BTW, a foreachRDD will be called on the result of dstream.map()
right after that.
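
In other words, the wiring is roughly this (a schematic sketch, not the
actual application code):

val mapped = dstream.map(tuple => { /* ... as above ... */ })
mapped.foreachRDD(rdd => { /* output operation, registered right after */ })
// ... rest of the piping ...
ssc.start()            // the context is started at the very end
ssc.awaitTermination()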

The puzzling thing is why removing the context bounds solves the
problem... What does this exception mean in general?


Re: Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-20 Thread Tathagata Das
When are you getting this exception? After starting the context?

TD


Task not Serializable: Graph is unexpectedly null when DStream is being serialized

2015-04-20 Thread Jean-Pascal Billaud
Hi,

I am getting this serialization exception and I am not too sure what
"Graph is unexpectedly null when DStream is being serialized" means.

15/04/20 06:12:38 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Task not serializable)
Exception in thread "Driver" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
        at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:438)
        [...]
Caused by: java.io.NotSerializableException: Graph is unexpectedly null when DStream is being serialized.
        at org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply$mcV$sp(DStream.scala:420)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
        at org.apache.spark.streaming.dstream.DStream.writeObject(DStream.scala:403)

The operation comes down to something like this:

dstream.map(tuple => {
  val w = StreamState.fetch[K,W](state.prefixKey, tuple._1)
  (tuple._1, (tuple._2, w))
})

And StreamState is a very simple standalone object:

object StreamState {
  def fetch[K : ClassTag : Ordering, V : ClassTag](prefixKey: String, key: K): Option[V] = None
}

However, if I remove the context bounds from K in fetch, i.e. removing
ClassTag and Ordering, then everything is fine.
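
That is, with the bounds dropped (the same stub, just without the
context bounds), there is no implicit evidence for the closure to pull
in from the enclosing scope:

object StreamState {
  // No context bounds: no implicit ClassTag/Ordering parameters are
  // generated, so the call site inside the closure needs nothing from
  // the surrounding class.
  def fetch[K, V](prefixKey: String, key: K): Option[V] = None
}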

If anyone has some pointers, I'd really appreciate it.

Thanks,