I would actually think about this the other way around. Move the functions you 
are passing to the streaming jobs out into their own object if possible. Spark's 
closure-capture rules are necessarily far-reaching and will serialize the object 
that contains these methods, which is a common cause of the problem you're 
seeing. 
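
Roughly something like this (the names are just illustrative, not from your code): with the function on a top-level object, the closure passed to the DStream refers only to that object, so the enclosing class never needs to be serialized.

import org.apache.spark.streaming.dstream.DStream

// Top-level object: referencing its methods does not drag any enclosing
// instance into the closure that Spark serializes.
object ParsingFunctions {
  def parseLine(line: String): (String, Int) = {
    val fields = line.split(",")
    (fields(0), fields(1).toInt)
  }
}

class StreamingJob {
  // Driver-only state that must not end up inside a task closure.
  val driverOnlyCache = scala.collection.mutable.Map.empty[String, Int]

  def run(lines: DStream[String]): DStream[(String, Int)] =
    // The lambda refers only to ParsingFunctions, never to `this`,
    // so StreamingJob (and driverOnlyCache) is never serialized.
    lines.map(line => ParsingFunctions.parseLine(line))
}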

Another option is to mark the non-serializable state as "@transient" if it is 
never accessed by the worker processes. 
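
For example (again, the names are made up): with Java serialization the @transient field is skipped during checkpointing and comes back as null on the other side, which is fine as long as only the driver ever touches it.

import java.util.concurrent.{ExecutorService, Executors}

class JobState extends Serializable {
  // Serialized and shipped as normal.
  val appName: String = "my-streaming-app"

  // Driver-only, non-serializable resource. @transient tells serialization
  // (including checkpointing) to skip this field; it is null after
  // deserialization, so the workers must never use it.
  @transient val driverOnlyPool: ExecutorService =
    Executors.newFixedThreadPool(4)
}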

> On Jun 24, 2016, at 1:23 AM, Simon Scott <simon.sc...@viavisolutions.com> 
> wrote:
> 
> Hi,
>  
> I am developing a streaming application using checkpointing on Spark 1.5.1
>  
> I have just run into a NotSerializableException because some of the state 
> that my streaming functions need cannot be serialized. This state is only 
> used in the driver process; it is the checkpointing that requires the 
> serialization.
>  
> So I am considering moving that state into a Scala “object” – i.e. a global 
> singleton that must be mutable to allow the state to be set at application 
> start.
>  
> I would prefer to be able to create immutable state and attach it to either 
> the SparkContext or the StreamingContext, but I can’t find an API for that.
>  
> Does anybody else think this is a good idea? Is there a better way? Or would 
> such an API be a useful enhancement to Spark?
>  
> Thanks in advance
> Simon
>  
> Research Developer
> Viavi Solutions
