Re: NullPointerException on reading checkpoint files

2014-09-23 Thread RodrigoB
Hi TD,

Recovery of shared variables is actually an important requirement for
us, as we need to spread some referential data across the Spark nodes on
application startup. I just ran into this issue on Spark 1.0.1, and I
assume the latest version doesn't include this capability either. Are there
any plans to add it?

If not, could you give me your opinion on how difficult it would be to
implement? If it's nothing too complex, I could consider contributing
it.

BTW, regarding recovery, I have posted a topic on which I would very much
appreciate your comments:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-data-checkpoint-cleaning-td14847.html

Thanks,
Rod



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-on-reading-checkpoint-files-tp7306p14882.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: NullPointerException on reading checkpoint files

2014-09-23 Thread Tathagata Das
This is actually very tricky, as there are two pretty big challenges that
need to be solved.
(i) Checkpointing for broadcast variables: Unlike RDDs, broadcast variables
don't have checkpointing support (that is, you cannot write the content of a
broadcast variable to HDFS and recover it automatically when needed).
(ii) Remembering the checkpoint info of the broadcast vars used in every
batch, and recovering those vars from the checkpoint info. This also has to
be exposed in the API in such a way that all the checkpointing and recovery
can be done by Spark Streaming seamlessly, without the user's knowledge.
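
Until something like (ii) exists, one workaround is to avoid capturing the broadcast variable in the checkpointed code and to create it lazily on the driver instead. The following is only a minimal sketch of that idea, not part of Spark's API: `get_reference_broadcast` and `load_reference_data` are hypothetical names, and `sc` is assumed to be any object with a `broadcast(value)` method, such as a SparkContext.

```python
import threading

_lock = threading.Lock()
_reference_broadcast = None  # driver-side cache; NOT restored from a checkpoint


def get_reference_broadcast(sc, load_reference_data):
    """Return the cached broadcast, creating it on first use.

    sc: assumed to expose broadcast(value), e.g. a SparkContext.
    load_reference_data: hypothetical zero-argument loader for the data.
    """
    global _reference_broadcast
    with _lock:
        if _reference_broadcast is None:
            # First call in this driver process (including the first call
            # after a restart from a checkpoint): broadcast the data again.
            _reference_broadcast = sc.broadcast(load_reference_data())
        return _reference_broadcast
```

In a PySpark streaming job this would typically be called inside the function passed to `foreachRDD` or `transform`, using `rdd.context` as `sc`, so that a driver recovered from a checkpoint re-creates the broadcast on its first batch rather than dereferencing a stale one. Later versions of the Spark Streaming programming guide describe a similar lazily instantiated singleton pattern.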

I have some thoughts on it, but nothing concrete yet. The first, that is,
broadcast checkpointing, should be straightforward, and may be useful
outside streaming as well.
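
To illustrate what checkpointing the content of a broadcast variable could look like when done by hand in user code, here is a rough sketch under stated assumptions: the local filesystem stands in for HDFS, `pickle` is used as the serialization format, and both function names are hypothetical. It is not how Spark implements (or would implement) this.

```python
import os
import pickle
import tempfile


def save_reference_data(value, path):
    """Write the value to a temp file, then rename it into place,
    so a crash mid-write never leaves a partial file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(value, f)
    os.replace(tmp_path, path)


def load_or_build_reference_data(path, build):
    """Reload a previously saved value if one exists; otherwise build it
    with the hypothetical zero-argument `build` callable and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = build()
    save_reference_data(value, path)
    return value
```

On restart, the driver would call `load_or_build_reference_data` and broadcast the result again, so the recovered job sees the same data it had before the failure.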

TD



Re: NullPointerException on reading checkpoint files

2014-06-12 Thread Kiran
I am also seeing a similar problem when trying to continue a job from a
saved checkpoint. Can somebody help solve this problem?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-on-reading-checkpoint-files-tp7306p7507.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.