You need to figure out why the receivers failed in the first place. Look at
your worker logs and see what actually happened. When a streaming job runs
continuously for a long period it usually produces a lot of logs (you can
enable log rotation to keep them manageable; a config sketch is below), and if
you are doing groupBy, join, or similar operations there will also be a lot of
shuffle data.
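
A minimal sketch of executor log rolling, using the standard
spark.executor.logs.rolling.* properties (the values and the app name here
are only illustrative, not from your setup):

  import org.apache.spark.SparkConf

  // Roll executor stdout/stderr logs so a long-running streaming job does
  // not accumulate one ever-growing log file per worker.
  val conf = new SparkConf()
    .setAppName("kafka-streaming-app")                         // hypothetical name
    .set("spark.executor.logs.rolling.strategy", "time")       // roll by time (or "size")
    .set("spark.executor.logs.rolling.time.interval", "daily")
    .set("spark.executor.logs.rolling.maxRetainedFiles", "7")  // keep a week of logs

The same properties can also be put in spark-defaults.conf or passed with
--conf on spark-submit.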
So check the worker logs and see what went wrong (for example, whether the
disk filled up). We have streaming pipelines running for weeks without any
issues.
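
As for the StreamingListener approach mentioned in your mail below, here is a
minimal sketch, assuming Spark 1.3's org.apache.spark.streaming.scheduler API
(the println calls are only placeholders for whatever alerting or restart
logic you want):

  import org.apache.spark.streaming.StreamingContext
  import org.apache.spark.streaming.scheduler.{StreamingListener,
    StreamingListenerReceiverError, StreamingListenerReceiverStopped}

  // Logs receiver errors and stops; replace the println calls with alerting
  // or restart logic as needed.
  class ReceiverMonitor extends StreamingListener {
    override def onReceiverError(err: StreamingListenerReceiverError): Unit =
      println(s"Receiver error: ${err.receiverInfo}")
    override def onReceiverStopped(stop: StreamingListenerReceiverStopped): Unit =
      println(s"Receiver stopped: ${stop.receiverInfo}")
  }

  // Register the listener on an existing StreamingContext:
  def monitorReceivers(ssc: StreamingContext): Unit =
    ssc.addStreamingListener(new ReceiverMonitor)

Note that this only gives you visibility into receiver state; it does not
restart failed receivers by itself.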

Thanks
Best Regards

On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang <yangjun...@gmail.com> wrote:

> Guys,
>
> We have a project which builds upon Spark streaming.
>
> We use Kafka as the input stream, and create 5 receivers.
>
> When this application had been running for around 90 hours, all 5 receivers
> failed for some unknown reason.
>
> In my understanding, it is not guaranteed that Spark Streaming receivers
> will recover from failures automatically.
>
> So I just want to figure out a way to do fault recovery and deal with
> receiver failures.
>
> There is a JIRA comment that mentions using StreamingListener to monitor
> the status of receivers:
>
>
> https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836
>
> However, I haven't found any public documentation on how to do this.
>
> Has anyone run into the same issue and dealt with it?
>
> Our environment:
>    Spark 1.3.0
>    Dual Master Configuration
>    Kafka 0.8.2
>
> Thanks
>
> --
> yangjun...@gmail.com
> http://hi.baidu.com/yjpro
>
