There should be something in the worker/driver logs regarding the failure. For receiver failures, you can try the low-level Kafka consumer <https://github.com/dibbhatt/kafka-spark-consumer/> that Dibyendu suggested. You also need a high-availability setup with monitoring enabled (Nagios etc. configured to trigger alerts, perform actions, and so on) to keep your pipeline running smoothly.
Thanks
Best Regards

On Mon, Mar 16, 2015 at 1:03 PM, Jun Yang <yangjun...@gmail.com> wrote:

> Akhil,
>
> I have checked the logs. There isn't any clue as to why the 5 receivers
> failed.
>
> That's why I took it for granted that receiver failure is a common issue,
> and that we need to figure out a way to detect this kind of failure and
> fail over.
>
> Thanks
>
> On Mon, Mar 16, 2015 at 3:17 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> You need to figure out why the receivers failed in the first place. Look
>> in your worker logs and see what really happened. When you run a
>> streaming job continuously for a longer period there will usually be a
>> lot of logs (you can enable log rotation etc.), and if you are doing
>> groupBy, join, etc. types of operations, there will be a lot of shuffle
>> data. So you need to check the worker logs and see what happened
>> (whether the disk filled up, etc.). We have streaming pipelines running
>> for weeks without any issues.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang <yangjun...@gmail.com> wrote:
>>
>>> Guys,
>>>
>>> We have a project that builds upon Spark Streaming.
>>>
>>> We use Kafka as the input stream and create 5 receivers.
>>>
>>> When this application had run for around 90 hours, all 5 receivers
>>> failed for some unknown reason.
>>>
>>> In my understanding, it is not guaranteed that a Spark Streaming
>>> receiver will do fault recovery automatically.
>>>
>>> So I just want to figure out a way of doing fault recovery to deal
>>> with receiver failure.
>>>
>>> There is a JIRA comment that mentions using StreamingListener to
>>> monitor the status of receivers:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836
>>>
>>> However, I haven't found any open doc about how to do this.
>>>
>>> Has anyone met the same issue and dealt with it?
>>>
>>> Our environment:
>>> Spark 1.3.0
>>> Dual Master Configuration
>>> Kafka 0.8.2
>>>
>>> Thanks
>>>
>>> --
>>> yangjun...@gmail.com
>>> http://hi.baidu.com/yjpro
>>
>
> --
> yangjun...@gmail.com
> http://hi.baidu.com/yjpro
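The StreamingListener approach referenced in the thread can be sketched as follows. This is a minimal sketch assuming the Spark 1.3 Scala APIs: the listener logs receiver failures so an external monitor (Nagios etc.) can alert on them, and `StreamingContext.getOrCreate` recovers the driver from a checkpoint after a crash. The class names, app name, and checkpoint path are illustrative, and the Kafka receiver wiring is elided:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{
  StreamingListener, StreamingListenerReceiverError, StreamingListenerReceiverStopped}

// Surfaces receiver failures in the driver log so an external alerting
// system can pick them up.
class ReceiverMonitor extends StreamingListener {
  override def onReceiverError(err: StreamingListenerReceiverError): Unit = {
    System.err.println(
      s"Receiver ${err.receiverInfo.name} failed: ${err.receiverInfo.lastErrorMessage}")
  }
  override def onReceiverStopped(stopped: StreamingListenerReceiverStopped): Unit = {
    System.err.println(s"Receiver ${stopped.receiverInfo.name} stopped")
  }
}

object FaultTolerantStream {
  val checkpointDir = "hdfs:///checkpoints/my-app" // hypothetical path

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-streaming")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... create the Kafka receivers and wire up the processing graph here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover the driver from the checkpoint after a crash, or build a
    // fresh context on first start.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.addStreamingListener(new ReceiverMonitor)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the listener only observes receiver state; it does not restart a failed receiver by itself. A watchdog that acts on the logged errors (or redeploys the job) is still needed, as suggested above.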