There should be something in the worker/driver logs regarding the failure.

For receiver failures, you can try the low-level Kafka consumer
<https://github.com/dibbhatt/kafka-spark-consumer/> as Dibyendu suggested.
You also need a high-availability setup with monitoring enabled (Nagios
etc., configured to trigger alerts/perform actions) to keep your pipeline
running smoothly.
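To detect receiver failures programmatically (along the lines of the StreamingListener approach mentioned in the SPARK-2381 comment below), you can register a listener on the StreamingContext. This is only a minimal sketch against the Spark 1.3 API; the class name ReceiverWatchdog and the println calls are placeholders for whatever alerting you wire in:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener,
  StreamingListenerReceiverError, StreamingListenerReceiverStopped}

// Sketch: react when a receiver errors out or stops.
class ReceiverWatchdog extends StreamingListener {
  override def onReceiverError(event: StreamingListenerReceiverError): Unit = {
    // receiverInfo carries the stream id and the last error message
    println(s"Receiver error on stream ${event.receiverInfo.streamId}: " +
      s"${event.receiverInfo.lastErrorMessage}")
    // hook your alerting (Nagios, email, ...) in here
  }

  override def onReceiverStopped(event: StreamingListenerReceiverStopped): Unit = {
    println(s"Receiver stopped: stream ${event.receiverInfo.streamId}")
  }
}

// Attach it to your StreamingContext before calling ssc.start():
// ssc.addStreamingListener(new ReceiverWatchdog)
```

Note that the listener only tells you a receiver failed; restarting it (or the whole application) is still up to your supervision setup.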

Thanks
Best Regards

On Mon, Mar 16, 2015 at 1:03 PM, Jun Yang <yangjun...@gmail.com> wrote:

> Akhil,
>
> I have checked the logs. There isn't any clue as to why the 5 receivers
> failed.
>
> That's why I assume receiver failure is a common issue, and we need to
> figure out a way to detect this kind of failure and fail over.
>
> Thanks
>
> On Mon, Mar 16, 2015 at 3:17 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> You need to figure out why the receivers failed in the first place. Look
>> in your worker logs and see what really happened. When you run a streaming
>> job continuously for a long period there will usually be a lot of logs
>> (you can enable log rotation etc.), and if you are doing groupBy, join,
>> etc. type operations, there will be a lot of shuffle data. So you need to
>> check the worker logs and see what happened (whether the disk filled up,
>> etc.). We have streaming pipelines running for weeks without any issues.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang <yangjun...@gmail.com> wrote:
>>
>>> Guys,
>>>
>>> We have a project which builds upon Spark streaming.
>>>
>>> We use Kafka as the input stream, and create 5 receivers.
>>>
>>> When this application runs for around 90 hour, all the 5 receivers
>>> failed for some unknown reasons.
>>>
>>> In my understanding, it is not guaranteed that a Spark Streaming receiver
>>> will do fault recovery automatically.
>>>
>>> So I just want to figure out a way to do fault recovery for receiver
>>> failures.
>>>
>>> There is a JIRA comment that mentions using StreamingListener to monitor
>>> receiver status:
>>>
>>>
>>> https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836
>>>
>>> However, I haven't found any public documentation on how to do this.
>>>
>>> Has anyone met the same issue and dealt with it?
>>>
>>> Our environment:
>>>    Spark 1.3.0
>>>    Dual Master Configuration
>>>    Kafka 0.8.2
>>>
>>> Thanks
>>>
>>> --
>>> yangjun...@gmail.com
>>> http://hi.baidu.com/yjpro
>>>
>>
>>
>
>
> --
> yangjun...@gmail.com
> http://hi.baidu.com/yjpro
>