Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Dibyendu,

Thanks for the reply.

I am reading your project homepage now.

One quick question I care about is:

If the receivers fail for some reason (for example, being killed forcibly
by someone else), is there any mechanism for the receiver to fail over
automatically?

On Mon, Mar 16, 2015 at 3:25 PM, Dibyendu Bhattacharya 
dibyendu.bhattach...@gmail.com wrote:

 Which version of Spark are you running?

 You can try this Low Level Consumer:
 http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

 It is designed to recover from various failures and has a very good
 fault-recovery mechanism built in. It is being used by many users, and at
 present we at Pearson have been running this receiver in production for
 almost 3 months without any issue.

 You can give it a try.

 Regards,
 Dibyendu

 On Mon, Mar 16, 2015 at 12:47 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 You need to figure out why the receivers failed in the first place. Look
 in your worker logs and see what really happened. When you run a streaming
 job continuously for a longer period there will usually be a lot of logs
 (you can enable log rotation, etc.), and if you are doing groupBy, join,
 or similar operations, there will also be a lot of shuffle data. So check
 the worker logs and see what happened (e.g. whether the disk filled up).
 We have streaming pipelines running for weeks without any issues.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro






-- 
yangjun...@gmail.com
http://hi.baidu.com/yjpro


Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Guys,

We have a project which builds upon Spark Streaming.

We use Kafka as the input stream, and create 5 receivers.
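
For reference, the receivers are created with the usual
KafkaUtils.createStream pattern, roughly like the simplified Scala sketch
below (the topic name, ZooKeeper address, group id and batch interval here
are placeholders, not our real settings):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object FiveReceiverJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("five-receiver-job")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Each KafkaUtils.createStream call allocates one receiver;
    // the five streams are unioned into a single DStream for processing.
    val topicMap = Map("events" -> 1)            // placeholder topic name
    val streams = (1 to 5).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", topicMap)
    }
    val unified = ssc.union(streams)

    unified.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}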

After this application had run for around 90 hours, all 5 receivers failed
for some unknown reason.

In my understanding, it is not guaranteed that a Spark Streaming receiver
will recover from failure automatically.

So I just want to figure out a way to do fault recovery to deal with
receiver failure.

There is a JIRA comment that mentions using StreamingListener to monitor
the status of receivers:

https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

However, I haven't found any public documentation on how to do this.
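
From reading the API, my guess is that one registers a listener on the
StreamingContext and reacts to the receiver events, roughly like the Scala
sketch below (just my reading of the API, not something we have running;
the println calls stand in for real alerting):

import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener,
  StreamingListenerReceiverError, StreamingListenerReceiverStopped}

// Logs receiver failures; a real deployment would alert or trigger
// recovery instead of printing.
class ReceiverMonitor extends StreamingListener {
  override def onReceiverError(event: StreamingListenerReceiverError): Unit = {
    println(s"Receiver error: ${event.receiverInfo}")
  }
  override def onReceiverStopped(event: StreamingListenerReceiverStopped): Unit = {
    println(s"Receiver stopped: ${event.receiverInfo}")
  }
}

object ReceiverMonitoring {
  // Register before ssc.start()
  def register(ssc: StreamingContext): Unit =
    ssc.addStreamingListener(new ReceiverMonitor)
}

Is that the intended usage?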

Has anyone run into the same issue and dealt with it?

Our environment:
   Spark 1.3.0
   Dual Master Configuration
   Kafka 0.8.2

Thanks

-- 
yangjun...@gmail.com
http://hi.baidu.com/yjpro


Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Akhil,

I have checked the logs. There isn't any clue as to why the 5 receivers
failed.

That's why I assume receiver failures will be a common issue, and that we
need to figure out a way to detect this kind of failure and fail over.

Thanks

On Mon, Mar 16, 2015 at 3:17 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You need to figure out why the receivers failed in the first place. Look
 in your worker logs and see what really happened. When you run a streaming
 job continuously for a longer period there will usually be a lot of logs
 (you can enable log rotation, etc.), and if you are doing groupBy, join,
 or similar operations, there will also be a lot of shuffle data. So check
 the worker logs and see what happened (e.g. whether the disk filled up).
 We have streaming pipelines running for weeks without any issues.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro





-- 
yangjun...@gmail.com
http://hi.baidu.com/yjpro


Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Akhil Das
You need to figure out why the receivers failed in the first place. Look in
your worker logs and see what really happened. When you run a streaming job
continuously for a longer period there will usually be a lot of logs (you
can enable log rotation, etc.), and if you are doing groupBy, join, or
similar operations, there will also be a lot of shuffle data. So check the
worker logs and see what happened (e.g. whether the disk filled up). We
have streaming pipelines running for weeks without any issues.

Thanks
Best Regards

On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro



Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
Which version of Spark are you running?

You can try this Low Level Consumer:
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

It is designed to recover from various failures and has a very good
fault-recovery mechanism built in. It is being used by many users, and at
present we at Pearson have been running this receiver in production for
almost 3 months without any issue.

You can give it a try.

Regards,
Dibyendu

On Mon, Mar 16, 2015 at 12:47 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You need to figure out why the receivers failed in the first place. Look
 in your worker logs and see what really happened. When you run a streaming
 job continuously for a longer period there will usually be a lot of logs
 (you can enable log rotation, etc.), and if you are doing groupBy, join,
 or similar operations, there will also be a lot of shuffle data. So check
 the worker logs and see what happened (e.g. whether the disk filled up).
 We have streaming pipelines running for weeks without any issues.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro





Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
Yes, auto-restart is enabled in my low-level consumer when some unhandled
exception comes up.

If you look at KafkaConsumer.java, for some cases (like a broker failure,
a Kafka leader change, etc.) it can even refresh the Consumer (the
Coordinator which talks to a Leader), which will recover from those
failures.

Dib

On Mon, Mar 16, 2015 at 1:40 PM, Jun Yang yangjun...@gmail.com wrote:

 I have checked Dibyendu's code, and it looks like his implementation has
 an auto-restart mechanism:


 
 src/main/java/consumer/kafka/client/KafkaReceiver.java:

 private void start() {

   // Start the thread that receives data over a connection
   KafkaConfig kafkaConfig = new KafkaConfig(_props);
   ZkState zkState = new ZkState(kafkaConfig);
   _kConsumer = new KafkaConsumer(kafkaConfig, zkState, this);
   _kConsumer.open(_partitionId);

   Thread.UncaughtExceptionHandler eh = new Thread.UncaughtExceptionHandler() {
     public void uncaughtException(Thread th, Throwable ex) {
       restart("Restarting Receiver for Partition " + _partitionId, ex, 5000);
     }
   };

   _consumerThread = new Thread(_kConsumer);
   _consumerThread.setDaemon(true);
   _consumerThread.setUncaughtExceptionHandler(eh);
   _consumerThread.start();
 }

 
 I also checked Spark's native Kafka Receiver implementation, and it does
 not appear to have any auto-restart support.

 Any comments from Dibyendu?

 On Mon, Mar 16, 2015 at 3:39 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 From what I have seen, once I kill my receiver on one machine, it will
 automatically spawn another receiver on another machine or on the same
 machine.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote:

 Dibyendu,

 Thanks for the reply.

 I am reading your project homepage now.

 One quick question I care about is:

 If the receivers fail for some reason (for example, being killed forcibly
 by someone else), is there any mechanism for the receiver to fail over
 automatically?

 On Mon, Mar 16, 2015 at 3:25 PM, Dibyendu Bhattacharya 
 dibyendu.bhattach...@gmail.com wrote:

 Which version of Spark are you running?

 You can try this Low Level Consumer:
 http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

 It is designed to recover from various failures and has a very good
 fault-recovery mechanism built in. It is being used by many users, and at
 present we at Pearson have been running this receiver in production for
 almost 3 months without any issue.

 You can give it a try.

 Regards,
 Dibyendu

 On Mon, Mar 16, 2015 at 12:47 PM, Akhil Das ak...@sigmoidanalytics.com
  wrote:

 You need to figure out why the receivers failed in the first place. Look
 in your worker logs and see what really happened. When you run a streaming
 job continuously for a longer period there will usually be a lot of logs
 (you can enable log rotation, etc.), and if you are doing groupBy, join,
 or similar operations, there will also be a lot of shuffle data. So check
 the worker logs and see what happened (e.g. whether the disk filled up).
 We have streaming pipelines running for weeks without any issues.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com
 wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro






 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro





 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro



Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
I have checked Dibyendu's code, and it looks like his implementation has
an auto-restart mechanism:


src/main/java/consumer/kafka/client/KafkaReceiver.java:

private void start() {

  // Start the thread that receives data over a connection
  KafkaConfig kafkaConfig = new KafkaConfig(_props);
  ZkState zkState = new ZkState(kafkaConfig);
  _kConsumer = new KafkaConsumer(kafkaConfig, zkState, this);
  _kConsumer.open(_partitionId);

  // Any uncaught exception on the consumer thread restarts this
  // receiver after a 5 second delay
  Thread.UncaughtExceptionHandler eh = new Thread.UncaughtExceptionHandler() {
    public void uncaughtException(Thread th, Throwable ex) {
      restart("Restarting Receiver for Partition " + _partitionId, ex, 5000);
    }
  };

  _consumerThread = new Thread(_kConsumer);
  _consumerThread.setDaemon(true);
  _consumerThread.setUncaughtExceptionHandler(eh);
  _consumerThread.start();
}

I also checked Spark's native Kafka Receiver implementation, and it does
not appear to have any auto-restart support.
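
If I read Dibyendu's code correctly, the general pattern is the one below
(my own Scala sketch of the idea, not code from either project): run the
consume loop on a separate thread whose uncaught-exception handler calls
Receiver.restart(), so the receiver is torn down and re-created after a
delay.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class RestartingReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  override def onStart(): Unit = {
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        while (!isStopped()) {
          store("dummy record")   // placeholder for the real consume loop
          Thread.sleep(1000)
        }
      }
    })
    // Any uncaught exception re-creates the receiver after 5 seconds.
    worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(t: Thread, e: Throwable): Unit =
        restart("Restarting receiver", e, 5000)
    })
    worker.setDaemon(true)
    worker.start()
  }

  override def onStop(): Unit = { /* the loop exits via isStopped() */ }
}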

Any comments from Dibyendu?

On Mon, Mar 16, 2015 at 3:39 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 From what I have seen, once I kill my receiver on one machine, it will
 automatically spawn another receiver on another machine or on the same
 machine.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote:

 Dibyendu,

 Thanks for the reply.

 I am reading your project homepage now.

 One quick question I care about is:

 If the receivers fail for some reason (for example, being killed forcibly
 by someone else), is there any mechanism for the receiver to fail over
 automatically?

 On Mon, Mar 16, 2015 at 3:25 PM, Dibyendu Bhattacharya 
 dibyendu.bhattach...@gmail.com wrote:

 Which version of Spark are you running?

 You can try this Low Level Consumer:
 http://spark-packages.org/package/dibbhatt/kafka-spark-consumer

 It is designed to recover from various failures and has a very good
 fault-recovery mechanism built in. It is being used by many users, and at
 present we at Pearson have been running this receiver in production for
 almost 3 months without any issue.

 You can give it a try.

 Regards,
 Dibyendu

 On Mon, Mar 16, 2015 at 12:47 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 You need to figure out why the receivers failed in the first place. Look
 in your worker logs and see what really happened. When you run a streaming
 job continuously for a longer period there will usually be a lot of logs
 (you can enable log rotation, etc.), and if you are doing groupBy, join,
 or similar operations, there will also be a lot of shuffle data. So check
 the worker logs and see what happened (e.g. whether the disk filled up).
 We have streaming pipelines running for weeks without any issues.

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com
 wrote:

 Guys,

 We have a project which builds upon Spark Streaming.

 We use Kafka as the input stream, and create 5 receivers.

 After this application had run for around 90 hours, all 5 receivers
 failed for some unknown reason.

 In my understanding, it is not guaranteed that a Spark Streaming receiver
 will recover from failure automatically.

 So I just want to figure out a way to do fault recovery to deal with
 receiver failure.

 There is a JIRA comment that mentions using StreamingListener to monitor
 the status of receivers:

 https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836

 However, I haven't found any public documentation on how to do this.

 Has anyone run into the same issue and dealt with it?

 Our environment:
Spark 1.3.0
Dual Master Configuration
Kafka 0.8.2

 Thanks

 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro






 --
 yangjun...@gmail.com
 http://hi.baidu.com/yjpro





-- 
yangjun...@gmail.com
http://hi.baidu.com/yjpro