I have checked Dibyendu's code, and it looks like his implementation has an auto-restart mechanism:
--------------------------------------------------------------------------------
src/main/java/consumer/kafka/client/KafkaReceiver.java:

  private void start() {
    // Start the thread that receives data over a connection
    KafkaConfig kafkaConfig = new KafkaConfig(_props);
    ZkState zkState = new ZkState(kafkaConfig);
    _kConsumer = new KafkaConsumer(kafkaConfig, zkState, this);
    _kConsumer.open(_partitionId);

    Thread.UncaughtExceptionHandler eh = new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread th, Throwable ex) {
        restart("Restarting Receiver for Partition " + _partitionId, ex, 5000);
      }
    };

    _consumerThread = new Thread(_kConsumer);
    _consumerThread.setDaemon(true);
    _consumerThread.setUncaughtExceptionHandler(eh);
    _consumerThread.start();
  }
--------------------------------------------------------------------------------

I also checked Spark's native Kafka receiver implementation, and it does not
appear to have any auto-restart support.

Any comments from Dibyendu?
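For what it's worth, the same pattern should be applicable to any custom
receiver, since restart() is part of Spark's public Receiver API. Below is a
rough, untested sketch of the idea (RestartingReceiver and its consume loop
are placeholders I made up, not code from the package):

--------------------------------------------------------------------------------
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Sketch only: a custom receiver whose worker thread triggers a delayed
// restart on any uncaught exception, mirroring the KafkaReceiver code above.
public class RestartingReceiver extends Receiver<String> {

  public RestartingReceiver() {
    super(StorageLevel.MEMORY_AND_DISK_2());
  }

  @Override
  public void onStart() {
    Thread worker = new Thread(new Runnable() {
      public void run() {
        // Placeholder consume loop; real source logic goes here.
        while (!isStopped()) {
          store("record");
        }
      }
    });
    worker.setDaemon(true);
    worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread th, Throwable ex) {
        // Ask Spark to stop and re-launch this receiver after 5 seconds.
        restart("Restarting receiver after failure", ex, 5000);
      }
    });
    worker.start();
  }

  @Override
  public void onStop() {
    // The worker thread checks isStopped(), so nothing extra to clean up.
  }
}
--------------------------------------------------------------------------------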
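Also, regarding the StreamingListener approach mentioned in the quoted thread
below: here is a rough sketch of how receiver monitoring might look against
the Spark 1.3 API (untested; ReceiverMonitor is just a placeholder name):

--------------------------------------------------------------------------------
import org.apache.spark.streaming.scheduler.*;

// Sketch only: logs receiver errors/stops; all other callbacks are no-ops.
public class ReceiverMonitor implements StreamingListener {

  @Override
  public void onReceiverError(StreamingListenerReceiverError receiverError) {
    ReceiverInfo info = receiverError.receiverInfo();
    System.err.println("Receiver " + info.streamId() + " reported an error: "
        + info.lastErrorMessage());
    // React here: raise an alert, or trigger an external restart.
  }

  @Override
  public void onReceiverStopped(StreamingListenerReceiverStopped receiverStopped) {
    System.err.println("Receiver " + receiverStopped.receiverInfo().streamId()
        + " stopped");
  }

  @Override
  public void onReceiverStarted(StreamingListenerReceiverStarted receiverStarted) { }

  @Override
  public void onBatchSubmitted(StreamingListenerBatchSubmitted batchSubmitted) { }

  @Override
  public void onBatchStarted(StreamingListenerBatchStarted batchStarted) { }

  @Override
  public void onBatchCompleted(StreamingListenerBatchCompleted batchCompleted) { }
}
--------------------------------------------------------------------------------

It would be registered with jssc.ssc().addStreamingListener(new
ReceiverMonitor()). Note that the listener only observes receiver failures;
acting on them (alerting, restarting externally) is still up to the
application.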
On Mon, Mar 16, 2015 at 3:39 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> As I've seen, once I kill my receiver on one machine, it will automatically
> spawn another receiver on another machine or on the same machine.
>
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang <yangjun...@gmail.com> wrote:
>
>> Dibyendu,
>>
>> Thanks for the reply.
>>
>> I am reading your project homepage now.
>>
>> One quick question I care about is:
>>
>> If the receivers fail for some reason (for example, they are killed
>> brutally by someone else), is there any mechanism for the receivers to
>> fail over automatically?
>>
>> On Mon, Mar 16, 2015 at 3:25 PM, Dibyendu Bhattacharya <
>> dibyendu.bhattach...@gmail.com> wrote:
>>
>>> Which version of Spark are you running?
>>>
>>> You can try this Low Level Consumer:
>>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>>>
>>> This is designed to recover from various failures and has a very good
>>> fault-recovery mechanism built in. It is being used by many users, and
>>> at present we at Pearson have been running this receiver in production
>>> for almost 3 months without any issue.
>>>
>>> You can give this a try.
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> On Mon, Mar 16, 2015 at 12:47 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> You need to figure out why the receivers failed in the first place.
>>>> Look in your worker logs and see what really happened. When you run a
>>>> streaming job continuously for a long period, there will usually be a
>>>> lot of logs (you can enable log rotation etc.), and if you are doing
>>>> groupBy, join, etc. types of operations, there will also be a lot of
>>>> shuffle data. So you need to check the worker logs and see what
>>>> happened (whether the disk is full etc.). We have streaming pipelines
>>>> running for weeks without any issues.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang <yangjun...@gmail.com>
>>>> wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> We have a project which builds upon Spark Streaming.
>>>>>
>>>>> We use Kafka as the input stream and create 5 receivers.
>>>>>
>>>>> After this application had run for around 90 hours, all 5 receivers
>>>>> failed for some unknown reason.
>>>>>
>>>>> In my understanding, it is not guaranteed that a Spark Streaming
>>>>> receiver will recover from faults automatically.
>>>>>
>>>>> So I just want to figure out a way of doing fault recovery to deal
>>>>> with receiver failure.
>>>>>
>>>>> There is a JIRA comment that mentions using StreamingListener for
>>>>> monitoring the status of receivers:
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-2381?focusedCommentId=14056836&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14056836
>>>>>
>>>>> However, I haven't found any open documentation on how to do this.
>>>>>
>>>>> Has anyone met the same issue and dealt with it?
>>>>>
>>>>> Our environment:
>>>>> Spark 1.3.0
>>>>> Dual Master Configuration
>>>>> Kafka 0.8.2
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> yangjun...@gmail.com
>>>>> http://hi.baidu.com/yjpro
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> yangjun...@gmail.com
>> http://hi.baidu.com/yjpro
>
>

--
yangjun...@gmail.com
http://hi.baidu.com/yjpro