[ https://issues.apache.org/jira/browse/SPARK-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278374#comment-15278374 ]
Faisal commented on SPARK-14737:
--------------------------------

Hi Sean,

Here are a few points to note about my originally reported improvement:

1. How does the current Spark Streaming direct stream API provide a retry option? You mentioned that I can write retry logic in the application, but it is not yet clear how.
2. This issue was created as an improvement/feature, not a bug. Wouldn't it be nice to have this feature built in, with a configurable parameter?

I am pasting sample code in the comments just to be on the same page about the code.

> Kafka Brokers are down - spark stream should retry
> --------------------------------------------------
>
>                 Key: SPARK-14737
>                 URL: https://issues.apache.org/jira/browse/SPARK-14737
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>         Environment: Suse Linux, Cloudera Enterprise 5.4.8 (#7 built by
> jenkins on 20151023-1205 git: d7dbdf29ac1d57ae9fb19958502d50dcf4e4fffd),
> kafka_2.10-0.8.2.2
>            Reporter: Faisal
>
> I have a Spark Streaming application that uses direct streaming, listening to a Kafka topic.
> {code}
> HashMap<String, String> kafkaParams = new HashMap<String, String>();
> kafkaParams.put("metadata.broker.list", "broker1,broker2,broker3");
> kafkaParams.put("auto.offset.reset", "largest");
>
> HashSet<String> topicsSet = new HashSet<String>();
> topicsSet.add("Topic1");
>
> JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
>     jssc,
>     String.class,
>     String.class,
>     StringDecoder.class,
>     StringDecoder.class,
>     kafkaParams,
>     topicsSet
> );
> {code}
> I notice that when I stop/shut down the Kafka brokers, my Spark application also shuts down.
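Since Spark has no built-in retry switch for this case, the "retry logic in the application" Sean alluded to could be factored into a small helper that retries a setup task with exponential backoff. The `RetryUtil` class below is a hypothetical sketch, not part of Spark; the application would wrap its stream/context setup in `withRetry` rather than letting the first broker-unreachable error kill the job.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Spark): retries a task with
// exponential backoff, rethrowing the last failure if all attempts fail.
public class RetryUtil {
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;          // e.g. brokers unreachable
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;  // exponential backoff between attempts
                }
            }
        }
        throw last;                // all attempts failed
    }
}
```

The application could then wrap the `KafkaUtils.createDirectStream` setup (via an anonymous `Callable` on Java 7) in `RetryUtil.withRetry(setupTask, 5, 1000L)`, so transient broker outages at startup are retried instead of being fatal. A configurable, built-in equivalent is exactly what this ticket requests.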
> Here is the spark execution script:
> {code}
> spark-submit \
>   --master yarn-cluster \
>   --files /home/siddiquf/spark/log4j-spark.xml \
>   --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
>   --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
>   --class com.example.MyDataStreamProcessor \
>   myapp.jar
> {code}
> The Spark job is submitted successfully, and I can track the application driver and worker/executor nodes.
> Everything works fine. My only concern: if the Kafka brokers are offline or restarted, shouldn't my application, which is controlled by YARN, keep running rather than shut down? But it does shut down.
> If this is the expected behavior, how should such a situation be handled with the least maintenance? Keep in mind that the Kafka cluster is not part of the Hadoop cluster and is managed by a different team, which is why our application needs to be resilient enough.
> Thanks

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
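Short of a built-in retry option, another application-side workaround is a pre-flight check: probe the hosts in `metadata.broker.list` over TCP before creating the direct stream, and keep waiting (or exit cleanly for YARN to restart the attempt) while no broker is reachable. The `BrokerCheck` class below is a hypothetical sketch using plain sockets; it is not part of Spark or Kafka, and a TCP connect only shows the port is open, not that the broker is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check (not part of Spark/Kafka): returns true
// if at least one "host:port" entry in the broker list accepts a TCP
// connection within the given timeout.
public class BrokerCheck {
    public static boolean anyBrokerReachable(String brokerList, int timeoutMs) {
        for (String hostPort : brokerList.split(",")) {
            String[] parts = hostPort.trim().split(":");
            String host = parts[0];
            int port = parts.length > 1 ? Integer.parseInt(parts[1]) : 9092; // default Kafka port
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), timeoutMs);
                return true;  // this broker accepted the connection
            } catch (IOException e) {
                // broker down or unreachable; try the next one
            }
        }
        return false;
    }
}
```

Calling this in a sleep-and-retry loop before `KafkaUtils.createDirectStream` would let the driver wait out a broker outage at startup instead of dying immediately, which is the behavior this ticket asks Spark to provide out of the box.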