Faisal created SPARK-14737:
------------------------------

             Summary: Kafka Brokers are down - spark stream should retry
                 Key: SPARK-14737
                 URL: https://issues.apache.org/jira/browse/SPARK-14737
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.0
         Environment: Suse Linux, Cloudera Enterprise 5.4.8 (#7 built by 
jenkins on 20151023-1205 git: d7dbdf29ac1d57ae9fb19958502d50dcf4e4fffd), 
kafka_2.10-0.8.2.2

            Reporter: Faisal


I have a Spark Streaming application that uses the direct stream approach to 
consume a Kafka topic.
{code}
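import java.util.HashMap;
import java.util.HashSet;
import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Kafka connection settings for the direct (receiver-less) stream.
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "broker1,broker2,broker3");
kafkaParams.put("auto.offset.reset", "largest");

// Topics to consume.
HashSet<String> topicsSet = new HashSet<String>();
topicsSet.add("Topic1");

// jssc is the application's JavaStreamingContext, created elsewhere.
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
        jssc,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParams,
        topicsSet
);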
{code}

I notice that when I stop or shut down the Kafka brokers, my Spark application 
also shuts down.

Here is the spark-submit script:
{code}
spark-submit \
--master yarn-cluster \
--files /home/siddiquf/spark/log4j-spark.xml \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
--class com.example.MyDataStreamProcessor \
myapp.jar 
{code}

The Spark job is submitted successfully, and I can track the application driver 
and the worker/executor nodes.

Everything works fine. My only concern is that if the Kafka brokers are offline 
or restarted, my application, which is controlled by YARN, should not shut down, 
but it does.

If this is expected behavior, how should such a situation be handled with the 
least maintenance? Keep in mind that the Kafka cluster is not part of the Hadoop 
cluster and is managed by a different team, which is why our application needs 
to be resilient to broker outages.
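
To illustrate the kind of behavior I am asking for, here is a minimal sketch of 
retrying with exponential backoff. It is plain Java and not Spark or Kafka API: 
the class RetrySketch, the method retryWithBackoff, and the simulated broker 
failure in main are all hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Spark or Kafka: retries an action with
// exponential backoff instead of giving up on the first failure.
final class RetrySketch {

    static <T> T retryWithBackoff(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread is asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // wait before the next attempt
                    delayMs *= 2;          // double the wait each time
                }
            }
        }
        // All attempts failed: surface the most recent error.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated broker lookup that fails twice before succeeding.
        final int[] calls = {0};
        String result = retryWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("brokers unavailable");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In this sketch the caller chooses how many attempts to make and how long to wait 
at first, so a short broker restart could be ridden out while a long outage 
still ends in a visible failure.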

Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
