When you enable checkpointing by setting the checkpoint directory, you enable metadata checkpointing. Data checkpointing kicks in only if you use a DStream operation that requires it (a stateful transformation such as updateStateByKey or a windowed operation), or if you enable Write Ahead Logs to prevent data loss on driver failure.
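To answer Q2 concretely: you get metadata-only checkpointing for free with the usual recoverable-driver pattern. A minimal sketch (Spark 1.x Java API; the checkpoint path, app name, and batch interval are placeholders, and the direct-stream wiring is elided):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public final class CheckpointedDriver {

    // Placeholder path; use a fault-tolerant store (HDFS, S3) in production.
    private static final String CHECKPOINT_DIR = "hdfs:///checkpoints/my-app";

    public static void main(String[] args) throws InterruptedException {
        // getOrCreate either recovers the context (including the Kafka
        // offsets saved in the metadata checkpoint) after a driver restart,
        // or builds a fresh context via the factory on first run.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(
            CHECKPOINT_DIR,
            new JavaStreamingContextFactory() {
                @Override
                public JavaStreamingContext create() {
                    SparkConf conf = new SparkConf()
                        .setAppName("direct-stream-with-metadata-checkpoint");
                    JavaStreamingContext ctx =
                        new JavaStreamingContext(conf, Durations.seconds(10));
                    // Setting the checkpoint directory enables metadata
                    // checkpointing. Data checkpointing stays off unless a
                    // stateful operation in the graph requires it.
                    ctx.checkpoint(CHECKPOINT_DIR);
                    // ... create the direct stream and transformations here;
                    // all stream setup must happen inside this factory so it
                    // is captured in the checkpoint ...
                    return ctx;
                }
            });
        jssc.start();
        jssc.awaitTermination();
    }
}
```

Note that the whole DStream graph has to be built inside the factory; anything created outside it will not be restored on recovery.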
More discussion: https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/

On Tue, Jul 7, 2015 at 7:42 AM, abi_pat <present.boiling2...@gmail.com> wrote:
> Hi,
>
> I am using the new experimental Direct Stream API. Everything is working
> fine, but when it comes to fault tolerance, I am not sure how to achieve
> it. Presently my Kafka config map looks like this:
>
> configMap.put("zookeeper.connect", "192.168.51.98:2181");
> configMap.put("group.id", UUID.randomUUID().toString());
> configMap.put("auto.offset.reset", "smallest");
> configMap.put("auto.commit.enable", "true");
> configMap.put("topics", "IPDR31");
> configMap.put("kafka.consumer.id", "kafkasparkuser");
> configMap.put("bootstrap.servers", "192.168.50.124:9092");
>
> Set<String> topic = new HashSet<String>();
> topic.add("IPDR31");
>
> JavaPairInputDStream<byte[], byte[]> kafkaData =
>     KafkaUtils.createDirectStream(js, byte[].class, byte[].class,
>         DefaultDecoder.class, DefaultDecoder.class, configMap, topic);
>
> Questions:
>
> Q1 - Is my Kafka configuration correct, or should it be changed?
>
> Q2 - I also looked into checkpointing, but in my use case data
> checkpointing is not required while metadata checkpointing is. Can I
> achieve this, i.e. enable metadata checkpointing but not data
> checkpointing?
>
> Thanks
> Abhishek Patel
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Kafka-Direct-Streaming-tp23685.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.