I'm consuming data from Kafka with createDirectStream and store the offsets in Kafka ( https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself )
val stream = KafkaUtils.createDirectStream[String, String]( streamingContext, PreferConsistent, Subscribe[String, String](topics, kafkaParams)) My Spark version is 2.0.2 and 0.10 from Kafka. This solution works well and when I restart the spark process starts from the last offset which Spark consumes, but sometimes I need to reprocess all the topic from the beginning. I have seen that I could reset the offset with a kafka script but it's not enable in Kafka 0.10... kafka-consumer-groups --bootstrap-server kafka-host:9092 --group my-group --reset-offsets --to-earliest --all-topics --execute Another possibility it's to set another kafka parameter in the createDirectStream with a map with the offsets but, how could I get first offset from each partition?, I have checked the api from the new consumer and I don't see any method to get these offsets. Any other way?? I could start with another groupId as well, but it doesn't seem a very clean option for production.