I have the following readStream in Spark Structured Streaming, reading data
from Kafka:

val kafkaStreamingDF = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "...")
      .option("subscribe", "testtopic")
      .option("failOnDataLoss", "false")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, under the covers Spark
creates a new consumer and a new consumer group, retrieves the last
successful offset for the job (using the job name?), seeks to that offset,
and starts reading from there. Is that the case? If so, how do I reset the
offset and force my job to read from the beginning?
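For what it's worth, my understanding is that Structured Streaming does not
track progress in a Kafka consumer group at all: it writes the offsets it has
processed into the query's checkpoint directory (the checkpointLocation you
pass to writeStream), and it generates a unique group id per query.
startingOffsets only applies when no checkpoint exists yet. So to re-read
from the beginning, you would delete the checkpoint directory or point the
query at a fresh one. A minimal sketch, assuming a console sink; the
checkpoint path here is just a placeholder:

// Sketch only: sink and checkpoint path are placeholders.
// With a new (or deleted) checkpoint directory, there is no stored
// progress, so "startingOffsets" = "earliest" takes effect again and
// the query reads the topic from the beginning.
val query = kafkaStreamingDF
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/testtopic-reset")
      .start()

query.awaitTermination()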


