I have the following readStream in Spark Structured Streaming, reading data
from Kafka:

val kafkaStreamingDF = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "...")
      .option("subscribe", "testtopic")
      .option("failOnDataLoss", "false")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, under the covers Spark
creates a new consumer and a new consumer group, retrieves the last
successful offset for the job (using the job name?), seeks to that offset,
and starts reading from there. Is that the case? If so, how do I reset the
offset and force my job to read from the beginning?
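For what it's worth, my understanding is that Structured Streaming does not
track progress in a Kafka consumer group at all: it writes the offsets it has
processed into the query's checkpoint directory (the checkpointLocation you
pass to writeStream), and it generates a unique group id per query.
startingOffsets only applies when no checkpoint exists yet. So to re-read
from the beginning, you would delete the checkpoint directory or point the
query at a fresh one. A minimal sketch, assuming a console sink; the
checkpoint path here is just a placeholder:

// Sketch only: sink and checkpoint path are placeholders.
// With a new (or deleted) checkpoint directory, there is no stored
// progress, so "startingOffsets" = "earliest" takes effect again and
// the query reads the topic from the beginning.
val query = kafkaStreamingDF
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/testtopic-reset")
      .start()

query.awaitTermination()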


