Re: [structured-streaming] How to reset Kafka offset in readStream and read from beginning
You can use .option("auto.offset.reset", "earliest") while reading from Kafka. With this, the new stream will read from the first offset present for the topic.

On Wed, May 23, 2018 at 11:32 AM, karthikjay wrote:

> Chris,
>
> Thank you for responding. I get it.
>
> But if I am using a console sink without a checkpoint location, I do not
> see any messages in the console in the IntelliJ IDEA IDE. I do not
> explicitly specify checkpointLocation in this case. How do I clear the
> working directory data and force Spark to read Kafka messages from the
> beginning?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
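For reference, a minimal sketch of a source configured to read from the beginning (broker address and topic name are placeholders; note that the Structured Streaming Kafka source documents startingOffsets for this, rather than the Kafka consumer's auto.offset.reset setting):

```scala
// Sketch only: assumes a SparkSession named `spark` and a reachable broker.
val freshDF = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "testtopic")
  // "earliest" only applies when the query has no checkpoint yet;
  // once a checkpoint exists, Spark resumes from the checkpointed offsets.
  .option("startingOffsets", "earliest")
  .load()
```

Because startingOffsets is ignored once a checkpoint exists, resetting a running job also requires clearing (or changing) its checkpoint location.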
Re: [structured-streaming] How to reset Kafka offset in readStream and read from beginning
Chris,

Thank you for responding. I get it.

But if I am using a console sink without a checkpoint location, I do not see any messages in the console in the IntelliJ IDEA IDE. I do not explicitly specify checkpointLocation in this case. How do I clear the working directory data and force Spark to read Kafka messages from the beginning?
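One way to make that working directory explicit, so it can be deliberately deleted to start over, is to give the console sink its own checkpoint location (a sketch with a placeholder path, assuming the kafkaStreamingDF defined earlier in the thread):

```scala
// Sketch: giving the console sink an explicit checkpointLocation turns the
// implicit temporary working directory into a known path you control.
val query = kafkaStreamingDF.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/console-ckpt") // placeholder path
  .start()
```

Deleting /tmp/console-ckpt between runs then forces the query to fall back to startingOffsets and re-read the topic from the beginning.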
Re: [structured-streaming] How to reset Kafka offset in readStream and read from beginning
You can delete the write-ahead log directory you provided to the sink via the "checkpointLocation" option.

From: karthikjay
Sent: Tuesday, May 22, 2018 7:24:45 AM
To: user@spark.apache.org
Subject: [structured-streaming] How to reset Kafka offset in readStream and read from beginning

I have the following readStream in Spark Structured Streaming reading data from Kafka:

    val kafkaStreamingDF = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "...")
      .option("subscribe", "testtopic")
      .option("failOnDataLoss", "false")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, underneath the covers Spark creates a new consumer and a new consumer group, retrieves the last successful offset for the job (using the job name?), seeks to that offset, and starts reading from there. Is that the case? If yes, how do I reset the offset and force my job to read from the beginning?
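A sketch of deleting that directory programmatically with the JDK's file API (the path is a placeholder; use whatever you passed as "checkpointLocation"):

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

// Recursively delete a checkpoint directory so the next run starts fresh
// from startingOffsets instead of resuming from checkpointed offsets.
def deleteCheckpoint(dir: String): Unit = {
  val root = Paths.get(dir)
  if (Files.exists(root)) {
    Files.walk(root)
      .sorted(Comparator.reverseOrder[Path]()) // delete children before parents
      .forEach(p => Files.delete(p))
  }
}

deleteCheckpoint("/tmp/my-checkpoint") // placeholder path
```

Only do this while the query is stopped; deleting the checkpoint also discards any sink progress tracking, so an idempotent or replay-tolerant sink is assumed.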
[structured-streaming] How to reset Kafka offset in readStream and read from beginning
I have the following readStream in Spark Structured Streaming reading data from Kafka:

    val kafkaStreamingDF = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "...")
      .option("subscribe", "testtopic")
      .option("failOnDataLoss", "false")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, underneath the covers Spark creates a new consumer and a new consumer group, retrieves the last successful offset for the job (using the job name?), seeks to that offset, and starts reading from there. Is that the case? If yes, how do I reset the offset and force my job to read from the beginning?