Re: [structured-streaming]How to reset Kafka offset in readStream and read from beginning

2018-05-23 Thread Sushil Kotnala
You can set .option("startingOffsets", "earliest") while reading from
Kafka. (In Structured Streaming the Kafka consumer's auto.offset.reset
setting is not used; startingOffsets is the supported equivalent.) With
this, a new stream will read from the first offset available for the topic.
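For reference, startingOffsets also accepts a JSON string with explicit per-partition offsets. A minimal sketch (the topic name, broker address, and partition numbers are placeholders, and a SparkSession `spark` with the spark-sql-kafka connector on the classpath is assumed):

```scala
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
  .option("subscribe", "testtopic")                    // placeholder
  // "earliest", "latest", or explicit per-partition offsets as JSON;
  // -2 means earliest and -1 means latest for a partition.
  .option("startingOffsets", """{"testtopic":{"0":23,"1":-2}}""")
  .load()
```

Note this only applies when a query starts without an existing checkpoint; once a checkpoint exists, the saved offsets win.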




On Wed, May 23, 2018 at 11:32 AM, karthikjay  wrote:

> Chris,
>
> Thank you for responding. I get it.
>
> But, if I use a console sink without a checkpoint location, I do not see
> any messages in the console when running from the IntelliJ IDEA IDE. I do
> not explicitly specify checkpointLocation in this case. How do I clear the
> working directory data and force Spark to read Kafka messages from the
> beginning?
>
>


Re: [structured-streaming]How to reset Kafka offset in readStream and read from beginning

2018-05-23 Thread karthikjay
Chris,

Thank you for responding. I get it. 

But, if I use a console sink without a checkpoint location, I do not see any
messages in the console when running from the IntelliJ IDEA IDE. I do not
explicitly specify checkpointLocation in this case. How do I clear the
working directory data and force Spark to read Kafka messages from the
beginning?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [structured-streaming]How to reset Kafka offset in readStream and read from beginning

2018-05-22 Thread Bowden, Chris
You can delete the write-ahead log directory you provided to the sink via the
"checkpointLocation" option.
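The checkpoint directory can be removed programmatically between runs. A minimal sketch (the path is a made-up example; in practice you would often just point the query at a fresh checkpointLocation instead of deleting the old one):

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

// Recursively delete a checkpoint directory so the next start of the
// query falls back to startingOffsets instead of the saved offsets.
def deleteRecursively(dir: Path): Unit =
  if (Files.exists(dir)) {
    // Walk depth-first (children sort after parents in reverse order)
    // so files are deleted before their containing directories.
    Files.walk(dir)
      .sorted(Comparator.reverseOrder[Path]())
      .forEach(p => Files.delete(p))
  }

// Hypothetical path -- substitute the value you passed to
// .option("checkpointLocation", ...) on the writeStream.
deleteRecursively(Paths.get("/tmp/spark-checkpoints/my-query"))
```

Only do this while the query is stopped; deleting the directory also discards the sink's exactly-once bookkeeping, so already-written output may be produced again.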

From: karthikjay 
Sent: Tuesday, May 22, 2018 7:24:45 AM
To: user@spark.apache.org
Subject: [structured-streaming]How to reset Kafka offset in readStream and read 
from beginning

I have the following readStream in Spark Structured Streaming reading data
from Kafka:

val kafkaStreamingDF = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "...")
  .option("subscribe", "testtopic")
  .option("failOnDataLoss", "false")
  .option("startingOffsets", "earliest")
  .load()
  .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, under the covers Spark creates
a new consumer and a new consumer group, retrieves the last successful offset
for the job (using the job name?), seeks to that offset, and starts reading
from there. Is that the case? If yes, how do I reset the offset and force my
job to read from the beginning?






[structured-streaming]How to reset Kafka offset in readStream and read from beginning

2018-05-22 Thread karthikjay
I have the following readStream in Spark Structured Streaming reading data
from Kafka:

val kafkaStreamingDF = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "...")
  .option("subscribe", "testtopic")
  .option("failOnDataLoss", "false")
  .option("startingOffsets", "earliest")
  .load()
  .selectExpr("CAST(value AS STRING)", "CAST(topic AS STRING)")

As far as I know, every time I start the job, under the covers Spark creates
a new consumer and a new consumer group, retrieves the last successful offset
for the job (using the job name?), seeks to that offset, and starts reading
from there. Is that the case? If yes, how do I reset the offset and force my
job to read from the beginning?


