Re: Spark Streaming not reading missed data

2018-01-16 Thread vijay.bvp
You are creating a new streaming context each time: val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(config.getInt(Constants.Properties.BatchInterval))). If you want fault tolerance, that is, to resume reading from where the job stopped between Spark job restarts, the correct way is to restore
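The snippet above is cut off, so the exact recommendation is not visible here. A common way to restore a streaming context from its checkpoint directory is StreamingContext.getOrCreate, sketched below under stated assumptions: the checkpoint path, application name, 30-second batch interval, and the socket source are hypothetical placeholders, not taken from the original job (which reads from Kafka and takes its interval from config).

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRecoverySketch {
  // Hypothetical path; in practice this should be on fault-tolerant storage (e.g. HDFS).
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Called only when no usable checkpoint exists in checkpointDir.
  // All DStream setup must happen inside this function so that it can be
  // reconstructed from the checkpoint on restart.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-recovery-sketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // assumed batch interval
    ssc.checkpoint(checkpointDir)

    // Placeholder source and output operation (assumption); the job in this
    // thread would create its Kafka stream here instead.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Rebuilds the context from checkpoint data if present,
    // otherwise falls back to createContext().
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this pattern, calling new StreamingContext(...) directly on every start is avoided, because a freshly constructed context ignores whatever state was saved in the checkpoint directory.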

Re: Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
Sometimes I get this message in the logs, but the job still runs. Do you have a solution for fixing this? I added the code in my earlier email. Exception in thread "pool-22-thread-9" java.lang.NullPointerException at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(

Re: Spark Streaming not reading missed data

2018-01-16 Thread Jörn Franke
It could be a missing persist before the checkpoint.
> On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed wrote:
>
> Hi,
>
> Spark streaming job from kafka is not picking the messages and is always taking the latest offsets when streaming job is stopped for 2 hours.

Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
Hi, my Spark Streaming job reading from Kafka does not pick up the messages it missed: after the job has been stopped for 2 hours, it always starts from the latest offsets. It does not pick up the offsets that need to be processed from the checkpoint directory. Any suggestions on how to process the old messages too