Hi,

I have my application jar sitting in HDFS; it defines a long-running Spark
Streaming job, and I am also using a checkpoint directory in HDFS. Every time
I make changes to the job, I delete that jar and upload a new one.
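
For context, the driver is wired up roughly like the standard getOrCreate
recovery pattern below; the paths, app name, and batch interval are
placeholders, not the exact job:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingJob {
    // Placeholder; the real checkpoint dir lives under hdfs://myCheckpoints/
    private static final String CHECKPOINT_DIR = "hdfs://myCheckpoints/application-1";

    public static void main(String[] args) throws Exception {
        // On restart, getOrCreate first tries to rebuild the context from the
        // serialized checkpoint; the factory only runs when no checkpoint exists.
        JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, () -> {
            SparkConf conf = new SparkConf().setAppName("application-1");
            JavaStreamingContext newSsc = new JavaStreamingContext(conf, Durations.seconds(10));
            newSsc.checkpoint(CHECKPOINT_DIR);
            // DStream sources, transformations and outputs are defined here.
            return newSsc;
        });
        ssc.start();
        ssc.awaitTermination();
    }
}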

Now, if I upload a new jar and delete the checkpoint directory, it works fine.
But if I don't delete the checkpoint directory, I get an error like this:

timestamp="2016-05-13T18:49:47,887+0000",level="WARN",threadName="main",logger="org.apache.spark.streaming.CheckpointReader",message="Error
reading checkpoint from file
hdfs://myCheckpoints/application-1/checkpoint-1463165355000",exception="java.io.InvalidClassException:
some.package.defined.here.ConcreteClass; local class incompatible: stream
classdesc serialVersionUID = -7808345595732501156, local class
serialVersionUID = 1574855058137843618"

I have changed 'ConcreteClass' since my last deployment, and that change is
what's causing the issue.
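
As far as I understand, both numbers in the exception are JVM-computed
defaults; a quick way to see the UID a given build of a class would use is
something like this (the nested class here is just a stand-in for mine):

import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidCheck {
    // Stand-in for some.package.defined.here.ConcreteClass
    static class ConcreteClass implements Serializable {
        private String someField;
    }

    public static void main(String[] args) {
        // With no explicit serialVersionUID, the JVM derives one from the class
        // structure, so adding or removing members changes it and breaks
        // deserialization of instances written by the previous jar.
        long uid = ObjectStreamClass.lookup(ConcreteClass.class).getSerialVersionUID();
        System.out.println("serialVersionUID = " + uid);
    }
}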

I have two main questions:

   1. *How can I fix this?* I know adding "private static final long
   serialVersionUID = 1113799434508676095L;" to the class might fix it (see
   the minimal sketch after this list), but I don't want to add it to every
   class, since any class can change between the current version and the next.
   Is there anything better?
   2. *What exactly does the checkpoint directory store?* Does it store all
   the classes from the previous jar, or just their names and
   serialVersionUIDs? The streaming programming guide
   <https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing>
   doesn't give much detail on the internals of checkpointing. The checkpoint
   directory's size is only ~6 MB.
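
To be concrete about the workaround in question 1, this is the kind of change
I mean (the class and fields are placeholders for my own):

import java.io.Serializable;

// Placeholder for some.package.defined.here.ConcreteClass
public class ConcreteClass implements Serializable {
    // An explicit serialVersionUID keeps the stream class descriptor stable
    // across builds, so checkpointed instances from the old jar still
    // deserialize as long as the fields remain compatible.
    private static final long serialVersionUID = 1113799434508676095L;

    private String someField;
    private int anotherField;
}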

Appreciate any help.

I have also just asked this question on Stack Overflow:
https://stackoverflow.com/questions/37217738/spark-job-fails-when-using-checkpointing-if-a-class-change-in-the-job

Thanks,

KP
