Hello everyone,

We are trying to decode a message inside a Spark job that we receive from Kafka. The message is encoded with Protocol Buffers (protobuf). The problem is that when decoding we get ClassNotFoundExceptions. We have tried the remedies we found on Stack Exchange and in mailing-list archives, but nothing seems to work.
(This question is a re-ask, but we really cannot figure this one out.)

We created a standalone repository with a very simple Spark job that exhibits the issue: https://github.com/vibhav/spark-protobuf. The job reads the messages from the file system, decodes them, and prints them. The only sources are the generated protobuf Java classes and a small Spark job that decodes a message. It's easy to check out and reproduce the exception yourself: just uncomment the code that prints the messages from within the RDD. I'd appreciate it if anyone could take a look.

We have tried a couple of remedies already. Setting "spark.files.userClassPathFirst" did not fix the problem for us. I am not very familiar with the Spark and Scala environment, so please correct any incorrect assumptions or statements I make. However, I don't believe this is a classpath visibility issue. I wrote a small helper method to print the classpath from both the driver and a worker, and the output is identical. (I'm printing System.getProperty("java.class.path") -- is there a better way to do this, or to check the classpath?) You can print the classpaths the same way we do in the example project above.

Furthermore, userClassPathFirst seems to have a detrimental effect on otherwise-working code, which I cannot explain or do not understand. For example, I created a trivial RDD like this:

    val l = List(1, 2, 3)
    sc.makeRDD(l).foreach((x: Int) => {
      println(x.toString)
    })

With userClassPathFirst set, I encounter a java.lang.ClassCastException trying to execute that code. Is that to be expected? You can reproduce this by commenting out the block of code that prints the messages in the example project linked above.

We also tried dynamically adding the jar to the SparkContext with .addJar, but this seemed to have no effect.

Thanks in advance for any help y'all can provide.
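P.S. For reference, here is a sketch of the kind of helper we use to dump the classpath (the object and method names are ours; the classloader walk is our own guess at where executor classes might actually come from, since java.class.path alone may not tell the whole story):

```scala
// Sketch of the small helper we use to dump the classpath on a JVM.
// We call it once on the driver and once inside rdd.foreachPartition
// so it runs on the executors. Nothing here is Spark-specific.
object ClasspathDebug {
  // Entries from java.class.path -- this is what we are currently printing.
  def systemClasspath(): Seq[String] =
    System.getProperty("java.class.path")
      .split(java.io.File.pathSeparator)
      .toSeq

  // Walk the context classloader chain and collect any URLs. On an
  // executor this can differ from java.class.path, which is why we
  // suspect the system property alone may be misleading.
  def loaderUrls(): Seq[String] = {
    var cl = Thread.currentThread().getContextClassLoader
    val urls = scala.collection.mutable.ListBuffer[String]()
    while (cl != null) {
      cl match {
        case u: java.net.URLClassLoader =>
          urls ++= u.getURLs.map(_.toString)
        case _ => // non-URL classloader; nothing to report
      }
      cl = cl.getParent
    }
    urls.toList
  }
}
```

(On older JVMs the application classloader is a URLClassLoader, so loaderUrls() returns something useful; we would welcome a better way to inspect what the executors can actually see.)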
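For completeness, this is roughly how we are setting the flag and adding the jar; the app name and jar path below are placeholders, and the property name is the one we found in the docs for our Spark version (corrections welcome if we are using the wrong one):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Roughly how we set the userClassPathFirst flag (we also tried
// passing it via --conf at submit time, with the same result):
val conf = new SparkConf()
  .setAppName("spark-protobuf-test") // placeholder name
  .set("spark.files.userClassPathFirst", "true")
val sc = new SparkContext(conf)

// ...and the dynamic-jar attempt, which had no visible effect for us:
sc.addJar("/path/to/jar-with-generated-protobuf-classes.jar") // placeholder path
```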