Hello everyone,

We are trying to decode messages that we receive from Kafka inside a Spark
job. The messages are encoded with Protocol Buffers. The problem is that
decoding fails with ClassNotFoundExceptions. We have tried the remedies we
found on Stack Exchange and in mailing list archives, but nothing seems to
work.

(This question is a re-ask, but we really cannot figure this one out.)

We created a standalone repository with a very simple Spark job that
exhibits the issue. The Spark job reads the messages from the filesystem,
decodes them, and prints them. It's easy to check out and reproduce the
exception yourself: just uncomment the code that prints the messages from
within the RDD. The only sources are the generated protobuf Java sources
and a small Spark job that decodes a message. I'd appreciate it if anyone
could take a look.

https://github.com/vibhav/spark-protobuf
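
For context, the decode step in that job boils down to something like the
following (a minimal sketch; the message class and input path are
placeholders for the generated protobuf class and test data in the repo):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("protobuf-decode")
    val sc = new SparkContext(conf)

    // Each input file is assumed to hold one serialized protobuf message.
    sc.binaryFiles("/path/to/messages")
      .map { case (_, stream) => MyMessage.parseFrom(stream.toArray()) }
      // Printing from within the RDD is where the ClassNotFoundException shows up.
      .foreach(msg => println(msg.toString))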

We have already tried a couple of remedies.

Setting "spark.files.userClassPathFirst" didn't fix the problem for us. I
am not very familiar with the Spark and Scala environment, so please
correct any incorrect assumptions or statements I make.
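
For reference, the flag can be set on the SparkConf before the context is
created (a minimal sketch; the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("protobuf-decode")
      .set("spark.files.userClassPathFirst", "true")
    val sc = new SparkContext(conf)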

However, I don't believe this is a classpath visibility issue. I wrote a
small helper method to print the classpath from both the driver and a
worker, and the output is identical. (I'm printing
System.getProperty("java.class.path") -- is there a better way to do this
or to check the classpath?) You can print the classpaths the same way we
do from the example project above.
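
The helper is essentially this (a minimal sketch; the throwaway RDD is just
a way to run the same print on the executors):

    // On the driver:
    println("driver classpath: " + System.getProperty("java.class.path"))

    // On the executors, via a throwaway RDD:
    sc.parallelize(1 to 1).foreach { _ =>
      println("executor classpath: " + System.getProperty("java.class.path"))
    }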

Furthermore, userClassPathFirst seems to have a detrimental effect on
otherwise working code, which I cannot explain.

For example, I created a trivial RDD like this:

    val l = List(1, 2, 3)
    sc.makeRDD(l).foreach((x: Int) => {
        println(x.toString)
    })

With userClassPathFirst set, I get a java.lang.ClassCastException when
executing that code. Is that to be expected? You can re-create this issue
by commenting out the block of code that prints the above in the example
project linked above.

We also tried dynamically adding the jar to the SparkContext with .addJar,
but this seemed to have no effect.
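
Concretely, that attempt looked something like this (the jar path is a
placeholder for the assembly containing the generated protobuf classes):

    // Ship the jar with the generated protobuf classes to the executors.
    sc.addJar("/path/to/spark-protobuf-assembly.jar")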

Thanks in advance for any help y'all can provide.
