My guess would be that you are packaging too many things in your job, which
is causing problems with the classpath.  When your jar goes in first, you
get the correct version of protobuf, but some other version of something
else.  When your jar goes in later, other things work, but protobuf
breaks.  This is just a guess though; take a look at what you're packaging
in your jar and look for things that Spark or Kafka could also be using.

On Thu, Feb 26, 2015 at 10:06 AM, necro351 . <necro...@gmail.com> wrote:

> Hello everyone,
>
> We are trying to decode a message inside a Spark job that we receive from
> Kafka. The message is encoded using Proto Buff. The problem is when
> decoding we get class-not-found exceptions. We have tried remedies we found
> online in Stack Exchange and mail list archives but nothing seems to work.
>
> (This question is a re-ask, but we really cannot figure this one out.)
>
> We created a standalone repository with a very simple Spark job that
> exhibits the above issues. The spark job reads the messages from the FS,
> decodes them, and prints them. Its easy to checkout and try to see the
> exception yourself: just uncomment the code that prints the messages from
> within the RDD. The only sources are the generated Proto Buff java sources
> and a small Spark Job that decodes a message. I'd appreciate if anyone
> could take a look.
>
> https://github.com/vibhav/spark-protobuf
>
> We tried a couple remedies already.
>
> Setting "spark.files.userClassPathFirst" didn't fix the problem for us. I
> am not very familiar with the Spark and Scala environment, so please
> correct any incorrect assumptions or statements I make.
>
> However, I don't believe this to be a classpath visibility issue. I wrote
> a small helper method to print out the classpath from both the driver and
> worker, and the output is identical. (I'm printing out
> System.getProperty("java.class.path") -- is there a better way to do this
> or check the class path?). You can print out the class paths the same way
> we are from the example project above.
>
> Furthermore, userClassPathFirst seems to have a detrimental effect on
> otherwise working code, which I cannot explain or do not understand.
>
> For example, I created a trivial RDD as such:
>
>     val l = List(1, 2, 3)
>     sc.makeRDD(l).foreach((x: Int) => {
>         println(x.toString)
>     })
>
> With userClassPathFirst set, I encounter a java.lang.ClassCastException
> trying to execute that code. Is that to be expected? You can re-create this
> issue by commenting out the block of code that tries to print the above in
> the example project we linked to above.
>
> We also tried dynamically adding the jar with .addJar to the Spark Context
> but this seemed to have no effect.
>
> Thanks in advance for any help y'all can provide.
>

Reply via email to