I’ve been running into a strange class-not-found problem, but only when my job 
has more than one stage.  I have an RDD[ProtobufClass] that behaves as 
expected in a single-stage job (e.g. serialize to JSON and export).  But when I 
try to groupByKey, the first stage (essentially a keyBy) runs, and then the 
first task of the second stage fails with the relatively common ‘unable to 
find protocol buffer class’ error.  I’ve tried the userClassPathFirst options, 
but then the whole job fails.  Is there some configuration I can use to help 
Spark resolve the right protocol buffer class across stage boundaries?
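
For reference, the userClassPathFirst options are presumably the two Spark 
properties below.  This is a sketch written as spark-defaults.conf entries, 
assuming Spark 1.3 or later; the exact settings used in the failing job are 
not shown in this message.

```
# spark-defaults.conf (sketch; both properties are marked experimental in the Spark docs)
# Give user-added jars precedence over Spark's own jars when loading classes.
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```

The same values can be passed per job with --conf on spark-submit instead of 
being set globally.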

-John