I’ve been running into a strange class-not-found problem, but only when my job has more than one stage. I have an RDD[ProtobufClass] that behaves as expected in a single-stage job (e.g. serialize to JSON and export). But when I add a groupByKey, the first stage (essentially a keyBy) runs, and then the first task of the second stage fails with the relatively common ‘unable to find protocol buffer class’ error. I’ve tried the userClassPathFirst options, but with those set the whole job fails. So I’m wondering if there is some kind of configuration I can use to help Spark resolve the right protocol buffer class across stage boundaries?
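For reference, the userClassPathFirst options mentioned above are presumably the two standard Spark properties below, shown here as a spark-defaults.conf fragment. Exactly which of them were set, and whether they were passed through spark-defaults.conf or via --conf on spark-submit, is an assumption made for illustration.

```
# Assumed settings: prefer classes from the user's jars over Spark's own
# copies when loading classes on the driver and on the executors.
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```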
-John