Not sure if this has been clearly explained here, but since I took a day to track it down…
Several people have run into a "class not found" error on Spark even though the missing class should be in the Spark jars. One cause is how Spark was built for your cluster environment: the instructions say to run "mvn package ...", but some of these errors can be fixed by using the following procedure instead:

1) Delete ~/.m2/repository/org/apache/spark and your-project from the local Maven repository.

2) Build Spark for your version of Hadoop, but do not use "mvn package ..."; use "mvn install ..." instead. This puts a copy of the exact bits you need into the local Maven cache, so your-project is built against them. In my case, building against Hadoop 1.2.1, the command was "mvn -Dhadoop.version=1.2.1 -DskipTests clean install". If you do run the Spark tests, some failures can safely be ignored, so check before giving up.

3) Build your-project with "mvn clean install".
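For reference, here is a minimal sketch of the full sequence. The Hadoop version is from my setup, and the your-project path under ~/.m2/repository (and its source directory) are placeholders, since they depend on your project's groupId and layout; adjust both for your environment.

    # 1) remove stale Spark and project artifacts from the local Maven cache
    rm -rf ~/.m2/repository/org/apache/spark
    rm -rf ~/.m2/repository/com/example/your-project   # placeholder; match your project's groupId/artifactId

    # 2) build Spark against your Hadoop version and install it into the local cache
    cd /path/to/spark
    mvn -Dhadoop.version=1.2.1 -DskipTests clean install

    # 3) rebuild your project against the freshly installed Spark artifacts
    cd /path/to/your-project
    mvn clean install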