I'm not sure this has been clearly explained here, but since it took me a day 
to track down, here it is.

Several people have experienced a class not found error on Spark when the class 
referenced is supposed to be in the Spark jars.

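One cause is building Spark yourself for your cluster environment. The 
instructions say to run "mvn package ...", which builds the jars but does not 
copy them into your local Maven repository, so your-project may end up being 
built against different Spark artifacts than the ones you just compiled. Some 
of these errors can be fixed with the following procedure: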

1) Delete ~/.m2/repository/org/spark and the cached your-project artifacts.
2) Build Spark for your version of Hadoop, *but do not use "mvn package ..."*; 
use "mvn install ..." instead. This puts a copy of the exact bits you need into 
the Maven cache for your-project to build against. In my case, using Hadoop 
1.2.1, it was "mvn -Dhadoop.version=1.2.1 -DskipTests clean install". If you do 
run the Spark tests, some failures can safely be ignored, so check before 
giving up.
3) Build your-project with "mvn clean install".
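
For anyone who wants to script the procedure, here is a rough sketch of the three steps. The variables SPARK_SRC, PROJECT_SRC and PROJECT_CACHE are hypothetical placeholders, not anything from my setup, so point them at your own checkouts and at your-project's directory in the Maven cache. Using "mvn -f" to point at each pom.xml instead of changing directories is just a convenience I am assuming here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the three steps above. SPARK_SRC, PROJECT_SRC and PROJECT_CACHE are
# hypothetical placeholders. DRY_RUN defaults to 1, which only prints commands.
DRY_RUN="${DRY_RUN:-1}"
HADOOP_VERSION="${HADOOP_VERSION:-1.2.1}"
SPARK_SRC="${SPARK_SRC:-$HOME/src/spark}"
PROJECT_SRC="${PROJECT_SRC:-$HOME/src/your-project}"
PROJECT_CACHE="${PROJECT_CACHE:-}"   # your-project's directory under ~/.m2/repository, if any

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_against_local_spark() {
    # 1) Remove cached Spark artifacts, and your-project's if a path was given.
    run rm -rf "$HOME/.m2/repository/org/spark" || return 1
    if [ -n "$PROJECT_CACHE" ]; then
        run rm -rf "$PROJECT_CACHE" || return 1
    fi
    # 2) Build Spark and install it into the local Maven repository.
    run mvn -f "$SPARK_SRC/pom.xml" "-Dhadoop.version=$HADOOP_VERSION" \
        -DskipTests clean install || return 1
    # 3) Build your-project against the freshly installed Spark artifacts.
    run mvn -f "$PROJECT_SRC/pom.xml" clean install
}

rebuild_against_local_spark
```

Once the printed commands look right for your machine, run the script with DRY_RUN=0 to execute them for real.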
