Hi Paul,

I got it sorted out. The problem is that the JARs are built into the assembly JAR when you run:

    sbt/sbt clean assembly

What I did instead is:

    sbt/sbt clean package

This only produces the small JARs. The next step is to update the CLASSPATH in the bin/compute-classpath.sh script manually, appending all the JARs.

With 'sbt/sbt assembly', we can't introduce our own Hadoop patch, since it will always pull from the Maven repo, unless we hijack the repository path or do a 'mvn install' locally. That is more of a hack, I think.
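The manual step described above (appending every JAR to the CLASSPATH, as one might do near the end of bin/compute-classpath.sh) can be sketched as a small helper. The directory names passed to it are assumptions for illustration, not paths from the actual script:

```shell
#!/bin/sh
# build_classpath DIR [DIR ...] — join every *.jar under the given
# directories with ':'. A minimal sketch of the manual CLASSPATH step;
# the directories you pass in are whatever 'sbt/sbt clean package'
# produced plus your local Hadoop install (e.g. /opt/hadoop23/lib).
build_classpath() {
  classpath=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue        # glob matched nothing; skip
      classpath="$classpath:$jar"
    done
  done
  printf '%s\n' "${classpath#:}"       # drop the leading ':'
}
```

Usage would then be something like `CLASSPATH="$CLASSPATH:$(build_classpath "$FWDIR/lib" /opt/hadoop23/lib)"`, exported before spark-class runs.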
Date: Tue, 25 Mar 2014 15:23:08 -0700
Subject: Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?
From: paulmscho...@gmail.com
To: user@spark.apache.org

Andrew,

I ran into the same problem and eventually settled on just running the jars directly with java. Since we use sbt to build our jars, we had all the dependencies built into the jar itself, so no need for random classpaths.

On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee <alee...@hotmail.com> wrote:

Hi All,

I'm getting the following error when I execute start-master.sh, which also invokes spark-class at the end:

    Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
    You need to build Spark with 'sbt/sbt assembly' before running this program.

After digging into the code, I see that the CLASSPATH is hardcoded to match "spark-assembly.*hadoop.*.jar". In bin/spark-class:

    if [ ! -f "$FWDIR/RELEASE" ]; then
      # Exit if the user hasn't compiled Spark
      num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
      jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
      if [ "$num_jars" -eq "0" ]; then
        echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
        echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
        exit 1
      fi
      if [ "$num_jars" -gt "1" ]; then
        echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
        echo "$jars_list"
        echo "Please remove all but one jar."
        exit 1
      fi
    fi

Is there any reason why this only grabs spark-assembly.*hadoop.*.jar? I am trying to run Spark linked against my own version of Hadoop under /opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package without the Hadoop jar. What is the correct way to link to my own Hadoop jar?
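Paul's approach of bypassing bin/spark-class and launching the self-contained assembly jar directly with java can be sketched as follows. The jar path and main class here are hypothetical placeholders, not names from the thread:

```shell
#!/bin/sh
# Sketch: run an sbt-assembled "fat" jar directly with java instead of
# going through bin/spark-class. Because the assembly bundles all
# dependencies, no extra classpath entries are needed.
APP_JAR="target/scala-2.10/myapp-assembly-0.1.jar"   # hypothetical path
MAIN_CLASS="com.example.MySparkApp"                  # hypothetical class
CMD="java -cp $APP_JAR $MAIN_CLASS"
# Printed rather than executed here, since the jar is illustrative:
echo "$CMD"
```

In a real deployment you would run the java command directly (possibly adding JVM flags such as -Xmx), which is exactly what makes the hardcoded spark-assembly.*hadoop.*.jar check irrelevant for this style of launch.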