Hi Paul,

I got it sorted out. The problem is that the JARs are built into the assembly JAR when you run:

    sbt/sbt clean assembly

What I did instead is:

    sbt/sbt clean package

This only produces the small JARs. The next step is to update the CLASSPATH in the bin/compute-classpath.sh script manually, appending all the JARs.

With 'sbt/sbt assembly', we can't introduce our own Hadoop patch, since it will always pull from the Maven repo, unless we hijack the repository path or do a 'mvn install' locally. That is more of a hack, I think.
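The manual step described above (appending every JAR to the CLASSPATH, as one might do near the end of bin/compute-classpath.sh) can be sketched as a small helper. The directory names passed to it are assumptions for illustration, not paths from the actual script:

```shell
#!/bin/sh
# build_classpath DIR [DIR ...] — join every *.jar under the given
# directories with ':'. A minimal sketch of the manual CLASSPATH step;
# the directories you pass in are whatever 'sbt/sbt clean package'
# produced plus your local Hadoop install (e.g. /opt/hadoop23/lib).
build_classpath() {
  classpath=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue        # glob matched nothing; skip
      classpath="$classpath:$jar"
    done
  done
  printf '%s\n' "${classpath#:}"       # drop the leading ':'
}
```

Usage would then be something like `CLASSPATH="$CLASSPATH:$(build_classpath "$FWDIR/lib" /opt/hadoop23/lib)"`, exported before spark-class runs.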
Date: Tue, 25 Mar 2014 15:23:08 -0700
Subject: Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?
From: paulmscho...@gmail.com
To: user@spark.apache.org

Andrew,

I ran into the same problem and eventually settled on just running the jars directly with java. Since we use sbt to build our jars, we had all the dependencies built into the jar itself, so no need for random classpaths.

On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee <alee...@hotmail.com> wrote:

Hi All,

I'm getting the following error when I execute start-master.sh, which also invokes spark-class at the end:

    Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
    You need to build Spark with 'sbt/sbt assembly' before running this program.

After digging into the code, I see that the CLASSPATH is hardcoded to match "spark-assembly.*hadoop.*.jar". In bin/spark-class:

    if [ ! -f "$FWDIR/RELEASE" ]; then
      # Exit if the user hasn't compiled Spark
      num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
      jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
      if [ "$num_jars" -eq "0" ]; then
        echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
        echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
        exit 1
      fi
      if [ "$num_jars" -gt "1" ]; then
        echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
        echo "$jars_list"
        echo "Please remove all but one jar."
        exit 1
      fi
    fi

Is there any reason why this only grabs spark-assembly.*hadoop.*.jar? I am trying to run Spark linked against my own version of Hadoop under /opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package without the Hadoop jar. What is the correct way to link to my own Hadoop jar?
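Paul's approach of bypassing bin/spark-class and launching the self-contained assembly jar directly with java can be sketched as follows. The jar path and main class here are hypothetical placeholders, not names from the thread:

```shell
#!/bin/sh
# Sketch: run an sbt-assembled "fat" jar directly with java instead of
# going through bin/spark-class. Because the assembly bundles all
# dependencies, no extra classpath entries are needed.
APP_JAR="target/scala-2.10/myapp-assembly-0.1.jar"   # hypothetical path
MAIN_CLASS="com.example.MySparkApp"                  # hypothetical class
CMD="java -cp $APP_JAR $MAIN_CLASS"
# Printed rather than executed here, since the jar is illustrative:
echo "$CMD"
```

In a real deployment you would run the java command directly (possibly adding JVM flags such as -Xmx), which is exactly what makes the hardcoded spark-assembly.*hadoop.*.jar check irrelevant for this style of launch.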