When Spark runs in local mode (a single JVM) on an edge host (the host
from which users access the cluster), an additional jar file, say the one
needed to access Oracle RDBMS tables, can be put in $SPARK_CLASSPATH.
This works:

export SPARK_CLASSPATH=~/user_jars/ojdbc6.jar

Normally a group of users has read access to a shared directory like the
one above. When they log in, their shell invokes an environment file that
sets the above classpath plus additional parameters such as $JAVA_HOME.
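
As a rough illustration, such an environment file might look like the
sketch below. The JAVA_HOME location is an assumption made up for this
example, not a path from my setup; only the jar path matches the export
shown above.

```shell
# Hypothetical shared environment file, sourced from each user's login profile.
# The JDK location is an assumed path for illustration only.
export JAVA_HOME=/usr/lib/jvm/default-java

# Same jar as in the export above, written with $HOME instead of ~.
export SPARK_CLASSPATH="$HOME/user_jars/ojdbc6.jar"
```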

However, if the user chooses to run Spark through spark-submit with YARN,
then the only way I have found to make this work is to add the jar file as
follows on every node of the Spark cluster.

In $SPARK_HOME/conf/spark-defaults.conf, add the jar path to the
following property:

spark.executor.extraClassPath   /user_jars/ojdbc6.jar
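
Since this has to be repeated on every node, the edit could be scripted.
The following is only a sketch: add_executor_classpath is a helper name I
made up, and it assumes the property is not already set to a different
value in the file (it does nothing if the property is present at all).

```shell
# Hypothetical helper: append the property to a spark-defaults.conf file
# unless a spark.executor.extraClassPath line is already there.
add_executor_classpath() {
  conf_file=$1
  jar_path=$2
  if ! grep -q '^[[:space:]]*spark\.executor\.extraClassPath' "$conf_file" 2>/dev/null; then
    printf 'spark.executor.extraClassPath   %s\n' "$jar_path" >> "$conf_file"
  fi
}
```

On each node it would be called as
add_executor_classpath "$SPARK_HOME/conf/spark-defaults.conf" /user_jars/ojdbc6.jar
and running it twice leaves a single line in the file.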

Note that setting both spark.executor.extraClassPath and SPARK_CLASSPATH
causes an initialisation error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Found both spark.executor.extraClassPath
and SPARK_CLASSPATH. Use only the former.
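
A small pre-flight check can catch this before spark-submit is invoked.
This is a sketch under my own assumptions: check_classpath_conflict is a
made-up helper, and it only looks at the one conf file it is given, not at
properties passed on the command line.

```shell
# Hypothetical pre-flight check: fail if SPARK_CLASSPATH is set in the
# environment while the given conf file also sets spark.executor.extraClassPath.
check_classpath_conflict() {
  conf_file=$1
  if [ -n "${SPARK_CLASSPATH:-}" ] &&
     grep -q '^[[:space:]]*spark\.executor\.extraClassPath' "$conf_file" 2>/dev/null; then
    echo "Both SPARK_CLASSPATH and spark.executor.extraClassPath are set" >&2
    return 1
  fi
  return 0
}
```

If the check fails, unsetting SPARK_CLASSPATH in the shell before
submitting should avoid the error, in line with the message's advice to
use only spark.executor.extraClassPath.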

I was wondering if there are other ways of making this work in YARN mode,
where every node of the cluster requires this JAR file?

Thanks
