I have a usability concern with the current way of specifying --jars.
Imagine a use case like HBase, where many jobs need its jars on their
classpath. With --jars, this has to be specified on every submission. If we
use spark.executor.extraClassPath instead, we only need to set it once. But
there is no programmatic way to set that value, such as picking it up from
an environment variable or running a script that generates the classpath;
you have to hard-code the jars in spark-defaults.conf.
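
To illustrate the kind of scriptable setup I have in mind (a rough sketch;
it assumes the hbase launcher script is on PATH, and I haven't verified that
extraClassPath set this way behaves identically to SPARK_CLASSPATH):

    # spark-env.sh: scriptable, but relies on the deprecated variable
    export SPARK_CLASSPATH="$(hbase classpath)"

    # Per-invocation workaround: compute the value on the command line
    spark-shell --conf "spark.executor.extraClassPath=$(hbase classpath)"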

Also, I would like to know whether there is a localization overhead when we
use spark.executor.extraClassPath. Again, in the case of HBase, these jars
would typically be available on all nodes, so there is no need to localize
them from the node where the job was submitted. I am wondering whether the
SPARK_CLASSPATH approach skips localization; that would be an added benefit.
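
To make the distinction concrete, this is my understanding of the two
mechanisms (the paths are hypothetical, and whether extraClassPath triggers
any localization is exactly what I'm asking):

    # --jars ships the listed jars from the submitting node to each
    # executor's working directory:
    spark-submit --jars /opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar ...

    # extraClassPath only prepends entries to the executor JVM's classpath;
    # the files must already exist at that path on every node:
    spark-submit --conf "spark.executor.extraClassPath=/opt/hbase/lib/*" ...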
Please clarify.

--
Kannan

On Thu, Feb 26, 2015 at 4:15 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> SPARK_CLASSPATH is definitely deprecated, but my understanding is that
> spark.executor.extraClassPath is not, so maybe the documentation needs
> fixing.
>
> I'll let someone who might know otherwise comment, though.
>
> On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah <kra...@maprtech.com> wrote:
> > SparkConf.scala logs a warning saying SPARK_CLASSPATH is deprecated and
> > we should use spark.executor.extraClassPath instead. But the online
> > documentation states that spark.executor.extraClassPath is only meant
> > for backward compatibility.
> >
> > https://spark.apache.org/docs/1.2.0/configuration.html#execution-behavior
> >
> > Which one is right? I have a use case to submit an HBase job from
> > spark-shell and make it run using YARN. In this case, I need to somehow
> > add the HBase jars to the classpath of the executor. If I add them to
> > SPARK_CLASSPATH and export it, it works fine. Alternatively, if I set
> > spark.executor.extraClassPath in spark-defaults.conf, it works fine. But
> > the reason I don't like spark-defaults.conf is that I have to hard-code
> > the classpath instead of relying on a script to generate it. I can use a
> > script in spark-env.sh and set SPARK_CLASSPATH.
> >
> > Given that compute-classpath uses the SPARK_CLASSPATH variable, why is
> > it marked as deprecated?
> >
> > --
> > Kannan
>
>
>
> --
> Marcelo
>
