Re: How to set YARN_CONF_DIR environment variable on remote client?

2019-12-05 Thread Zhankun Tang
Hi Piper,
Just setting HADOOP_CONF_DIR should work. Did you try that?
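
A minimal sketch of what that looks like on the client, assuming the cluster's Hadoop configuration files have already been copied to a local directory (the path and application jar below are placeholders):

```shell
# Hypothetical location of the copied Hadoop client configuration
export HADOOP_CONF_DIR="$HOME/hadoop-conf"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"

# With the variable set, spark-submit reads yarn-site.xml etc. from that
# directory to locate the ResourceManager:
#   spark-submit --master yarn --deploy-mode cluster app.jar
```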

BR,
Zhankun

On Fri, 6 Dec 2019 at 00:43, Piper Piper wrote:

> Hello,
>
> I want to run Spark or Flink jobs from a client (remote desktop) onto a
> YARN cluster. Another example would be: if I am running a YARN cluster on
> VMs, I would like to use the host OS as the client to submit Spark jobs
> to the VM YARN cluster.
>
> What are the easiest ways to set the YARN_CONF_DIR environment variable on
> the client machine so that it can submit Spark jobs to the YARN cluster?
>
> From reading online documents, I believe I am supposed to set the client's
> YARN_CONF_DIR environment variable to $HADOOP_HOME/etc/hadoop or
> $HADOOP_HOME/etc/hadoop/conf. However, I do not understand how to get the
> value of HADOOP_HOME, whether I need to set it on every machine in the
> cluster, or how my client machine will know how to locate the NameNode in
> the cluster.
>
> Also, does $HADOOP_HOME/etc/hadoop have to be the same on every node in
> the cluster, or is it on a special node, like NameNode or ResourceManager?
>
> I have read there is an easier way: copying the /etc/hadoop contents to
> the client machine and then setting the client's YARN_CONF_DIR to that
> location. Can someone please explain how to do this? Which node in my
> cluster should I copy the /etc/hadoop contents from? Would this also work
> if my client can only contact the cluster via ssh?
>
> Thanks!
>
> Piper
>
>
>
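
The copy-the-configs approach asked about above can be sketched as follows. The host name, user, and local path are hypothetical; any cluster node works as the source if the configs are uniform, and the ResourceManager host is a safe choice:

```shell
# Hypothetical: fetch the Hadoop client configuration over ssh, e.g.
#   scp -r hadoop@rm-host:/etc/hadoop "$HOME/hadoop-conf"
# For illustration, stand in for the copied directory locally:
mkdir -p "$HOME/hadoop-conf"

# Point the client at it; Spark accepts either variable:
export HADOOP_CONF_DIR="$HOME/hadoop-conf"
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
```

Since scp only needs ssh access, this works even when the client can reach the cluster solely over ssh.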