To put it simply: what configuration is needed on the client machine so that the driver runs on the client itself and the executors run on the Spark-on-YARN cluster nodes?
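For reference, this is roughly what I have on the client so far, plus the driver-address settings I suspect may be needed for my containerized setup. The conf path and hostname below are placeholders for my environment, and the spark.driver.* lines are only my guess at what might be missing:

    # Point the client at the cluster's Hadoop/YARN configuration
    export HADOOP_CONF_DIR=/path/to/cluster/hadoop-conf   # placeholder path
    export YARN_CONF_DIR=$HADOOP_CONF_DIR

    # Client mode: driver runs here, executors on the YARN nodes.
    # In client mode the executors must connect back to the driver JVM,
    # so the driver has to advertise an address the NodeManagers can
    # reach -- which is the part I suspect my Docker networking breaks.
    spark-shell --master yarn --deploy-mode client \
      --conf spark.driver.host=<hostname-reachable-from-cluster> \
      --conf spark.driver.bindAddress=0.0.0.0 \
      --conf spark.driver.port=7078 \
      --conf spark.blockManager.port=7079

I am not certain these driver settings are the missing piece, so any pointers would help.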
On Mon, Apr 22, 2019, 8:22 PM Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
> Hi.
> I have been experiencing trouble while trying to connect to a Spark
> cluster remotely. This Spark cluster is configured to run using YARN.
> Can anyone guide me or provide step-by-step instructions for
> connecting remotely via spark-shell?
> Here's the setup that I am using:
> The Spark cluster is running with each node as a Docker container hosted
> on a VM. It uses YARN for scheduling resources for computations.
> I have a dedicated Docker container acting as a Spark client, on which I
> have spark-shell installed (Spark binary in standalone setup), and the
> Hadoop and YARN config directories are set so that spark-shell can
> coordinate with the ResourceManager for resources.
> With all of this set, I tried the following command:
>
> spark-shell --master yarn --deploy-mode client
>
> This results in spark-shell giving me a Scala-based console; however,
> when I check the ResourceManager UI on the cluster, there seems to be no
> application/Spark session running.
> I was expecting the driver to run on the client machine and the
> executors to run in the cluster, but that doesn't seem to happen.
>
> How can I achieve this?
> Is what I am trying feasible, and if so, is it good practice?
>
> Thanks & Regards,
> Rishikesh