The problem got resolved after removing all the configuration files from
the slave nodes. Earlier we were running in standalone mode, and that led
to the configuration being duplicated on all the slaves. Once those files
were removed, the job ran as expected in cluster mode, although performance
is not on par with standalone mode.

However, compared to standalone mode, Spark on YARN runs very slowly.

I am running it as

$SPARK_HOME/bin/spark-submit --class "EDDApp" --master yarn-cluster \
  --num-executors 10 --executor-memory 14g \
  target/scala-2.10/edd-application_2.10-1.0.jar \
  hdfs://hm41:9000/user/hduser/newtrans.csv \
  hdfs://hm41:9000/user/hduser/trans-out

We have a cluster of 5 nodes, each with 16 GB RAM and 8 cores. We have
configured the minimum container size as 3 GB and the maximum as 14 GB in
yarn-site.xml. When submitting the job to yarn-cluster we request 10
executors with 14 GB of memory each. According to my understanding, our
job should be allocated 4 containers of 14 GB each, but the Spark UI shows
only 3 containers of 7.2 GB each.
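As a sanity check, here is the arithmetic I am working from (a minimal
sketch; the 7% per-executor memory overhead factor and the 384 MB floor
are assumptions about what YARN adds on top of the executor heap, not
values we have verified on our cluster):

```python
# Rough estimate of how many 14 GB executors fit on a 5-node, 16 GB/node
# cluster, assuming YARN requests executor memory plus an overhead of
# max(384 MB, 7% of executor memory) per container -- an assumption.
NODES = 5
NODE_MEMORY_GB = 16.0
EXECUTOR_MEMORY_GB = 14.0

overhead_gb = max(0.384, 0.07 * EXECUTOR_MEMORY_GB)  # 0.98 GB under this assumption
container_gb = EXECUTOR_MEMORY_GB + overhead_gb      # ~14.98 GB requested per container

executors_per_node = int(NODE_MEMORY_GB // container_gb)
total_executors = NODES * executors_per_node

print("container size: %.2f GB" % container_gb)
print("executors per node: %d" % executors_per_node)
print("max executors: %d" % total_executors)
```

Under these assumptions only one 14 GB container fits per 16 GB node, so
the cluster could never hold 10 such executors regardless of
--num-executors.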

We are unable to control the number of containers and the resources
allocated to them, and this hurts performance compared to standalone mode.

Regards,
Kundan

On Thu, Feb 5, 2015 at 12:49 PM, Felix C <felixcheun...@hotmail.com> wrote:

>  Is YARN_CONF_DIR set?
>
> --- Original Message ---
>
> From: "Aniket Bhatnagar" <aniket.bhatna...@gmail.com>
> Sent: February 4, 2015 6:16 AM
> To: "kundan kumar" <iitr.kun...@gmail.com>, "spark users" <
> user@spark.apache.org>
> Subject: Re: Spark Job running on localhost on yarn cluster
>
>  Have you set master in SparkConf/SparkContext in your code? Driver logs
> show in which mode the spark job is running. Double check if the logs
> mention local or yarn-cluster.
> Also, what's the error that you are getting?
>
> On Wed, Feb 4, 2015, 6:13 PM kundan kumar <iitr.kun...@gmail.com> wrote:
>
> Hi,
>
>  I am trying to execute my code on a yarn cluster
>
>  The command which I am using is
>
>  $SPARK_HOME/bin/spark-submit --class "EDDApp"
> target/scala-2.10/edd-application_2.10-1.0.jar --master yarn-cluster
> --num-executors 3 --driver-memory 6g --executor-memory 7g <outputPath>
>
>  But I can see that this program is running only on localhost.
>
>  It's able to read the file from HDFS.
>
>  I have tried this in standalone mode and it works fine.
>
>  Please suggest where it is going wrong.
>
>
>  Regards,
> Kundan
>
>
