Hello Siva,

Thanks for your reply. I'm trying to generate online reports for my clients, so I want the jobs to execute faster, without any job being put in the QUEUE, irrespective of how many jobs different clients are running from different locations. Currently, a job processing 17GB of data takes more than 20 minutes to execute. Also, only 6 jobs run simultaneously and the remaining ones are in the WAITING stage.
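(For reference, the separate-pool suggestion in the quoted reply below could look roughly like this in fairscheduler.xml. This is only an illustrative sketch: the pool names and weights here are made up, not from the thread.)

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: pool names and weights are hypothetical. -->
<allocations>
  <pool name="reports">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="slow">
    <!-- Lower weight so one long-running job cannot starve the report jobs. -->
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

A job then opts into a pool with `sc.setLocalProperty("spark.scheduler.pool", "slow")` before submitting its actions. Note that FAIR pools arbitrate jobs within a single Spark application; separate spark-submit applications are scheduled against each other by YARN's own scheduler, not by this file.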
Thanks

On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.s...@yahoo.com> wrote:

> Hello Akshay,
>
> I see there are 6 slaves, with 1 Spark instance each and 5 cores per
> instance => 30 cores in total. Do you have any other pools configured?
> Running 8 jobs should be triggered in parallel with the number of cores
> you have.
>
> For your long-running job, did you have a chance to look at the tasks
> that are being triggered?
>
> I would recommend configuring the slow-running job in a separate pool.
>
> Regards
> Shiv
>
> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaid...@gmail.com> wrote:
>
> yarn-site.xml
>
> <property>
>   <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
>   <value>0.8</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>3584</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>10752</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>10752</value>
> </property>
>
> spark-defaults.conf
>
> spark.master                         yarn
> spark.driver.memory                  9g
> spark.executor.memory                1024m
> spark.yarn.executor.memoryOverhead   1024m
> spark.eventLog.enabled               true
> spark.eventLog.dir                   hdfs://tech-master:54310/spark-logs
>
> spark.history.provider               org.apache.spark.deploy.history.FsHistoryProvider
> spark.history.fs.logDirectory        hdfs://tech-master:54310/spark-logs
> spark.history.fs.update.interval     10s
> spark.history.ui.port                18080
>
> spark.ui.enabled                     true
> spark.ui.port                        4040
> spark.ui.killEnabled                 true
> spark.ui.retainedDeadExecutors       100
>
> spark.scheduler.mode                 FAIR
> spark.scheduler.allocation.file      /usr/local/spark/current/conf/fairscheduler.xml
>
> #spark.submit.deployMode             cluster
> spark.default.parallelism            30
>
> SPARK_WORKER_MEMORY    10g
> SPARK_WORKER_INSTANCES 1
> SPARK_WORKER_CORES     5
>
> SPARK_DRIVER_MEMORY    9g
> SPARK_DRIVER_CORES     5
>
> SPARK_MASTER_IP        Tech-master
> SPARK_MASTER_PORT      7077
>
> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com> wrote:
>
>> Hello,
>> I'm trying to run multiple Spark jobs on a cluster running on YARN.
>> The master is a 24GB server with 6 slaves of 12GB each.
>>
>> fairscheduler.xml settings are:
>>
>> <pool name="default">
>>   <schedulingMode>FAIR</schedulingMode>
>>   <weight>10</weight>
>>   <minShare>2</minShare>
>> </pool>
>>
>> I am running 8 jobs simultaneously; the jobs run in parallel, but not
>> all of them. At a time only 7 of them run simultaneously, while the
>> 8th is in the queue, WAITING for a job to stop.
>>
>> Also, out of the 7 running jobs, 4 run comparatively much faster than
>> the remaining three (maybe resources are not distributed properly).
>>
>> I want to run n jobs at a time and make them run faster. Right now,
>> one job takes more than three minutes while processing at most 1GB of
>> data.
>>
>> Kindly assist me. What am I missing?
>>
>> Thanks.
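(A rough capacity estimate from the settings quoted above may explain why only 6-7 applications run at once. It assumes the standard YARN behaviour of rounding each container request up to a multiple of yarn.scheduler.minimum-allocation-mb; treat the numbers as a back-of-the-envelope sketch, not a guarantee.)

```python
# Sketch: how many YARN containers fit on this cluster with the quoted
# settings. Assumes YARN rounds each memory request up to a multiple of
# yarn.scheduler.minimum-allocation-mb, which is its default behaviour.
import math

MIN_ALLOC_MB = 3584      # yarn.scheduler.minimum-allocation-mb
NODE_MEMORY_MB = 10752   # yarn.nodemanager.resource.memory-mb
NUM_SLAVES = 6

def container_size(request_mb):
    """Round a memory request up to the next multiple of the minimum allocation."""
    return math.ceil(request_mb / MIN_ALLOC_MB) * MIN_ALLOC_MB

# Each executor asks for spark.executor.memory + spark.yarn.executor.memoryOverhead.
executor_request = 1024 + 1024                              # 2048 MB requested
per_node = NODE_MEMORY_MB // container_size(executor_request)
total_containers = per_node * NUM_SLAVES

print(container_size(executor_request))  # 3584 -> each 2 GB request costs 3.5 GB
print(per_node)                          # 3 containers per node
print(total_containers)                  # 18 containers across 6 slaves

# Each application also needs one ApplicationMaster container (its small
# default request is likewise rounded up to 3584 MB), so every app holds
# at least 2 containers: roughly 18 // 2 = 9 apps as an upper bound, and
# fewer once apps grab more than one executor each.
```

If this estimate is right, lowering the minimum allocation (or raising executor memory to fill the 3584 MB slot it already pays for) would stop most of the node memory from being wasted on rounding.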