A small hint would be very helpful.

On Wed, Feb 14, 2018 at 5:17 PM, akshay naidu <akshaynaid...@gmail.com> wrote:
> Hello Siva,
> Thanks for your reply.
>
> Actually I'm trying to generate online reports for my clients. For this I
> want the jobs to be executed faster, without putting any job in the QUEUE,
> irrespective of the number of jobs different clients are executing from
> different locations.
> Currently, a job processing 17GB of data takes more than 20 minutes to
> execute. Also, only 6 jobs run simultaneously and the remaining ones are
> in the WAITING stage.
>
> Thanks
>
> On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli <gudavalli.s...@yahoo.com>
> wrote:
>
>> Hello Akshay,
>>
>> I see there are 6 slaves * 1 Spark instance each * 5 cores on each
>> instance => 30 cores in total.
>> Do you have any other pools configured? With the number of cores you
>> have, running 8 jobs in parallel should be possible.
>>
>> For your long-running job, did you have a chance to look at the tasks
>> being triggered?
>>
>> I would recommend configuring the slow-running job in a separate pool.
>>
>> Regards
>> Shiv
>>
>> On Feb 14, 2018, at 5:44 AM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>> yarn-site.xml
>>
>> <property>
>>   <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
>>   <value>0.8</value>
>> </property>
>>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>   <value>3584</value>
>> </property>
>>
>> <property>
>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>   <value>10752</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>10752</value>
>> </property>
>>
>> spark-defaults.conf
>>
>> spark.master                        yarn
>> spark.driver.memory                 9g
>> spark.executor.memory               1024m
>> spark.yarn.executor.memoryOverhead  1024m
>> spark.eventLog.enabled              true
>> spark.eventLog.dir                  hdfs://tech-master:54310/spark-logs
>>
>> spark.history.provider              org.apache.spark.deploy.history.FsHistoryProvider
>> spark.history.fs.logDirectory       hdfs://tech-master:54310/spark-logs
>> spark.history.fs.update.interval    10s
>> spark.history.ui.port               18080
>>
>> spark.ui.enabled                    true
>> spark.ui.port                       4040
>> spark.ui.killEnabled                true
>> spark.ui.retainedDeadExecutors      100
>>
>> spark.scheduler.mode                FAIR
>> spark.scheduler.allocation.file     /usr/local/spark/current/conf/fairscheduler.xml
>>
>> #spark.submit.deployMode            cluster
>> spark.default.parallelism           30
>>
>> SPARK_WORKER_MEMORY 10g
>> SPARK_WORKER_INSTANCES 1
>> SPARK_WORKER_CORES 5
>>
>> SPARK_DRIVER_MEMORY 9g
>> SPARK_DRIVER_CORES 5
>>
>> SPARK_MASTER_IP Tech-master
>> SPARK_MASTER_PORT 7077
>>
>> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I'm trying to run multiple Spark jobs on a cluster running on YARN.
>>> The master is a 24GB server, with 6 slaves of 12GB each.
>>>
>>> fairscheduler.xml settings are:
>>>
>>> <pool name="default">
>>>   <schedulingMode>FAIR</schedulingMode>
>>>   <weight>10</weight>
>>>   <minShare>2</minShare>
>>> </pool>
>>>
>>> I am running 8 jobs simultaneously; the jobs run in parallel, but not
>>> all of them. At a time only 7 of them run simultaneously, while the 8th
>>> one is in the queue, WAITING for a job to stop.
>>>
>>> Also, out of the 7 running jobs, 4 run comparatively much faster than
>>> the remaining three (maybe resources are not distributed properly).
>>>
>>> I want to run n number of jobs at a time and make them run faster.
>>> Right now, one job is taking more than three minutes while processing a
>>> maximum of 1GB of data.
>>>
>>> Kindly assist me. What am I missing?
>>>
>>> Thanks.
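Siva's suggestion of a separate pool for the slow job could be sketched like this in the file already referenced by spark.scheduler.allocation.file (the pool name "reports_slow" and its weights are illustrative, not from the thread):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- default pool, as already posted in the thread -->
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <!-- hypothetical separate pool for the slow 17GB report job -->
  <pool name="reports_slow">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

A job opts into a pool at runtime with sc.setLocalProperty("spark.scheduler.pool", "reports_slow"). One caveat worth noting: fairscheduler.xml pools only arbitrate jobs inside a single SparkContext; separately submitted applications, as in this thread, are arbitrated by YARN's own scheduler, so pool tuning alone will not unblock the 8th application.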
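The posted yarn-site.xml settings suggest one possible reason for the queued jobs: YARN normalizes every container request up to the next multiple of yarn.scheduler.minimum-allocation-mb (3584 MB here), so small executors cost far more memory than requested. The back-of-the-envelope sketch below is not from the thread; it assumes yarn-client mode with a default ~1 GB ApplicationMaster per job, and it ignores per-node packing.

```python
# Rough YARN memory math for the settings posted in this thread.
# ASSUMPTION (not in the thread): yarn-client mode with a ~1 GB
# ApplicationMaster request per job; exact AM size may differ.

def yarn_container_mb(requested_mb, min_alloc_mb=3584, max_alloc_mb=10752):
    """YARN rounds each request up to the next multiple of
    yarn.scheduler.minimum-allocation-mb, capped at the maximum."""
    multiples = -(-requested_mb // min_alloc_mb)  # ceiling division
    return min(multiples * min_alloc_mb, max_alloc_mb)

node_mb = 10752   # yarn.nodemanager.resource.memory-mb
nodes = 6         # 6 slaves, per the thread

# spark.executor.memory (1024m) + spark.yarn.executor.memoryOverhead (1024m)
executor_container = yarn_container_mb(1024 + 1024)    # normalized to 3584 MB

containers_per_node = node_mb // executor_container    # 3 per node
total_containers = nodes * containers_per_node         # 18 cluster-wide

# Each of the 8 concurrent applications also needs one AM container,
# normalized the same way (assumed ~1 GB request).
jobs = 8
am_total = jobs * yarn_container_mb(1024)
executors_left = (nodes * node_mb - am_total) // executor_container

print(executor_container)  # 3584 MB per executor, despite a 2048 MB request
print(total_containers)    # 18 container slots in total
print(executors_left)      # ~10 slots left for executors across all 8 jobs
```

Under these assumptions the 8 AMs alone consume 8 of the 18 slots, leaving roughly one executor per job, which would match the symptoms described: jobs queue, and the ones that do run are slow. Lowering minimum-allocation-mb (or raising executor memory to fill the 3584 MB container anyway) would reduce the waste.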