Re: Run Multiple Spark jobs. Reduce Execution time.
a small hint would be very helpful.

On Wed, Feb 14, 2018 at 5:17 PM, akshay naidu wrote:
> Hello Siva,
> Thanks for your reply.
>
> Actually, I'm trying to generate online reports for my clients. For this I
> want the jobs to execute faster without any job being put on a QUEUE,
> regardless of how many jobs different clients are executing from different
> locations.
> Currently, a job processing 17 GB of data takes more than 20 minutes to
> execute, and only 6 jobs run simultaneously while the remaining ones sit
> in the WAITING stage.
>
> Thanks
>
> On Wed, Feb 14, 2018 at 4:32 PM, Siva Gudavalli wrote:
>>
>> Hello Akshay,
>>
>> I see there are 6 slaves, with 1 Spark instance each and 5 cores per
>> instance => 30 cores in total.
>> Do you have any other pools configured? With the number of cores you
>> have, all 8 jobs should be triggered in parallel.
>>
>> For your long-running job, did you have a chance to look at the tasks
>> being triggered?
>>
>> I would recommend configuring the slow-running job in a separate pool.
>> Regards
>> Shiv
>>
>> On Feb 14, 2018, at 5:44 AM, akshay naidu wrote:
>>
>> ** yarn-site.xml **
>>
>> <property>
>>   <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
>>   <value>0.8</value>
>> </property>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>   <value>3584</value>
>> </property>
>> <property>
>>   <name>yarn.scheduler.maximum-allocation-mb</name>
>>   <value>10752</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>10752</value>
>> </property>
>>
>> ** spark-defaults.conf **
>>
>> spark.master                        yarn
>> spark.driver.memory                 9g
>> spark.executor.memory               1024m
>> spark.yarn.executor.memoryOverhead  1024m
>> spark.eventLog.enabled              true
>> spark.eventLog.dir                  hdfs://tech-master:54310/spark-logs
>>
>> spark.history.provider              org.apache.spark.deploy.history.FsHistoryProvider
>> spark.history.fs.logDirectory       hdfs://tech-master:54310/spark-logs
>> spark.history.fs.update.interval    10s
>> spark.history.ui.port               18080
>>
>> spark.ui.enabled                    true
>> spark.ui.port                       4040
>> spark.ui.killEnabled                true
>> spark.ui.retainedDeadExecutors      100
>>
>> spark.scheduler.mode                FAIR
>> spark.scheduler.allocation.file     /usr/local/spark/current/conf/fairscheduler.xml
>>
>> #spark.submit.deployMode            cluster
>> spark.default.parallelism           30
>>
>> SPARK_WORKER_MEMORY    10g
>> SPARK_WORKER_INSTANCES 1
>> SPARK_WORKER_CORES     5
>>
>> SPARK_DRIVER_MEMORY    9g
>> SPARK_DRIVER_CORES     5
>>
>> SPARK_MASTER_IP        Tech-master
>> SPARK_MASTER_PORT      7077
>>
>> On Tue, Feb 13, 2018 at 4:43 PM, akshay naidu wrote:
>>
>>> Hello,
>>> I'm trying to run multiple Spark jobs on a cluster running on YARN.
>>> The master is a 24 GB server with 6 slaves of 12 GB each.
>>>
>>> fairscheduler.xml settings are:
>>>
>>> <schedulingMode>FAIR</schedulingMode>
>>> <weight>10</weight>
>>> <minShare>2</minShare>
>>>
>>> I am running 8 jobs simultaneously; the jobs run in parallel, but not
>>> all of them. At a time only 7 of them run simultaneously while the 8th
>>> one waits in the queue for a job to stop.
>>>
>>> Also, out of the 7 running jobs, 4 run comparatively much faster than
>>> the remaining three (maybe resources are not distributed properly).
>>>
>>> I want to run n jobs at a time and make them run faster. Right now,
>>> one job takes more than three minutes while processing at most 1 GB
>>> of data.
>>>
>>> Kindly assist me; what am I missing?
>>>
>>> Thanks.
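Siva's suggestion above, isolating the slow job in its own fair-scheduler pool, would look roughly like the following in fairscheduler.xml. This is a sketch: the pool names `reports` and `slow` and the weights chosen are illustrative, not from the thread.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Pool names here are illustrative. weight sets a pool's relative
       share of the cluster; minShare is the minimum number of cores
       the pool is offered before excess capacity is shared fairly. -->
  <pool name="reports">
    <schedulingMode>FAIR</schedulingMode>
    <weight>10</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="slow">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

A job opts into a pool before its stages are submitted, e.g. `sc.setLocalProperty("spark.scheduler.pool", "slow")`. One caveat: fair-scheduler pools arbitrate jobs within a single SparkContext; separately submitted Spark applications compete at the YARN scheduler level instead, so pools alone will not fix queuing between independent spark-submit runs.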
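One thing worth checking against the numbers in the thread: each executor requests 1024 MB plus 1024 MB overhead, but YARN typically rounds a container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so each executor may actually occupy a much larger container than requested. A back-of-the-envelope sketch, using the figures from the configs above:

```python
import math

# Figures taken from the yarn-site.xml / spark-defaults.conf posted above.
min_alloc_mb = 3584   # yarn.scheduler.minimum-allocation-mb
node_mem_mb = 10752   # yarn.nodemanager.resource.memory-mb (per slave)
executor_mb = 1024    # spark.executor.memory
overhead_mb = 1024    # spark.yarn.executor.memoryOverhead
slaves = 6

# YARN rounds the request up to a multiple of the minimum allocation.
requested = executor_mb + overhead_mb                           # 2048 MB
container = math.ceil(requested / min_alloc_mb) * min_alloc_mb  # 3584 MB

per_node = node_mem_mb // container        # executors that fit on one slave
print(container, per_node, per_node * slaves)
```

Under these assumptions each 2048 MB request becomes a 3584 MB container, so only 3 executors fit per slave, or 18 across the cluster, before any driver containers are counted. Lowering the minimum allocation (or raising executor memory to use the whole container) would change that arithmetic.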