Hi Andras and Attila,

Thanks for your advice. I will check the cluster utilization the next time this job runs. In the meantime, I found some warnings in oozie.log:
2017-06-05 02:18:18,952 WARN CallableQueueService:523 - SERVER[ 363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [switch] exceeded, requeueing with [500]ms delay
2017-06-05 02:18:38,433 WARN CallableQueueService:523 - SERVER[ 363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [#composite#job.notification] exceeded, requeueing with [500]ms delay

Does it mean I should increase oozie.service.CallableQueueService.callable.concurrency?

BTW, I am using Oozie 4.2.0.

Thanks

2017-06-06 21:04 GMT+08:00 Attila Sasvari <asasv...@cloudera.com>:

> Hi Dong Ying,
>
> Many thanks Andras, these are good ideas.
>
> In addition, can you confirm that you have enough vcores / memory in your
> cluster for containers?
>
> You can check and try to adjust the following YARN settings:
> - yarn.nodemanager.resource.cpu-vcores
> - yarn.nodemanager.resource.memory-mb
> (look at your yarn-site.xml / yarn-default.xml)
>
> I would also recommend checking overall cluster utilization when Oozie
> jobs get into PREP state. Are there a lot of running jobs using a lot of
> resources (vcores, memory) at the time when your coordinator tries to
> submit the job? You can look at the resource manager and history server.
> Hope this helps.
>
> Best,
> - Attila
>
> * yarn settings -
> https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
>
> On Tue, Jun 6, 2017 at 2:26 PM, Andras Piros <andras.pi...@cloudera.com>
> wrote:
>
> > Hi Dong Ying,
> >
> > do you see any log lines containing the snippet "queue is full" within
> > the Oozie webapp logs?
> >
> > What are the values of these parameters:
> >
> > - oozie.service.CallableQueueService.queue.size
> > - oozie.service.CallableQueueService.threads
> > - oozie.service.CallableQueueService.callable.concurrency
> >
> > Regards,
> >
> > Andras
> >
> > On Tue, Jun 6, 2017 at 9:04 AM, Dongying Jiao <pineapple...@gmail.com>
> > wrote:
> >
> > > Hi:
> > > I have an Oozie coordinator job that runs at 02:00 every day.
> > > Sometimes the job runs smoothly, but sometimes it is stuck in PREP
> > > state for a long time.
> > >
> > > This is part of my coordinator.xml:
> > > <coordinator-app name="CoordinatorForETL"
> > >     frequency="${coordinatorFrequency}"
> > >     start="${startTime}" end="${endTime}" timezone="America/New_York"
> > >     xmlns="uri:oozie:coordinator:0.2">
> > >   <controls>
> > >     <timeout>10</timeout>
> > >     <concurrency>1</concurrency>
> > >   </controls>
> > >   <action>
> > >     <workflow>
> > > .............
> > > This is part of the workflow.xml:
> > > ......
> > > <start to="flowDecision"/>
> > > <decision name="flowDecision">
> > >   <switch>
> > >     <case to="q1">${workflowType eq "etl" || workflowType eq "all"}</case>
> > >     <case to="prediction">${workflowType eq "prediction"}</case>
> > >     <case to="errorOnDecision">${workflowType eq "cleaning"}</case>
> > >     <default to="errorOnDecision"/>
> > >   </switch>
> > > </decision>
> > > .......
> > >
> > > In my latest run, the job was in PREP state for about 30 minutes.
> > > According to the Oozie log, the "start" node of the job finished at
> > > 02:00, but the "flowDecision" node did not start executing until
> > > 02:32. During that period, I can see from the log that other Oozie
> > > jobs were running, but I didn't find any error or exception.
> > >
> > > From my understanding, an Oozie job in PREP state has not been
> > > submitted to YARN yet, so there is no application ID to find on YARN.
> > > I wonder if this relates to the Oozie queue mechanism or concurrency
> > > control. If yes, do you have experience on how to tune them?
> > >
> > > Thanks a lot.
> > >
> > > Best Regards,
> > > Dong Ying
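
For reference, the three CallableQueueService parameters discussed in this thread are normally overridden in oozie-site.xml. The fragment below is only a sketch of what such an override could look like. The values are placeholders chosen for illustration and are not tuning recommendations from anyone in this thread. As far as I know, oozie-default.xml ships with 10000, 10 and 3 for these settings, but please verify that against the oozie-default.xml of your own release.

```xml
<!-- oozie-site.xml fragment (sketch only; values are illustrative assumptions) -->
<configuration>
  <!-- Maximum number of callables that can wait in the queue -->
  <property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>10000</value>
  </property>
  <!-- Number of worker threads that execute queued callables -->
  <property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>20</value>
  </property>
  <!-- Maximum number of callables of the same type running at once;
       the "max concurrency for callable [...] exceeded" warning refers to this limit -->
  <property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>6</value>
  </property>
</configuration>
```

A change like this usually needs an Oozie server restart to take effect. It is probably worth raising the values in small steps and watching whether the "requeueing" warnings and the time spent in PREP state actually go down.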