[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228488#comment-17228488
 ] 

Frank Top Frank commented on OOZIE-3581:
----------------------------------------

Hello, I also ran into the same problem in a production environment running 
the same version. Initially, I modified the following parameters:

#oozie.service.ActionCheckerService.action.check.interval ->10

#oozie.service.ActionCheckerService.action.check.delay ->20

#oozie.service.ActionCheckerService.callable.batch.size ->5

After that, all Oozie tasks were up and running without inconsistent-state 
issues. However, after several hours of running, Oozie again had a large 
number of tasks in the RUNNING state whose jobs on YARN had already succeeded. 
I combed through the logic.

Then, with debug mode enabled, I found through the Oozie logs that execution 
does not continue past the point marked here:

 
{code:java}
protected Void execute() throws CommandException {
    // If the action is still in PREP, we probably received a callback before Oozie was able to update from PREP to RUNNING;
    // we'll requeue this command a few times and hope that it switches to RUNNING before giving up
    if (this.wfactionBean.getStatus() == WorkflowActionBean.Status.PREP) {
        int maxEarlyRequeueCount = Services.get().get(CallbackService.class).getEarlyRequeueMaxRetries();
        if (this.earlyRequeueCount < maxEarlyRequeueCount) {
            long delay = getRequeueDelay();
            LOG.warn("Received early callback for action still in PREP state; will wait [{0}]ms and requeue up to [{1}] more"
                    + " times", delay, (maxEarlyRequeueCount - earlyRequeueCount));
            queue(new CompletedActionXCommand(this.actionId, this.externalStatus, null, this.getPriority(),
                    this.earlyRequeueCount + 1), delay);
        } else {
            throw new CommandException(ErrorCode.E0822, actionId);
        }
    } else {    // RUNNING
        ActionExecutor executor = Services.get().get(ActionService.class).getExecutor(this.wfactionBean.getType());
        // this is done because oozie notifications (of sub-wfs) is send
        // every status change, not only on completion.
        if (executor.isCompleted(externalStatus)) { // *********** Here!! code does not run into queue() ***********
            queue(new ActionCheckXCommand(this.wfactionBean.getId(), getPriority(), -1));
        }
    }
    return null;
}
{code}
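
To make the branches of the method above easier to reason about, here is a minimal, self-contained sketch of the same decision structure. It is only a toy model: `CallbackDecisionSketch`, `Outcome`, `decide` and the `ASSUMED_TERMINAL` status set are hypothetical names introduced for illustration, not Oozie API, and the real `executor.isCompleted(...)` depends on the action executor in use.

```java
import java.util.Set;

// Toy model of the branch structure in the execute() method quoted above.
// Every name in this class is hypothetical; nothing here is Oozie API.
class CallbackDecisionSketch {

    enum ActionStatus { PREP, RUNNING }

    enum Outcome { REQUEUE_CALLBACK, FAIL_E0822, QUEUE_ACTION_CHECK, IGNORE }

    // Assumption: stand-in for executor.isCompleted(externalStatus).
    // The actual set of terminal statuses is decided by the action executor.
    static final Set<String> ASSUMED_TERMINAL = Set.of("SUCCEEDED", "FAILED", "KILLED");

    static boolean isCompleted(String externalStatus) {
        return ASSUMED_TERMINAL.contains(externalStatus);
    }

    static Outcome decide(ActionStatus status, String externalStatus,
                          int earlyRequeueCount, int maxEarlyRequeueCount) {
        if (status == ActionStatus.PREP) {
            // Early callback: retry a bounded number of times, then give up.
            return earlyRequeueCount < maxEarlyRequeueCount
                    ? Outcome.REQUEUE_CALLBACK
                    : Outcome.FAIL_E0822;
        }
        // Not PREP: only a completed external status leads to an action check.
        return isCompleted(externalStatus) ? Outcome.QUEUE_ACTION_CHECK : Outcome.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(decide(ActionStatus.PREP, "SUCCEEDED", 0, 5));
        System.out.println(decide(ActionStatus.PREP, "SUCCEEDED", 5, 5));
        System.out.println(decide(ActionStatus.RUNNING, "SUCCEEDED", 0, 5));
        System.out.println(decide(ActionStatus.RUNNING, "RUNNING", 0, 5));
    }
}
```

Note that the `IGNORE` branch is silent in this model: if the callback status is not treated as completed, nothing is queued and nothing is logged, which would match the symptom of the action staying in RUNNING until the periodic action check picks it up.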

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-3581
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3581
>             Project: Oozie
>          Issue Type: Bug
>          Components: action, workflow
>    Affects Versions: 4.3.1
>            Reporter: Kotsubinsky Victor
>            Priority: Critical
>
> oozie version 4.3.1.3.1.0.0-78
> with the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional 
> Apache patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I use a kerberized Hadoop cluster and run MR jobs on YARN through Oozie.
> 1. Oozie runs the MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> the callback, so the workflow action still shows the RUNNING state (until 
> the action check process checks the workflow IDs and switches the action to 
> the SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
