[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714443#comment-17714443 ]
Dénes Bodó commented on OOZIE-3715: ----------------------------------- I have some comments: ** *core/src/main/java/org/apache/oozie/command/wf/SignalXCommand.java* Typo in line 552: {{transitions ,one}} should be {{transitions, one}} {code:java} // Fork out more than one transitions ,one submit fail can't execute KillXCommand {code} Please reword the above comment *core/src/test/java/org/apache/oozie/command/wf/TestSignalXCommand.java* Please remove or fill {{@throws Exception}} Java doc lines. {*}L420-421{*}: two empty lines instead of one {*}L429{*}: {{public void testForkSubmitFail() throws Exception{}} - missing space between Exception and { {*}L429-432{*}: Let's create 2 separated test cases (1 for true, 1 for false with self-explaining names) so when one scenario fails it is easier to understand {code:java} public void testForkSubmitFail() throws Exception{ _testForkSubmitFail(true); _testForkSubmitFail(false); } {code} {*}L439{*}: {{@param isForkParallelSubmit. (ture or fail)}} - typo in "ture", fail should be "false", unnecessary dot {*}L442{*}: {{_testForkSubmitFail}} method should be private, and should be used Java naming convention: instead of _test... run... {*}L494{*}: Instead of XConfiguration please use Configuration. {*}L518{*}: {{assertTrue(WorkflowAction.Status.FAILED.equals(failAction.getStatus()));}} - should be assertEquals, also please provide reason message: assertEquals("reason", expected, actual) {*}L522{*}: {{System.out.println(bean.getStatus());}} - line should be removed {*}L513-514{*}: failAction and otherAction are unnecessary variables. You could use *action_to_be_succeeded* instead of *action1* and the same to the other action name. add an assertion for actions.size = 2 {*}L524{*}: Instead of {code:java} assertTrue(WorkflowAction.Status.KILLED.equals(otherAction.getStatus()) || WorkflowAction.Status.OK.equals(otherAction.getStatus()) ); {code} use {code:java} if (!WorkflowAction.Status.KILLED.equals(otherAction.getStatus()) || !WorkflowAction.Status.OK.equals(otherAction.getStatus())) { fail("reason..."); } {code} and add an explanation why these two states are accepted. *KILLED* and *OK* are totally different. Please comment there why and how each can be the status of action2. If the status depends on something, for e.g isForkParallelSubmit please use it in the assertion. Thanks Denes > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > ----------------------------------------------------------------------------------------------------- > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core > Affects Versions: 5.2.1 > Reporter: chenhaodan > Assignee: chenhaodan > Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, forkSubmitFail_issue.txt, > status.png > > > When I fork 2 transitions( A and B) to submit , when A transition failed , B > transition still Running , because can't execute KillXCommand. > SignalXCommand.startForkedActions, when one transition submit fail will > create a new ActionStartXCommand and invoke failJob, failJob will add > WorkflowNotificationXCommand and KillXCommand to > {color:#ff0000}*commandQueue*{color} , and callback at XCommand.call method , > but we add WorkflowNotificationXCommand and KillXCommand to > ActionStartXCommand‘s {color:#ff0000}*commandQueue*{color} , but not > SignalXCommand , so can't execute KillXCommand. > The code is as follows : > > {code:java} > public void startForkedActions(List<WorkflowActionBean> > workflowActionBeanListForForked) throws CommandException { > ...... > for (Future<ActionExecutorContext> result : futures) { > ...... > if (context.getJobStatus() != null && > context.getJobStatus().equals(Job.Status.FAILED)) { > new ActionStartXCommand(context.getAction().getId(), > null).failJob(context); > ...... > } > ...... > } > {code} > > {code:java} > public void failJob(ActionExecutor.Context context, WorkflowActionBean > action) throws CommandException { > WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow(); > if (!handleUserRetry(context, action)) { > incrActionErrorCounter(action.getType(), "failed", 1); > LOG.warn("Failing Job due to failed action [{0}]", > action.getName()); > try { > workflow.getWorkflowInstance().fail(action.getName()); > WorkflowInstance wfInstance = workflow.getWorkflowInstance(); > ((LiteWorkflowInstance) > wfInstance).setStatus(WorkflowInstance.Status.FAILED); > workflow.setWorkflowInstance(wfInstance); > workflow.setStatus(WorkflowJob.Status.FAILED); > action.setStatus(WorkflowAction.Status.FAILED); > action.resetPending(); > queue(new WorkflowNotificationXCommand(workflow, action)); > queue(new KillXCommand(workflow.getId())); > InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, > getInstrumentation()); > } > catch (WorkflowException ex) { > throw new CommandException(ex); > } > } > } > {code} > > {code:java} > public final T call() throws CommandException { > if (commandQueue != null) { > for (Map.Entry<Long, List<XCommand<?>>> entry : > commandQueue.entrySet()) { > LOG.debug("Queuing [{0}] commands with delay [{1}]ms", > entry.getValue().size(), entry.getKey()); > if (!callableQueueService.queueSerial(entry.getValue(), > entry.getKey())) { > LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, > queue full", entry.getValue() > .size(), entry.getKey()); > } > } > } > } > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)