[ https://issues.apache.org/jira/browse/OOZIE-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436646#comment-15436646 ]
Abhishek Bafna commented on OOZIE-2650:
---------------------------------------

I think we can define a new error code instead of reusing {{E1304, "Duplicate bundle application coordinator name"}}, which describes a different condition:

{code}
@Override
protected void verifyPrecondition() throws CommandException {
    super.verifyPrecondition();
    // A non-null wfId means a workflow has already been submitted for this
    // coord action, so starting it again would submit a duplicate workflow.
    if (wfId != null) {
        LOG.warn("Workflow is already submitted for Coord Action [{0}]", getParentId());
        throw new CommandException(ErrorCode.E1304, getParentId());
    }
}
{code}

> Retry coord start on database exceptions
> ----------------------------------------
>
>                 Key: OOZIE-2650
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2650
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>         Attachments: OOZIE-2650-1.patch
>
>
> {code:title=CoordActionStartXCommand.java}
>                 updateList.add(new UpdateEntry<WorkflowJobQuery>(WorkflowJobQuery.UPDATE_WORKFLOW_PARENT_MODIFIED, wfJob));
>                 updateList.add(new UpdateEntry<CoordActionQuery>(CoordActionQuery.UPDATE_COORD_ACTION_FOR_START, coordAction));
>                 try {
>                     executor.executeBatchInsertUpdateDelete(insertList, updateList, null);
>                     queue(new CoordActionNotificationXCommand(coordAction), 100);
>                     if (EventHandlerService.isEnabled()) {
>                         generateEvent(coordAction, user, appName, wfJob.getStartTime());
>                     }
>                 }
>                 catch (JPAExecutorException je) {
>                     throw new CommandException(je);
>                 }
> .........
> .......
> ........
>         finally {
>             if (makeFail == true) { // No DB exception occurs
> ....
> ....
> ....
>                 queue(new CoordActionReadyXCommand(coordAction.getJobId()));
>             }
>         }
> {code}
> If a database error occurs while starting a coord action, we fail the coord action. We should retry instead.
> CoordActionStartXCommand submits the workflow, and the workflow is linked to the coord action only if the submission succeeds. If the subsequent coord action update then fails because of a database exception, the recovery service should be able to recover the action rather than leaving it failed.
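A minimal, self-contained Java sketch of the two ideas in this thread may help. It is not the approach taken in OOZIE-2650-1.patch and uses none of Oozie's real classes: `CoordStartRetrySketch`, `SketchErrorCode`, `TransientDbException`, `BatchUpdate`, and `updateWithRetry` are hypothetical stand-ins, the status strings are simplified, and the attempt count and linear backoff are arbitrary assumptions. It shows the `verifyPrecondition` guard raising a dedicated error code instead of E1304, and a bounded retry of the database update that, once retries are exhausted, leaves the action's status unchanged (rather than FAILED) so a recovery pass could pick it up later.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: hypothetical stand-ins, not Oozie's real API. */
public class CoordStartRetrySketch {

    /** Hypothetical codes; a real change would add a new ErrorCode entry instead of E1304. */
    enum SketchErrorCode { WORKFLOW_ALREADY_SUBMITTED, DB_UPDATE_FAILED }

    /** Stand-in for a transient database failure (e.g. what JPAExecutorException wraps). */
    static class TransientDbException extends Exception {
        TransientDbException(String msg) { super(msg); }
    }

    /** Stand-in for CommandException that carries a dedicated error code. */
    static class CommandException extends Exception {
        final SketchErrorCode code;
        CommandException(SketchErrorCode code, String detail) {
            super(code + ": " + detail);
            this.code = code;
        }
    }

    /** Hypothetical coord action with just the fields this sketch needs. */
    static class CoordAction {
        final String id;
        String wfId;                 // set once workflow submission succeeds
        String status = "SUBMITTED"; // simplified status string
        CoordAction(String id) { this.id = id; }
    }

    /** The batch DB update, abstracted so failures can be simulated. */
    interface BatchUpdate {
        void execute(CoordAction action) throws TransientDbException;
    }

    /** Guard from the comment: refuse to start again if a workflow is already linked. */
    static void verifyPrecondition(CoordAction action) throws CommandException {
        if (action.wfId != null) {
            throw new CommandException(SketchErrorCode.WORKFLOW_ALREADY_SUBMITTED, action.id);
        }
    }

    /**
     * Tries the update up to maxAttempts times, sleeping backoffMillis * attempt between tries.
     * On exhaustion the status is left untouched (not FAILED) and DB_UPDATE_FAILED is thrown.
     */
    static void updateWithRetry(CoordAction action, BatchUpdate update, int maxAttempts,
                                long backoffMillis) throws CommandException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                update.execute(action);
                action.status = "RUNNING";
                return;
            } catch (TransientDbException e) {
                if (attempt == maxAttempts) {
                    throw new CommandException(SketchErrorCode.DB_UPDATE_FAILED, action.id);
                }
                Thread.sleep(backoffMillis * attempt); // linear backoff before the next try
            }
        }
    }

    public static void main(String[] args) throws Exception {
        CoordAction action = new CoordAction("0000001-coord@1");
        verifyPrecondition(action);   // passes: no workflow linked yet
        action.wfId = "0000002-wf";   // simulate a successful workflow submission

        AtomicInteger calls = new AtomicInteger();
        // Simulated update: fails twice with a DB error, then succeeds.
        updateWithRetry(action, a -> {
            if (calls.incrementAndGet() < 3) {
                throw new TransientDbException("connection reset");
            }
        }, 5, 10);
        System.out.println(action.status + " after " + calls.get() + " attempts");
    }
}
```

Running `main` prints `RUNNING after 3 attempts`. Leaving the status untouched when retries run out is what would let a recovery mechanism try again later; marking the action FAILED, as the quoted `finally` block does, presumably takes it out of recovery's reach.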