[ https://issues.apache.org/jira/browse/OOZIE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632141#comment-17632141 ]
Hadoop QA commented on OOZIE-3669:
----------------------------------

PreCommit-OOZIE-Build started

> Fix purge process for bundles to prevent orphan coordinators
> ------------------------------------------------------------
>
>                 Key: OOZIE-3669
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3669
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Janos Makai
>            Assignee: Janos Makai
>            Priority: Major
>         Attachments: OOZIE-3669-001.patch, OOZIE-3669-002.patch
>
>
> The Oozie purge process for bundles is creating orphan coordinators. When
> purging bundle jobs and bundle actions, it does not always purge the child
> coordinator jobs as well. This leaves orphaned coordinators: because of how
> the purge logic works, neither they nor their children will ever be purged.
>
> ----
>
> When purging bundles, it first compiles a list of any coordinators which are
> not ready to purge [0]. It checks the coord list for status and
> coordOlderThan. If no child coordinator meets these criteria, it adds it
> to the coordsToPurge list.
> Being added to the list does not guarantee that the coordinator will be
> purged, however. The processCoordinators method also has logic to check
> whether the child workflows are older than wfOlderThan [1]. If a purge command
> is started where wfOlderThan is much higher than coordOlderThan (for example,
> the default values are 30 days for workflows and 7 days for coordinators),
> then the bundle will be purged, but the coordinator will not.
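The scenario described above can be reproduced with a small, simplified model. This is only a sketch under stated assumptions, not Oozie code: the class `PurgeOrphanSketch`, its nested `Workflow` and `Coordinator` types, and all method names are hypothetical, and the workflow check is reduced to the end-time comparison only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the purge decisions described in the issue.
// None of these types are Oozie classes.
class PurgeOrphanSketch {

    // Terminal coordinator statuses, as listed in the issue's named query.
    static final Set<String> TERMINAL =
            new HashSet<>(Arrays.asList("SUCCEEDED", "FAILED", "KILLED", "DONEWITHERROR"));

    static final class Workflow {
        final boolean finished;
        final Instant endTime;

        Workflow(boolean finished, Instant endTime) {
            this.finished = finished;
            this.endTime = endTime;
        }
    }

    static final class Coordinator {
        final String id;
        final String status;
        final Instant lastModified;
        final List<Workflow> workflows;

        Coordinator(String id, String status, Instant lastModified, List<Workflow> workflows) {
            this.id = id;
            this.status = status;
            this.lastModified = lastModified;
            this.workflows = workflows;
        }
    }

    // Bundle-level check: a child blocks the bundle purge only if it is not in a
    // terminal status or was modified at or after the coordinator cutoff.
    static long countChildrenNotReady(List<Coordinator> children, Instant coordCutoff) {
        return children.stream()
                .filter(c -> !TERMINAL.contains(c.status) || !c.lastModified.isBefore(coordCutoff))
                .count();
    }

    // Coordinator-level check (simplified): every child workflow must be finished
    // and must have ended before the workflow cutoff.
    static boolean coordinatorPurgeable(Coordinator c, Instant wfCutoff) {
        return c.workflows.stream()
                .allMatch(w -> w.finished && w.endTime != null && w.endTime.isBefore(wfCutoff));
    }

    // Returns the ids of coordinators that survive although their bundle is purged.
    static List<String> orphansAfterBundlePurge(List<Coordinator> children, Instant now,
                                                int coordOlderThanDays, int wfOlderThanDays) {
        Instant coordCutoff = now.minus(Duration.ofDays(coordOlderThanDays));
        Instant wfCutoff = now.minus(Duration.ofDays(wfOlderThanDays));
        List<String> orphans = new ArrayList<>();
        if (countChildrenNotReady(children, coordCutoff) > 0) {
            return orphans; // the bundle is kept, so nothing is orphaned
        }
        // The bundle is purged at this point; any child that fails the stricter
        // workflow check keeps existing without a parent.
        for (Coordinator c : children) {
            if (!coordinatorPurgeable(c, wfCutoff)) {
                orphans.add(c.id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-11-11T00:00:00Z");
        Instant tenDaysAgo = now.minus(Duration.ofDays(10));
        Coordinator coord = new Coordinator("coord-1", "SUCCEEDED", tenDaysAgo,
                Collections.singletonList(new Workflow(true, tenDaysAgo)));
        // 7 days for coordinators, 30 days for workflows, as in the issue text.
        System.out.println(orphansAfterBundlePurge(Collections.singletonList(coord), now, 7, 30));
    }
}
```

With the thresholds quoted in the issue (7 days for coordinators, 30 days for workflows), a coordinator and workflow that both finished 10 days ago pass the bundle-level check but fail the workflow-level check, so the sketch prints `[coord-1]`. With both thresholds set to 7 days the returned list is empty.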
> Since the bundle is now purged, the child coordinator will never be purged:
> only parentless coordinators are checked on their own, and coordinators with
> parents are only purged when their bundle is purged.
>
> [0]
> {code:java}
> // PurgeXCommand
> long numChildrenNotReady = jpaService.execute(
>         new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));
>
> // CoordinatorJobBean
> @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE",
>         query = "select count(w) from CoordinatorJobBean"
>         + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
>         + "OR w.lastModifiedTimestamp >= :lastModTime)"),
> {code}
>
> [1]
> {code:java}
> // PurgeXCommand
> List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
>
> private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
>     final Date wfEndTime = wfjBean.getEndTime();
>     final boolean isFinished = wfjBean.inTerminalState();
>     if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS) {
>         return true;
>     }
>     else {
>         final Date lastModificationTime = wfjBean.getLastModifiedTime();
>         if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS) {
>             return true;
>         }
>     }
>     return false;
> }
>
> // if all workflow are ready to purge add them and add the coordinator and their actions
> if(workflowChildren.size() == wfjBeanList.size()) {
>     LOG.debug("Purging coordinator " + coordId);
>     wfsToPurge.addAll(workflowChildren);
>     coordsToPurge.add(coordId);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)