[ https://issues.apache.org/jira/browse/OOZIE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632635#comment-17632635 ]
ASF subversion and git services commented on OOZIE-3669:
--------------------------------------------------------
Commit 5931b95f777b209c463dc74de88d245ab645263d in oozie's branch
refs/heads/master from Denes Bodo
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=5931b95f7 ]
OOZIE-3669 Fix purge process for bundles to prevent orphan coordinators (jmakai
via dionusos)
> Fix purge process for bundles to prevent orphan coordinators
> ------------------------------------------------------------
>
> Key: OOZIE-3669
> URL: https://issues.apache.org/jira/browse/OOZIE-3669
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: OOZIE-3669-001.patch, OOZIE-3669-002.patch,
> OOZIE-3669-003.patch
>
>
> The Oozie purge process for bundles can create orphan coordinators. When it
> purges bundle jobs and bundle actions, it does not always purge the child
> coordinator jobs and their children as well. Those coordinators are left
> orphaned: because of how the purge logic works, neither they nor their
> children will ever be purged.
>
> ----
>
> When purging bundles, PurgeXCommand first counts the child coordinators that
> are not ready to be purged [0], based on each coordinator's status and
> coordOlderThan. If no child coordinator is counted as not ready, the children
> become candidates for the coordsToPurge list.
> Being a candidate does not guarantee that a coordinator will be purged,
> however. The processCoordinators method also checks whether the child
> workflows are older than wfOlderThan [1]. If a purge command is started where
> wfOlderThan is much higher than coordOlderThan (for example, the default
> values are 30 days for workflows and 7 days for coordinators), then the bundle
> will be purged but the coordinator will not.
> Once the bundle has been purged, the child coordinator will never be purged:
> only parentless coordinators are checked on their own, and a coordinator with
> a parent is purged only when its bundle is purged.
> [0]
> {code:java}
> // PurgeXCommand
> long numChildrenNotReady = jpaService.execute(
>         new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));
>
> // CoordinatorJobBean
> @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE",
>         query = "select count(w) from CoordinatorJobBean"
>         + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
>         + "OR w.lastModifiedTimestamp >= :lastModTime)"),
> {code}
>
> [1]
> {code:java}
> // PurgeXCommand: a workflow is purgeable once it is finished and older than wfOlderThanMS
> private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
>     final Date wfEndTime = wfjBean.getEndTime();
>     final boolean isFinished = wfjBean.inTerminalState();
>     if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS) {
>         return true;
>     }
>     else {
>         final Date lastModificationTime = wfjBean.getLastModifiedTime();
>         if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS) {
>             return true;
>         }
>     }
>     return false;
> }
>
> // PurgeXCommand, processCoordinators
> List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
> // if all workflow are ready to purge add them and add the coordinator and their actions
> if (workflowChildren.size() == wfjBeanList.size()) {
>     LOG.debug("Purging coordinator " + coordId);
>     wfsToPurge.addAll(workflowChildren);
>     coordsToPurge.add(coordId);
> {code}
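
The threshold mismatch described in the issue can be reproduced with a small
standalone sketch. This is not Oozie code: the Workflow and Coordinator
classes, the method names, and the fixed dates are hypothetical stand-ins, and
the workflow check is simplified to the end-time comparison only. It assumes
the bundle itself already qualifies for purging, and uses the default cutoffs
quoted above (7 days for coordinators, 30 days for workflows).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.List;

public class PurgeOrphanSketch {

    // Hypothetical stand-in for a workflow job: only the fields the check needs.
    static final class Workflow {
        final boolean terminal;
        final Instant endTime;

        Workflow(boolean terminal, Instant endTime) {
            this.terminal = terminal;
            this.endTime = endTime;
        }
    }

    // Hypothetical stand-in for a coordinator job that belongs to a bundle.
    static final class Coordinator {
        final boolean terminal;
        final Instant lastModified;
        final List<Workflow> workflows;

        Coordinator(boolean terminal, Instant lastModified, List<Workflow> workflows) {
            this.terminal = terminal;
            this.lastModified = lastModified;
            this.workflows = workflows;
        }
    }

    // Bundle-level check: the coordinator is terminal and older than the coordinator cutoff.
    static boolean coordReady(Coordinator c, Instant coordCutoff) {
        return c.terminal && c.lastModified.isBefore(coordCutoff);
    }

    // Coordinator-level check: every child workflow is terminal and older than the workflow cutoff.
    static boolean allWorkflowsPurgeable(Coordinator c, Instant wfCutoff) {
        return c.workflows.stream()
                .allMatch(w -> w.terminal && w.endTime != null && w.endTime.isBefore(wfCutoff));
    }

    // True when the bundle-level check passes but the coordinator itself is kept:
    // the bundle would be purged and the coordinator left behind.
    static boolean wouldOrphan(Coordinator c, Instant coordCutoff, Instant wfCutoff) {
        return coordReady(c, coordCutoff) && !allWorkflowsPurgeable(c, wfCutoff);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-11-01T00:00:00Z"); // arbitrary fixed date
        Instant coordCutoff = now.minus(Duration.ofDays(7));
        Instant wfCutoff = now.minus(Duration.ofDays(30));

        // Coordinator and its only workflow both finished 10 days ago.
        Instant tenDaysAgo = now.minus(Duration.ofDays(10));
        Coordinator coord = new Coordinator(true, tenDaysAgo,
                Collections.singletonList(new Workflow(true, tenDaysAgo)));

        System.out.println(wouldOrphan(coord, coordCutoff, wfCutoff)); // prints: true
    }
}
```

With both cutoffs set to the same value, or with a workflow older than 30
days, wouldOrphan returns false for this coordinator, which matches the
observation that the problem appears when wfOlderThan is higher than
coordOlderThan.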
--
This message was sent by Atlassian Jira
(v8.20.10#820010)