Janos Makai created OOZIE-3669:
----------------------------------
Summary: Fix purge process for bundles to prevent orphan
coordinators
Key: OOZIE-3669
URL: https://issues.apache.org/jira/browse/OOZIE-3669
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 5.2.1
Reporter: Janos Makai
Assignee: Janos Makai
The Oozie purge process for bundles is creating orphan coordinators. When
purging bundle jobs and bundle actions, it does not always purge coordinator
jobs, etc. This causes orphaned coordinators, meaning neither they nor their
children will ever be purged due to the purge logic.
----
When purging bundles, it first compiles a list of any coordinators which are
not ready to purge [0]. It checks the coord list for status and coordOlderThan.
If the no child coordinator meets these criteria, it adds it to the
coordsToPurge list.
Being added to the list does not guarantee that the coordinator will be purged
however. The processCoordinators method also has logic to check if the children
workflows are older than wfOlderThan [1]. If a purge command is started where
wfOlderThan is much higher than coordOlderThan (for example the default values
are 30 days for workflows and 7 days for coordinators), then the bundle will be
purged, but the coordinator will not.
Since the bundle is now purged, the child coordinator will never be purged
because only parentless coordinators will be checked, since coordinators with
parents will only be purged when the bundle is purged
[0]
{code:java}
PurgeXCommand
380 long numChildrenNotReady = jpaService.execute(
381 new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan,
bundleId));
CoordinatorJobBean
192 @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE",
query = "select count(w) from CoordinatorJobBean"
193 + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED',
'FAILED', 'KILLED', 'DONEWITHERROR') "
194 + "OR w.lastModifiedTimestamp >= :lastModTime)"),
{code}
[1]
{code:java}
PurgeXCommand
343 List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
344
private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long
wfOlderThanMS) {
308 final Date wfEndTime = wfjBean.getEndTime();
309 final boolean isFinished = wfjBean.inTerminalState();
310 if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS)
{ 311 return true; 312 }
313 else {
314 final Date lastModificationTime = wfjBean.getLastModifiedTime();
315 if (isFinished && lastModificationTime != null &&
lastModificationTime.getTime() < wfOlderThanMS)
{ 316 return true; 317 }
318 }
319 return false;
345 // if all workflow are ready to purge add them and add the coordinator and
their actions
346 if(workflowChildren.size() == wfjBeanList.size()) {
347 LOG.debug("Purging coordinator " + coordId);
348 wfsToPurge.addAll(workflowChildren);
349 coordsToPurge.add(coordId);
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)