[ 
https://issues.apache.org/jira/browse/OOZIE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632141#comment-17632141
 ] 

Hadoop QA commented on OOZIE-3669:
----------------------------------

PreCommit-OOZIE-Build started


> Fix purge process for bundles to prevent orphan coordinators
> ------------------------------------------------------------
>
>                 Key: OOZIE-3669
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3669
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Janos Makai
>            Assignee: Janos Makai
>            Priority: Major
>         Attachments: OOZIE-3669-001.patch, OOZIE-3669-002.patch
>
>
> The Oozie purge process for bundles creates orphan coordinators. When
> purging bundle jobs and bundle actions, it does not always purge the child
> coordinator jobs and their descendants. The resulting orphaned coordinators,
> and their children, will never be purged by the current purge logic.
>  
> ----
>  
> When purging bundles, the purge process first counts the child coordinators
> that are not ready to be purged [0], based on their status and
> coordOlderThan. If no child coordinator matches these criteria, the bundle's
> coordinators are added to the coordsToPurge list.
> However, being added to the list does not guarantee that a coordinator will
> be purged. The processCoordinators method also checks whether the child
> workflows are older than wfOlderThan [1]. If a purge command runs with a
> wfOlderThan much larger than coordOlderThan (the default values, for example,
> are 30 days for workflows and 7 days for coordinators), the bundle is purged
> but the coordinator is not.
> Because the bundle has already been purged, the child coordinator will never
> be purged: only parentless coordinators are checked on their own, and
> coordinators with a parent are purged only together with their bundle.
> [0]
> {code:java}
> // PurgeXCommand, lines 380-381
> long numChildrenNotReady = jpaService.execute(
>         new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));
>
> // CoordinatorJobBean, lines 192-194
> @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE",
>         query = "select count(w) from CoordinatorJobBean"
>         + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
>         + "OR w.lastModifiedTimestamp >= :lastModTime)"),
> {code}
>  
> [1]
> {code:java}
> // PurgeXCommand, lines 343-344
> List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
>
> // PurgeXCommand, isWorkflowPurgeable (lines 308-319)
> private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
>     final Date wfEndTime = wfjBean.getEndTime();
>     final boolean isFinished = wfjBean.inTerminalState();
>     if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS) {
>         return true;
>     }
>     else {
>         final Date lastModificationTime = wfjBean.getLastModifiedTime();
>         if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS) {
>             return true;
>         }
>     }
>     return false;
> }
>
> // PurgeXCommand, lines 345-349
> // if all workflow are ready to purge add them and add the coordinator and their actions
> if(workflowChildren.size() == wfjBeanList.size()) {
>     LOG.debug("Purging coordinator " + coordId);
>     wfsToPurge.addAll(workflowChildren);
>     coordsToPurge.add(coordId);
>     // ... (remaining lines not quoted)
> }
> {code}
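
A minimal Java sketch of how the two checks described in the issue interact, under simplifying assumptions. The classes, the purgeBundle method and the guardBundle flag are hypothetical illustrations, not Oozie APIs. The coordinator "not ready" test approximates the quoted NamedQuery (terminal status stands in for the SUCCEEDED/FAILED/KILLED/DONEWITHERROR list), and the workflow test mirrors isWorkflowPurgeable. guardBundle = true is one possible guard (defer the bundle until every child coordinator is purgeable); it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the bundle purge flow described in OOZIE-3669.
// None of these names are Oozie APIs; they only mirror the checks quoted in [0] and [1].
class OrphanCoordinatorDemo {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    static final class Workflow {
        final boolean terminal; final Long endTimeMs; final Long lastModifiedMs;
        Workflow(boolean terminal, Long endTimeMs, Long lastModifiedMs) {
            this.terminal = terminal; this.endTimeMs = endTimeMs; this.lastModifiedMs = lastModifiedMs;
        }
    }

    static final class Coordinator {
        final String id; final boolean terminal; final long lastModifiedMs; final List<Workflow> workflows;
        Coordinator(String id, boolean terminal, long lastModifiedMs, List<Workflow> workflows) {
            this.id = id; this.terminal = terminal; this.lastModifiedMs = lastModifiedMs; this.workflows = workflows;
        }
    }

    // Approximates GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE:
    // a coordinator is "not ready" if it is non-terminal or was modified at/after the cutoff.
    static long countCoordsNotReady(List<Coordinator> coords, long coordCutoffMs) {
        return coords.stream().filter(c -> !c.terminal || c.lastModifiedMs >= coordCutoffMs).count();
    }

    // Mirrors isWorkflowPurgeable: terminal, and ended (or last modified) before the cutoff.
    static boolean isWorkflowPurgeable(Workflow w, long wfCutoffMs) {
        if (!w.terminal) {
            return false;
        }
        return (w.endTimeMs != null && w.endTimeMs < wfCutoffMs)
                || (w.lastModifiedMs != null && w.lastModifiedMs < wfCutoffMs);
    }

    // Returns the ids one purge pass would delete for a single bundle.
    // guardBundle == false models the behavior described in the issue;
    // guardBundle == true is a hypothetical guard that defers the bundle unless all children are purged.
    static List<String> purgeBundle(String bundleId, List<Coordinator> coords, long nowMs,
                                    int wfOlderThanDays, int coordOlderThanDays, boolean guardBundle) {
        long wfCutoff = nowMs - wfOlderThanDays * DAY_MS;
        long coordCutoff = nowMs - coordOlderThanDays * DAY_MS;
        List<String> purged = new ArrayList<>();
        if (countCoordsNotReady(coords, coordCutoff) != 0) {
            return purged; // some child coordinator is still active or recent: skip the bundle
        }
        List<String> coordsToPurge = new ArrayList<>();
        for (Coordinator c : coords) {
            // like processCoordinators: purge a coordinator only if all of its workflows are purgeable
            if (c.workflows.stream().allMatch(w -> isWorkflowPurgeable(w, wfCutoff))) {
                coordsToPurge.add(c.id);
            }
        }
        if (guardBundle && coordsToPurge.size() != coords.size()) {
            return purged; // defer the whole bundle to a later run instead of orphaning children
        }
        purged.add(bundleId);
        purged.addAll(coordsToPurge);
        return purged;
    }

    public static void main(String[] args) {
        long now = 100 * DAY_MS;
        long tenDaysAgo = now - 10 * DAY_MS;
        // Coordinator and its only workflow both finished 10 days ago:
        // old enough for coordOlderThan = 7, too new for wfOlderThan = 30.
        List<Coordinator> coords = Arrays.asList(new Coordinator("coord-1", true, tenDaysAgo,
                Arrays.asList(new Workflow(true, tenDaysAgo, tenDaysAgo))));
        System.out.println(purgeBundle("bundle-1", coords, now, 30, 7, false)); // [bundle-1]: coord-1 is orphaned
        System.out.println(purgeBundle("bundle-1", coords, now, 30, 7, true));  // []: bundle kept for a later run
    }
}
```

With the default windows, the coordinator passes the bundle-level readiness check but its workflow fails the 30-day check, so the unguarded pass deletes only the bundle and leaves coord-1 with a dangling parent.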


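The reason the leftover coordinator is never revisited can be sketched as a simple classification, following the description in the issue: parentless coordinators are checked on their own, and coordinators with a parent are reached only through that parent. This is a hypothetical in-memory helper (Coord and findOrphans are illustrative names, not Oozie code) that lists coordinators whose parent bundle no longer exists, i.e. the ones neither path can reach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: which coordinators no purge path can reach any more.
// The names are illustrative, not Oozie APIs.
class OrphanFinder {
    static final class Coord {
        final String id;
        final String bundleId; // null means the coordinator has no parent bundle
        Coord(String id, String bundleId) { this.id = id; this.bundleId = bundleId; }
    }

    // Parentless coordinators are checked on their own; coordinators with a parent are only
    // reached through that parent. A coordinator whose bundle no longer exists is reached by neither.
    static List<String> findOrphans(List<Coord> coords, Set<String> existingBundleIds) {
        List<String> orphans = new ArrayList<>();
        for (Coord c : coords) {
            if (c.bundleId != null && !existingBundleIds.contains(c.bundleId)) {
                orphans.add(c.id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<Coord> coords = Arrays.asList(
                new Coord("coord-standalone", null),
                new Coord("coord-live", "bundle-2"),
                new Coord("coord-1", "bundle-1")); // bundle-1 was already purged
        Set<String> bundles = new HashSet<>(Arrays.asList("bundle-2"));
        System.out.println(findOrphans(coords, bundles)); // [coord-1]
    }
}
```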

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
