[ https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312794#comment-16312794 ]
Sergey Svynarchuk commented on OOZIE-1401: ------------------------------------------ [~asasvari] I'll work on it > PurgeCommand should purge the workflow jobs w/o end_time > -------------------------------------------------------- > > Key: OOZIE-1401 > URL: https://issues.apache.org/jira/browse/OOZIE-1401 > Project: Oozie > Issue Type: Sub-task > Components: bundle, coordinator, workflow > Affects Versions: trunk > Reporter: Mona Chitnis > Assignee: Attila Sasvari > Fix For: 5.0.0b1 > > Attachments: OOZIE-1401-001.patch > > > Currently, {{PurgeXCommand}} logic is not working with those workflow jobs > with {{end_time=null}}. This command needs to take care of those jobs as > well. This happens in the case of long stuck jobs after Hadoop restarts or DB > failures. It could be done by checking {{last_modified_time}} instead, if > {{end_time}} is not available. > The current query: > {code:sql} > select w from WorkflowJobBean w where w.endTimestamp < :endTime > {code} > There is also an issue when: > * there is a parent workflow that has its {{end_time}} set > * is otherwise eligible for {{PurgeXCommand}}: {{end_time}} is older than > configured number of days, and has {{status}} either {{KILLED}}, or > {{FAILED}}, or {{SUCCEEDED}} > * has a child workflow that has the {{parent_id}} set to the {{id}} of the > parent workflow > * child workflow has its {{end_time = NULL}} > In this case, > [*{{PurgeXCommand#fetchTerminatedWorkflow()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/PurgeXCommand.java#L249] > throws a {{NullPointerException}} like this: > {noformat} > 2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: > SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] > JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on > 2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: > SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] > JOB[-] ACTION[-] Execute command [purge] key [null] > 2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: > SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] > JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, > Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days. > 2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: > SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] > JOB[-] ACTION[-] Exception, > java.lang.NullPointerException > at > org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249) > at > org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227) > at > org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199) > at > org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150) > at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)