[ 
https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306319#comment-16306319
 ] 

Attila Sasvari commented on OOZIE-1401:
---------------------------------------

[~vaifer] thanks for spotting and reporting this issue. You are right that the 
mentioned queries do not return lastModificationTime, and if end_time is null, 
then those workflows can't be purged as there is no information when they have 
finished (but at least the purge service does not throw an NPE). An amendment 
patch should be fine I believe.


> PurgeCommand should purge the workflow jobs w/o end_time
> --------------------------------------------------------
>
>                 Key: OOZIE-1401
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1401
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: bundle, coordinator, workflow
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Attila Sasvari
>             Fix For: 5.0.0b1
>
>         Attachments: OOZIE-1401-001.patch
>
>
> Currently, {{PurgeXCommand}} logic is not working with those workflow jobs 
> with {{end_time=null}}. This command needs to take care of those jobs as 
> well. This happens in the case of long stuck jobs after Hadoop restarts or DB 
> failures. It could be done by checking {{last_modified_time}} instead, if 
> {{end_time}} is not available.
> The current query:
> {code:sql}
> select w from WorkflowJobBean w where w.endTimestamp < :endTime
> {code}
> There is also an issue when:
> * there is a parent workflow that has its {{end_time}} set
> * is otherwise eligible for {{PurgeXCommand}}: {{end_time}} is older than 
> configured number of days, and has {{status}} either {{KILLED}}, or 
> {{FAILED}}, or {{SUCCEEDED}}
> * has a child workflow that has the {{parent_id}} set to the {{id}} of the 
> parent workflow
> * child workflow has its {{end_time = NULL}}
> In this case, 
> [*{{PurgeXCommand#fetchTerminatedWorkflow()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/PurgeXCommand.java#L249]
>  throws a {{NullPointerException}} like this:
> {noformat}
> 2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
> 2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Execute command [purge] key [null]
> 2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, 
> Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
> 2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: 
> SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] 
> JOB[-] ACTION[-] Exception, 
> java.lang.NullPointerException
>       at 
> org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
>       at 
> org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
>       at 
> org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
>       at 
> org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
>       at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to