[ 
https://issues.apache.org/jira/browse/OOZIE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123509#comment-14123509
 ] 

Purshotam Shah commented on OOZIE-1989:
---------------------------------------

There is one more issue with OOZIE-1879.
The WF is attached. On the first run, the job fails at the pig action.
When we rerun from the failed node (oozie.wf.rerun.failnodes=true), Oozie 
tries to join it and fails with an error message.

I haven't had time to look into it (I am stuck on another issue and may not 
be able to debug it). I am attaching the WF and log to help you debug the issue.


If you want to track this as a separate JIRA, please go ahead.


> NPE during a rerun with forks
> -----------------------------
>
>                 Key: OOZIE-1989
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1989
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.1.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Blocker
>             Fix For: 4.1.0
>
>         Attachments: OOZIE-1989.patch, oozie-log.txt, worflow.xml
>
>
> OOZIE-1879 fixes a problem where the order in which the actions in a fork 
> ended can differ from the order in which they are executed during a rerun, 
> resulting in an error.  It does this by using a comparator to sort them into 
> the proper order in LiteWorkflowInstance.  However, the code assumes that all 
> actions in the executionPath have an end time, which is not always true.  
> When an action has no end time, you get an NPE like this:
> {noformat}
> java.lang.NullPointerException
>         at org.apache.oozie.workflow.lite.LiteWorkflowInstance$ActionEndTimesComparator.compare(LiteWorkflowInstance.java:739)
>         at org.apache.oozie.workflow.lite.LiteWorkflowInstance$ActionEndTimesComparator.compare(LiteWorkflowInstance.java:719)
>         at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
>         at java.util.TimSort.sort(TimSort.java:189)
>         at java.util.TimSort.sort(TimSort.java:173)
>         at java.util.Arrays.sort(Arrays.java:659)
>         at java.util.Collections.sort(Collections.java:217)
>         at org.apache.oozie.workflow.lite.LiteWorkflowInstance.signal(LiteWorkflowInstance.java:316)
>         at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:190)
>         at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:73)
>         at org.apache.oozie.command.XCommand.call(XCommand.java:283)
>         at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
>         at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
>         at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
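
For illustration only, here is a minimal sketch of the failure mode described 
above: sorting by end time when some entries have no end time. It is not the 
attached patch and does not use Oozie's classes. The Action class, the 
comparator name, and the choice to sort missing end times last are all 
assumptions made for this example, and it assumes Java 8 or later for 
Comparator.nullsLast.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

public class NullSafeEndTimeSort {

    // Hypothetical stand-in for an action record; not an Oozie class.
    static final class Action {
        final String name;
        final Date endTime; // null when the action has no end time

        Action(String name, Date endTime) {
            this.name = name;
            this.endTime = endTime;
        }
    }

    // Orders actions by end time. Actions without an end time sort last
    // instead of triggering a NullPointerException inside compare().
    static final Comparator<Action> BY_END_TIME_NULLS_LAST =
            Comparator.comparing((Action a) -> a.endTime,
                    Comparator.nullsLast(Comparator.<Date>naturalOrder()));

    // Returns the action names in sorted order; the input list is not modified.
    static List<String> sortedNames(List<Action> actions) {
        List<Action> copy = new ArrayList<>(actions);
        copy.sort(BY_END_TIME_NULLS_LAST);
        List<String> names = new ArrayList<>();
        for (Action a : copy) {
            names.add(a.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Action> actions = new ArrayList<>();
        actions.add(new Action("pig", null));             // no end time
        actions.add(new Action("hive", new Date(2000L)));
        actions.add(new Action("shell", new Date(1000L)));
        System.out.println(sortedNames(actions));         // prints [shell, hive, pig]
    }
}
```

Whether entries without an end time should sort first, sort last, or be 
excluded before sorting depends on how the rerun logic uses the order, so 
treat the nulls-last choice here as a placeholder.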



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
