[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139486#comment-14139486 ]

Purshotam Shah commented on OOZIE-1813:
---------------------------------------

{quote}
. Tests failed: 1
. Tests errors: 0
. The patch failed the following testcases:
.   testMessage_withMixedStatus(org.apache.oozie.command.coord.TestAbandonedCoordChecker)
{quote}
Not sure why this failed in pre-commit. I ran the whole test case
multiple times on my local box with no failures. Uploaded the patch again to
re-trigger the pre-commit build.

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
>                 Key: OOZIE-1813
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1813
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>             Fix For: trunk
>
>         Attachments: OOZIE-1813-Amendment-V1.patch, 
> OOZIE-1813-Amendment-V1.patch, OOZIE-1813-V2.patch, OOZIE-1813-V3.patch, 
> OOZIE-1813-V4.patch, OOZIE-1813-V5.patch, OOZIE-1813-V6.patch, 
> OOZIE-1813-V7.patch, OOZIE-1813-V8.patch
>
>
> People leave their test coordinator and bundle jobs running without ever killing
> them, and these jobs eat up resources heavily. We should have a service that
> periodically checks for abandoned coords and reports/kills them.
> We can add multiple rules to this (e.g. number of consecutive
> failed/timed-out actions, total number of failed/timed-out actions).
> To start with, if the number of coord actions with FAILED/TIMEDOUT status
> exceeds a defined value, the coord is considered rogue.
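The detection rules described above can be sketched as follows. This is a minimal illustration, not the Oozie implementation: it assumes each coord action's status is available as a plain string, and the names BAD_STATUSES, is_rogue, max_consecutive_bad and failure_limit are hypothetical. It shows the "to start with" rule (total failed/timed-out actions above a limit) plus the consecutive-failure count mentioned as a possible extra rule.

```python
from typing import Iterable, Sequence

# Hypothetical: statuses counted as "bad" per the issue description.
BAD_STATUSES = frozenset({"FAILED", "TIMEDOUT"})


def is_rogue(action_statuses: Sequence[str], failure_limit: int) -> bool:
    """Initial rule: a coord is rogue when the number of its actions with
    FAILED/TIMEDOUT status is greater than failure_limit (hypothetical name)."""
    bad = sum(1 for status in action_statuses if status in BAD_STATUSES)
    return bad > failure_limit


def max_consecutive_bad(action_statuses: Iterable[str]) -> int:
    """Possible additional rule: length of the longest run of
    FAILED/TIMEDOUT actions, in the order the statuses are given."""
    longest = current = 0
    for status in action_statuses:
        if status in BAD_STATUSES:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a non-bad action breaks the run
    return longest


if __name__ == "__main__":
    statuses = ["SUCCEEDED", "FAILED", "TIMEDOUT", "FAILED", "SUCCEEDED"]
    print(is_rogue(statuses, failure_limit=2))  # 3 bad actions > 2
    print(max_consecutive_bad(statuses))
```

With three bad actions in the sample list, a limit of 2 flags the coord while a limit of 3 does not; whether the real service reports or kills such a coord is a separate policy choice.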



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
