[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Purshotam Shah reopened OOZIE-1813:
-----------------------------------

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
>                 Key: OOZIE-1813
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1813
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>             Fix For: trunk
>
>         Attachments: OOZIE-1813-Amendment-V1.patch, OOZIE-1813-V2.patch,
> OOZIE-1813-V3.patch, OOZIE-1813-V4.patch, OOZIE-1813-V5.patch,
> OOZIE-1813-V6.patch, OOZIE-1813-V7.patch, OOZIE-1813-V8.patch
>
>
> People leave their test coordinator and bundle jobs running without ever
> killing them, and those jobs consume resources heavily. We should have a
> service that periodically checks for abandoned coordinators and reports or
> kills them.
> We can add multiple criteria to this, such as the number of consecutive
> failed/timedout actions or the total number of failed/timedout actions.
> To start with, if the number of coordinator actions with failed/timedout
> status exceeds a defined value, the coordinator is considered rogue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
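
The starting rule in the description (a coordinator is rogue once its count of failed/timedout actions exceeds a defined value) could be sketched roughly as follows. This is only an illustration of the check itself, not the code in the attached patches: the class name RogueCoordinatorCheck, the ActionStatus enum, and the threshold parameter are all hypothetical and do not correspond to Oozie's actual classes or configuration keys.

```java
import java.util.Arrays;
import java.util.List;

public class RogueCoordinatorCheck {

    // Hypothetical stand-in for a coordinator action status; not Oozie's own type.
    public enum ActionStatus { SUCCEEDED, FAILED, TIMEDOUT, KILLED, RUNNING, WAITING }

    // Counts the actions whose status is FAILED or TIMEDOUT.
    public static long countFailedOrTimedOut(List<ActionStatus> actions) {
        return actions.stream()
                .filter(s -> s == ActionStatus.FAILED || s == ActionStatus.TIMEDOUT)
                .count();
    }

    // A coordinator is treated as rogue when the count is strictly greater
    // than the threshold (an assumed, externally supplied value).
    public static boolean isRogue(List<ActionStatus> actions, long threshold) {
        return countFailedOrTimedOut(actions) > threshold;
    }

    public static void main(String[] args) {
        List<ActionStatus> actions = Arrays.asList(
                ActionStatus.FAILED,
                ActionStatus.TIMEDOUT,
                ActionStatus.FAILED,
                ActionStatus.SUCCEEDED);

        // Three failed/timedout actions against a threshold of 2.
        System.out.println(isRogue(actions, 2));
    }
}
```

A service built on this idea would run the check periodically for each running coordinator and then report or kill the ones flagged. The consecutive-failure variant mentioned in the description would need the actions in order and is not covered by this sketch.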