[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171522#comment-14171522 ] Purshotam Shah commented on OOZIE-1813:
---
Thanks, Rohini, for the review. Committed to trunk.

> Add service to report/kill rogue bundles and coordinator jobs
> -
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Fix For: trunk
>
> Attachments: OOZIE-1813-Amendment-V1.patch, OOZIE-1813-Amendment-V1.patch, OOZIE-1813-V2.patch, OOZIE-1813-V3.patch, OOZIE-1813-V4.patch, OOZIE-1813-V5.patch, OOZIE-1813-V6.patch, OOZIE-1813-V7.patch, OOZIE-1813-V8.patch
>
>
> People leave their test coordinator and bundle jobs running without ever killing them, and these jobs eat up resources heavily. We should have a service that periodically checks for abandoned coords and reports/kills them.
> We can add multiple rules to this, such as the number of consecutive failed/timed-out actions or the total number of failed/timed-out actions.
> To start with, if the number of coord actions with FAILED/TIMEDOUT status exceeds a defined value, the coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
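The starting rule in the issue description (count a coordinator's FAILED/TIMEDOUT actions and flag it as rogue when the count exceeds a defined value), together with the "consecutive failures" variant it mentions, can be sketched as below. This is a minimal illustration only: the class, enum, and method names are hypothetical and are not Oozie's actual API, and the action list is assumed to be ordered by nominal time.

```java
import java.util.List;

// Hypothetical sketch of the rogue-coordinator rules from the issue description.
// None of these names come from Oozie itself.
class RogueCoordCheck {

    /** Simplified subset of coordinator action statuses (assumption). */
    enum ActionStatus { SUCCEEDED, FAILED, TIMEDOUT, KILLED, RUNNING }

    /** An action counts against the coordinator if it FAILED or TIMEDOUT. */
    static boolean isBad(ActionStatus s) {
        return s == ActionStatus.FAILED || s == ActionStatus.TIMEDOUT;
    }

    /** Rule 1: total number of FAILED/TIMEDOUT actions. */
    static int countBad(List<ActionStatus> actions) {
        int n = 0;
        for (ActionStatus s : actions) {
            if (isBad(s)) {
                n++;
            }
        }
        return n;
    }

    /** Rule 2: longest run of consecutive FAILED/TIMEDOUT actions (list assumed in nominal-time order). */
    static int longestBadRun(List<ActionStatus> actions) {
        int best = 0;
        int run = 0;
        for (ActionStatus s : actions) {
            run = isBad(s) ? run + 1 : 0; // any non-bad action resets the run
            best = Math.max(best, run);
        }
        return best;
    }

    /** Starting rule from the issue: rogue if the bad-action count exceeds the threshold. */
    static boolean isRogue(List<ActionStatus> actions, int threshold) {
        return countBad(actions) > threshold;
    }

    public static void main(String[] args) {
        List<ActionStatus> actions = List.of(
                ActionStatus.FAILED, ActionStatus.TIMEDOUT, ActionStatus.SUCCEEDED,
                ActionStatus.FAILED, ActionStatus.FAILED);
        System.out.println(countBad(actions));      // 4
        System.out.println(longestBadRun(actions)); // 2
        System.out.println(isRogue(actions, 3));    // true
    }
}
```

Keeping the two rules as separate helpers reflects the description's suggestion that more than one kind of check could be plugged in later; `List.of` assumes Java 9 or newer.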
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171385#comment-14171385 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 2 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
+1 TESTS
. Tests run: 1538
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/2036/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139775#comment-14139775 ] Rohini Palaniswamy commented on OOZIE-1813:
---
+1. The longer lines are named queries.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139688#comment-14139688 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 2 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS - patch does not compile, cannot run testcases
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1994/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139638#comment-14139638 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 2 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS - patch does not compile, cannot run testcases
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1993/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139486#comment-14139486 ] Purshotam Shah commented on OOZIE-1813:
---
{quote}
Tests failed: 1
Tests errors: 0
The patch failed the following testcases:
testMessage_withMixedStatus(org.apache.oozie.command.coord.TestAbandonedCoordChecker)
{quote}
Not sure why this failed in the pre-commit build. I ran the whole test case multiple times on my local box with no failures. Uploaded the patch again to re-trigger the pre-commit build.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137979#comment-14137979 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 2 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1531
. Tests failed: 1
. Tests errors: 0
. The patch failed the following testcases:
. testMessage_withMixedStatus(org.apache.oozie.command.coord.TestAbandonedCoordChecker)
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1987/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137779#comment-14137779 ] Rohini Palaniswamy commented on OOZIE-1813:
---
From [~mchiang_4w...@yahoo.com]:
{quote}
Currently, the difference between the current time and the coord job's start time is used to check whether an abandoned job is "older_than" the threshold. If a coord job is in catch-up mode, its start time is earlier than the current time, so it will be considered an "older" job to kill even though it was created just now. However, a coord job can also be created now with a start time in the future, so using the coord job's created time as the base may not be accurate either.
{quote}
Thanks for catching this, Michelle. So the 2-day buffer should be measured from max(created time, start time), and OOZIE-1813-Amendment-V1.patch addresses that. +1 pending Jenkins.
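The corrected age check discussed above (apply the 2-day buffer from max(created time, start time), so that neither catch-up jobs nor future-start jobs are misjudged) might look roughly like the sketch below. The class and method names are hypothetical, and the real amendment patch may compute this differently (for example, in a named query rather than in Java).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the "older_than" check; not Oozie's actual implementation.
class AbandonedAgeCheck {

    /**
     * A coordinator is old enough to be a candidate for the abandoned check only
     * if the buffer has elapsed since max(createdTime, startTime).
     */
    static boolean isOlderThan(Instant createdTime, Instant startTime,
                               Instant now, Duration buffer) {
        // Catch-up job: start time is in the past but the job was just created,
        // so the created time is the later of the two and is used as the base.
        // Future-start job: the start time is later and is used as the base.
        Instant base = createdTime.isAfter(startTime) ? createdTime : startTime;
        // If base is after now, the duration is negative and the check is false.
        return Duration.between(base, now).compareTo(buffer) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-09-17T12:00:00Z");
        Duration twoDays = Duration.ofDays(2);

        // Catch-up job: start time 30 days ago, created 1 hour ago -> not old yet.
        System.out.println(isOlderThan(now.minus(Duration.ofHours(1)),
                now.minus(Duration.ofDays(30)), now, twoDays)); // false

        // Job created and started 5 days ago -> past the 2-day buffer.
        System.out.println(isOlderThan(now.minus(Duration.ofDays(5)),
                now.minus(Duration.ofDays(5)), now, twoDays)); // true
    }
}
```

Taking the later of the two timestamps is what makes both edge cases in Michelle's comment safe: a freshly created catch-up job and a job whose start time lies in the future both fall inside the buffer.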
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137767#comment-14137767 ] Purshotam Shah commented on OOZIE-1813:
---
The 2-day buffer fails for catch-up jobs. Attaching an amendment patch.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128933#comment-14128933 ] Purshotam Shah commented on OOZIE-1813:
---
Thanks, Rohini and Robert, for the review. Committed to trunk.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128900#comment-14128900 ] Rohini Palaniswamy commented on OOZIE-1813:
---
+1. The lines longer than 132 characters are named queries, so they are OK. The test failure is a known flaky test.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124264#comment-14124264 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1522
. Tests failed: 4
. Tests errors: 0
. The patch failed the following testcases:
. testBundleStatusTransitServiceKilled2(org.apache.oozie.service.TestStatusTransitService)
. testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService)
. testActionKillCommandDate(org.apache.oozie.command.coord.TestCoordActionsKillXCommand)
. testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService)
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1920/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124254#comment-14124254 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1522
. Tests failed: 0
. Tests errors: 1
. The patch failed the following testcases:
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1918/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123866#comment-14123866 ] Hadoop QA commented on OOZIE-1813:
--
Testing JIRA OOZIE-1813

Cleaning local git workspace

+1 PATCH_APPLIES
+1 CLEAN
-1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. -1 the patch contains 2 line(s) longer than 132 characters
. +1 the patch adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
+1 TESTS
. Tests run: 1522
+1 DISTRO
. +1 distro tarball builds with the patch

-1 Overall result, please check the reported -1(s)

The full output of the test-patch run is available at https://builds.apache.org/job/oozie-trunk-precommit-build/1910/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123842#comment-14123842 ] Hadoop QA commented on OOZIE-1813: -- Testing JIRA OOZIE-1813 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters .{color:green}+1{color} the patch does adds/modifies 1 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:red}-1 TESTS{color} .Tests run: 1522 .Tests failed: 2 .Tests errors: 0 .The patch failed the following testcases: . testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService) . testPauseBundleAndCoordinator(org.apache.oozie.service.TestPauseTransitService) {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . 
https://builds.apache.org/job/oozie-trunk-precommit-build/1907/
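The RAW_PATCH_ANALYSIS step in these reports fails because the patch contains lines longer than 132 characters. For illustration only, here is a minimal Java sketch of such a check. It assumes that only lines added by the patch are measured (unified-diff lines starting with "+", excluding the "+++" file header), which may not match exactly what the real test-patch script does; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "long line" check on a unified diff.
 * Assumption: only added lines ('+' but not the '+++' header) are checked.
 */
public class LongLineCheck {
    static final int MAX_LINE_LENGTH = 132;

    /** Returns the content of added lines that exceed maxLen characters. */
    static List<String> longAddedLines(List<String> patchLines, int maxLen) {
        List<String> offenders = new ArrayList<>();
        for (String line : patchLines) {
            if (line.startsWith("+") && !line.startsWith("+++")) {
                String content = line.substring(1); // drop the diff marker
                if (content.length() > maxLen) {
                    offenders.add(content);
                }
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> patch = List.of(
            "+++ b/core/src/main/java/Example.java", // file header, skipped
            "+short line",
            "+" + "x".repeat(140),                   // too long: reported
            "-" + "y".repeat(200));                  // removed line: ignored
        int count = longAddedLines(patch, MAX_LINE_LENGTH).size();
        System.out.println(count + " line(s) longer than " + MAX_LINE_LENGTH + " characters");
        // prints "1 line(s) longer than 132 characters"
    }
}
```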
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123817#comment-14123817 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.{color:red}-1{color} HEAD does not compile
.{color:red}-1{color} patch does not compile
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:red}-1 DISTRO{color}
.{color:red}-1{color} distro tarball fails with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1916/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123483#comment-14123483 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1521
.Tests failed: 1
.Tests errors: 0
.The patch failed the following testcases:
. testCoordinatorActionEvent(org.apache.oozie.event.TestEventGeneration)
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1902/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122763#comment-14122763 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1521
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1901/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122509#comment-14122509 ] Purshotam Shah commented on OOZIE-1813: ---
New patch, which includes:
1. {quote} Actually, on the final patch, can you add the new config properties to oozie-default.xml? {quote}
2. {quote} Also, can you add a check to only kill the coord job if it is older than 2 days? If something was submitted and had a lot of failures initially, this would kill the coord job. We should give the user some time to correct any errors and rerun if needed. {quote}
3. {quote} In HA, this service should only run on the primary server. {quote}
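Combining the failure-count rule from the issue description with the requested 2-day age check, the kill decision could look roughly like the Java sketch below. This is not the code in the committed patch: the method name, its parameters, and the limit of 25 used in the demo are hypothetical, and "failures" here means coordinator actions in FAILED or TIMEDOUT status.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the rogue-coordinator decision; not Oozie's API. */
public class RogueCoordCheck {

    /**
     * Rogue only if the FAILED/TIMEDOUT action count exceeds failureLimit
     * AND the job was created more than minAge ago, so that users still
     * have time to fix early errors and rerun.
     */
    static boolean isRogue(int failedOrTimedOut, int failureLimit,
                           Instant createdTime, Instant now, Duration minAge) {
        boolean tooManyFailures = failedOrTimedOut > failureLimit;
        boolean oldEnough = Duration.between(createdTime, now).compareTo(minAge) > 0;
        return tooManyFailures && oldEnough;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-01-10T00:00:00Z"); // fixed clock for a reproducible demo
        Duration twoDays = Duration.ofDays(2);
        // Many failures, but created only one day ago: left alone for now.
        System.out.println(isRogue(50, 25, now.minus(Duration.ofDays(1)), now, twoDays)); // false
        // Many failures and older than two days: treated as rogue.
        System.out.println(isRogue(50, 25, now.minus(Duration.ofDays(3)), now, twoDays)); // true
    }
}
```

Checking age separately from the failure count means a freshly submitted job with a burst of early failures is reported at most, never killed, until the grace period has passed.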
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119306#comment-14119306 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1515
.Tests failed: 3
.Tests errors: 0
.The patch failed the following testcases:
. testBundleStatusTransitServiceKilled2(org.apache.oozie.service.TestStatusTransitService)
. testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService)
. testUnpauseBundleAndCoordinator(org.apache.oozie.service.TestPauseTransitService)
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1850/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110207#comment-14110207 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1512
.Tests failed: 6
.Tests errors: 2
.The patch failed the following testcases:
. testConcurrencyReachedAndChooseNextEligible(org.apache.oozie.service.TestCallableQueueService)
. testMain(org.apache.oozie.action.hadoop.TestHiveMain)
. testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI)
. testPigScript(org.apache.oozie.action.hadoop.TestPigMain)
. testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain)
. testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain)
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1713/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107546#comment-14107546 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.{color:red}-1{color} HEAD does not compile
.{color:red}-1{color} patch does not compile
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:red}-1 DISTRO{color}
.{color:red}-1{color} distro tarball fails with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1645/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106736#comment-14106736 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1512
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1580/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105110#comment-14105110 ] Purshotam Shah commented on OOZIE-1813: --- In HA, this service should only run on the primary server.
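One way to honor the "primary server only" rule is to wrap the periodic check in a guard that consults the server's current HA role before doing any work, so secondary servers never report or kill the same jobs a second time. In this Java sketch the role is abstracted as a BooleanSupplier; how Oozie actually determines the primary server is not shown, and all class and method names are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: run a periodic check only on the primary (leader) server. */
public class PrimaryOnlyRunnable implements Runnable {
    private final BooleanSupplier isPrimary; // e.g. backed by a leader-election mechanism
    private final Runnable check;            // the actual rogue-job check
    private int runs = 0;                    // not thread-safe; fine for one scheduler thread

    PrimaryOnlyRunnable(BooleanSupplier isPrimary, Runnable check) {
        this.isPrimary = isPrimary;
        this.check = check;
    }

    @Override
    public void run() {
        if (!isPrimary.getAsBoolean()) {
            return; // secondary servers skip the check entirely
        }
        runs++;
        check.run();
    }

    int runs() { return runs; }

    public static void main(String[] args) {
        PrimaryOnlyRunnable onPrimary =
            new PrimaryOnlyRunnable(() -> true, () -> System.out.println("checking for rogue coords"));
        PrimaryOnlyRunnable onSecondary =
            new PrimaryOnlyRunnable(() -> false, () -> System.out.println("should not print"));
        onPrimary.run();   // prints "checking for rogue coords"
        onSecondary.run(); // prints nothing
        System.out.println(onPrimary.runs() + " " + onSecondary.runs()); // prints "1 0"
    }
}
```

Because the role is re-evaluated on every run, a server that becomes primary after a failover starts performing the check on its next scheduled tick, for example when driven by ScheduledExecutorService.scheduleWithFixedDelay.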
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083966#comment-14083966 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1509
.Tests failed: 2
.Tests errors: 0
.The patch failed the following testcases:
. testActionKillCommandDate(org.apache.oozie.command.coord.TestCoordActionsKillXCommand)
. testCoordActionInputCheckXCommandUniqueness(org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand)
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1446/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041461#comment-14041461 ] Rohini Palaniswamy commented on OOZIE-1813: --- Also, can you add a check to only kill the coord job if it is older than 2 days? If something was submitted and had a lot of failures initially, this would kill the coord job. We should give the user some time to correct any errors and rerun if needed.
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016063#comment-14016063 ] Robert Kanter commented on OOZIE-1813: -- Actually, on the final patch, can you add the new config properties to oozie-default.xml?
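oozie-default.xml supplies default values that a site's own configuration can override. The Java sketch below mimics that layering with java.util.Properties defaults. The property names, the failure limit of 25, and the report-only kill default are hypothetical placeholders, not the names or values added by the patch; only the 2-day age comes from the discussion on this issue.

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical sketch of defaults-plus-overrides for a rogue-job checker. */
public class RogueCheckerConfig {
    static final String FAILURE_LIMIT = "rogue.checker.failure.limit";       // hypothetical name
    static final String MIN_AGE_DAYS = "rogue.checker.job.older.than.days";  // hypothetical name
    static final String KILL_JOBS = "rogue.checker.kill.jobs";               // hypothetical name

    final int failureLimit;
    final Duration minAge;
    final boolean killJobs;

    RogueCheckerConfig(Properties site) {
        // Built-in defaults, playing the role of oozie-default.xml.
        Properties defaults = new Properties();
        defaults.setProperty(FAILURE_LIMIT, "25");
        defaults.setProperty(MIN_AGE_DAYS, "2");
        defaults.setProperty(KILL_JOBS, "false"); // report only unless a site opts in
        // Site values override defaults; getProperty falls back for unset keys.
        Properties merged = new Properties(defaults);
        merged.putAll(site);
        failureLimit = Integer.parseInt(merged.getProperty(FAILURE_LIMIT));
        minAge = Duration.ofDays(Long.parseLong(merged.getProperty(MIN_AGE_DAYS)));
        killJobs = Boolean.parseBoolean(merged.getProperty(KILL_JOBS));
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        site.setProperty(KILL_JOBS, "true"); // this site opts in to killing; other values stay default
        RogueCheckerConfig conf = new RogueCheckerConfig(site);
        System.out.println(conf.failureLimit + " " + conf.minAge.toDays() + " " + conf.killJobs);
        // prints "25 2 true"
    }
}
```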
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016059#comment-14016059 ] Robert Kanter commented on OOZIE-1813: -- +1 on the latest patch on RB (pending Jenkins)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014236#comment-14014236 ] Hadoop QA commented on OOZIE-1813: --
Testing JIRA OOZIE-1813
Cleaning local git workspace
{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1454
.Tests failed: 4
.Tests errors: 0
.The patch failed the following testcases:
. testConcurrencyReachedAndChooseNextEligible(org.apache.oozie.service.TestCallableQueueService)
. testBundleEngineKill(org.apache.oozie.servlet.TestV1JobServletBundleEngine)
. testActionInputCheckLatestActionCreationTime(org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand)
. testTimeOutWithUnresolvedMissingDependencies(org.apache.oozie.command.coord.TestCoordPushDependencyCheckXCommand)
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/1278/
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007962#comment-14007962 ]

Hadoop QA commented on OOZIE-1813:
----------------------------------

Testing JIRA OOZIE-1813

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.{color:red}-1{color} HEAD does not compile
.{color:green}+1{color} patch compiles
.{color:red}-1{color} the patch seems to introduce 472 new javac warning(s)
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1452
.Tests failed: 3
.Tests errors: 4

.The patch failed the following testcases:

. testMessage_withTimedout(org.apache.oozie.command.coord.TestAbandonedCoordChecker)
. testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration)
. testConcurrencyReachedAndChooseNextEligible(org.apache.oozie.service.TestCallableQueueService)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/1257/

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch, OOZIE-1813-V3.patch,
> OOZIE-1813-V4.patch, OOZIE-1813-V5.patch, OOZIE-1813-V6.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004272#comment-14004272 ]

Hadoop QA commented on OOZIE-1813:
----------------------------------

Testing JIRA OOZIE-1813

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:red}-1 RAT{color}
.{color:red}-1{color} the patch seems to introduce 1 new RAT warning(s)
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1452
.Tests failed: 1
.Tests errors: 2

.The patch failed the following testcases:

. testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/1254/

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch, OOZIE-1813-V3.patch,
> OOZIE-1813-V4.patch, OOZIE-1813-V5.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004186#comment-14004186 ]

Hadoop QA commented on OOZIE-1813:
----------------------------------

Testing JIRA OOZIE-1813

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:red}-1 RAT{color}
.{color:red}-1{color} the patch seems to introduce 1 new RAT warning(s)
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:red}-1{color} patch does not compile
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:red}-1 DISTRO{color}
.{color:red}-1{color} distro tarball fails with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/1253/

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch, OOZIE-1813-V3.patch,
> OOZIE-1813-V4.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004012#comment-14004012 ]

Hadoop QA commented on OOZIE-1813:
----------------------------------

Testing JIRA OOZIE-1813

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:red}-1 RAT{color}
.{color:red}-1{color} the patch seems to introduce 1 new RAT warning(s)
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1451
.Tests failed: 3
.Tests errors: 2

.The patch failed the following testcases:

. testActionInputCheckLatestActionCreationTime(org.apache.oozie.command.coord.TestCoordActionInputCheckXCommandNonUTC)
. testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration)
. testBundleEngineResume(org.apache.oozie.servlet.TestV1JobServletBundleEngine)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/1250/

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch, OOZIE-1813-V3.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000379#comment-14000379 ]

Purshotam Shah commented on OOZIE-1813:
---------------------------------------

> . -1 the patch contains 2 line(s) longer than 132 characters

Those two long lines are named queries.

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch, OOZIE-1813-V3.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999553#comment-13999553 ]

Hadoop QA commented on OOZIE-1813:
----------------------------------

Testing JIRA OOZIE-1813

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 characters
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:red}-1 RAT{color}
.{color:red}-1{color} the patch seems to introduce 1 new RAT warning(s)
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1450
.Tests failed: 1
.Tests errors: 7

.The patch failed the following testcases:

. testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/oozie-trunk-precommit-build/1237/

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1813-V2.patch
>
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984579#comment-13984579 ]

Robert Kanter commented on OOZIE-1813:
--------------------------------------

That's a great idea!

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
> Key: OOZIE-1813
> URL: https://issues.apache.org/jira/browse/OOZIE-1813
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
>
> People leave their test coordinator and bundle jobs without ever killing them
> and they just eat up resources heavily. We should have a service which
> periodically check for abandoned coords and report/kill them.
> We can add multiple logic to this like ( number of consecutive
> failed/timedout action, total number of failed/timedout action).
> To start with if number of coord action with failed/timedout status > defined
> value, then coord is considered to be rogue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
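The detection rule proposed in the issue description can be illustrated with a small sketch: a coordinator is treated as rogue once its number of FAILED/TIMEDOUT actions exceeds a configured value, and the number of consecutive FAILED/TIMEDOUT actions is mentioned as a possible extra signal. This is a hypothetical illustration only, not the code committed for OOZIE-1813; `CoordAction`, `total_bad`, `trailing_consecutive_bad`, `is_rogue` and `failure_limit` are names invented here, and "consecutive" is assumed to mean the most recent run of bad actions ordered by action number.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Action statuses the description treats as bad outcomes.
BAD_STATUSES = {"FAILED", "TIMEDOUT"}


@dataclass
class CoordAction:
    """Hypothetical stand-in for a coordinator action record."""
    action_number: int
    status: str


def total_bad(actions: Iterable[CoordAction]) -> int:
    """Total number of FAILED/TIMEDOUT actions."""
    return sum(1 for act in actions if act.status in BAD_STATUSES)


def trailing_consecutive_bad(actions: List[CoordAction]) -> int:
    """Length of the most recent run of FAILED/TIMEDOUT actions (by action number)."""
    count = 0
    for act in sorted(actions, key=lambda act: act.action_number, reverse=True):
        if act.status not in BAD_STATUSES:
            break
        count += 1
    return count


def is_rogue(actions: List[CoordAction], failure_limit: int) -> bool:
    """Initial rule: rogue if the count of bad actions exceeds the configured limit."""
    return total_bad(actions) > failure_limit


if __name__ == "__main__":
    acts = [
        CoordAction(1, "SUCCEEDED"),
        CoordAction(2, "FAILED"),
        CoordAction(3, "TIMEDOUT"),
        CoordAction(4, "FAILED"),
    ]
    print(total_bad(acts))                    # 3
    print(trailing_consecutive_bad(acts))     # 3
    print(is_rogue(acts, failure_limit=2))    # True
```

A periodic service would run such a check over running coordinators and then either report or kill the ones flagged; how that scheduling and the report/kill choice were actually configured is not shown in this thread.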