[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500271#comment-16500271 ] Hadoop QA commented on OOZIE-3156: -- Testing JIRA OOZIE-3156 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:green}+1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:green}+1{color} the patch adds/modifies 1 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s) .{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s) .{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s) {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1{color} There are no new bugs found in total. . {color:green}+1{color} There are no new bugs found in [examples]. . {color:green}+1{color} There are no new bugs found in [webapp]. . {color:green}+1{color} There are no new bugs found in [core]. . {color:green}+1{color} There are no new bugs found in [tools]. . {color:green}+1{color} There are no new bugs found in [server]. . {color:green}+1{color} There are no new bugs found in [docs]. . {color:green}+1{color} There are no new bugs found in [sharelib/hive2]. . {color:green}+1{color} There are no new bugs found in [sharelib/pig]. . {color:green}+1{color} There are no new bugs found in [sharelib/streaming]. . 
{color:green}+1{color} There are no new bugs found in [sharelib/hive]. . {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog]. . {color:green}+1{color} There are no new bugs found in [sharelib/sqoop]. . {color:green}+1{color} There are no new bugs found in [sharelib/oozie]. . {color:green}+1{color} There are no new bugs found in [sharelib/distcp]. . {color:green}+1{color} There are no new bugs found in [sharelib/spark]. . {color:green}+1{color} There are no new bugs found in [client]. {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:green}+1 TESTS{color} .Tests run: 2146 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:green}*+1 Overall result, good!, no -1s*{color} The full output of the test-patch run is available at . https://builds.apache.org/job/PreCommit-OOZIE-Build/603/ > SSH action status turns OK wrongly when failed to connect to host > - > > Key: OOZIE-3156 > URL: https://issues.apache.org/jira/browse/OOZIE-3156 > Project: Oozie > Issue Type: Bug > Components: action >Affects Versions: 5.0.0 >Reporter: TIAN XING >Assignee: TIAN XING >Priority: Major > Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, > OOZIE-3156-v3.patch, OOZIE-3156-v4.patch, OOZIE-3156-v5.patch, > OOZIE-3156-v6.patch, ssh-check-bug.patch > > > When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh > connect to the host and check whether the pid of the process that ssh action > started is still there (by checking the returned value of command "{{ssh > ps -p }}" ) to determine whether ssh action completes or not. > However, we found cases where oozie fails to connect to host during action > status check (e.g., the host is under heavy load, or network is bad etc.). 
> In such cases, the return value of command "{{ssh ps -p }}" > will be 255 (the ssh command exits with the exit status of the remote command, or > with 255 if an error occurred). > According to the current logic of method {{getActionStatus()}} in > {{SshActionExecutor}}, the action status will be determined as OK, which may > not be correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
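The exit-status reasoning in the issue description can be illustrated with a small sketch. This is not the code of {{SshActionExecutor}}; the class name, enum, and method below are hypothetical, and the sketch assumes that {{ps -p}} exits with 0 when the pid exists and with a non-zero status otherwise. It only shows why 255 needs its own branch instead of falling through to the "process is gone" case.

```java
/** Hypothetical sketch; the names here are illustrative and not Oozie's actual API. */
final class SshCheckStatusSketch {

    /** Possible outcomes of one status probe. */
    enum ProbeResult { RUNNING, FINISHED, CONNECTION_ERROR }

    /** ssh itself exits with 255 when an error occurred (e.g. it could not connect). */
    static final int SSH_ERROR_EXIT_CODE = 255;

    /** Maps the exit status of the remote "ps -p" probe to a probe result. */
    static ProbeResult classify(int exitCode) {
        if (exitCode == 0) {
            return ProbeResult.RUNNING;          // assumed: ps found the pid
        }
        if (exitCode == SSH_ERROR_EXIT_CODE) {
            return ProbeResult.CONNECTION_ERROR; // the probe never ran; status is unknown
        }
        return ProbeResult.FINISHED;             // assumed: ps reported no such pid
    }

    public static void main(String[] args) {
        for (int code : new int[] {0, 1, 255}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

One limitation of this approach is that a remote command which itself exits with 255 cannot be told apart from an ssh failure, so treating 255 as "unknown, try again" is a conservative choice rather than an exact one.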
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500161#comment-16500161 ] Hadoop QA commented on OOZIE-3156: -- PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500082#comment-16500082 ] Hadoop QA commented on OOZIE-3156: -- Testing JIRA OOZIE-3156 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:red}-1{color} the patch contains 1 line(s) with trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:green}+1{color} the patch adds/modifies 1 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s) .{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s) .{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s) {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1{color} There are no new bugs found in total. . {color:green}+1{color} There are no new bugs found in [webapp]. . {color:green}+1{color} There are no new bugs found in [core]. . {color:green}+1{color} There are no new bugs found in [tools]. . {color:green}+1{color} There are no new bugs found in [sharelib/hive2]. . {color:green}+1{color} There are no new bugs found in [sharelib/distcp]. . {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog]. . {color:green}+1{color} There are no new bugs found in [sharelib/streaming]. . {color:green}+1{color} There are no new bugs found in [sharelib/sqoop]. . 
{color:green}+1{color} There are no new bugs found in [sharelib/oozie]. . {color:green}+1{color} There are no new bugs found in [sharelib/pig]. . {color:green}+1{color} There are no new bugs found in [sharelib/hive]. . {color:green}+1{color} There are no new bugs found in [sharelib/spark]. . {color:green}+1{color} There are no new bugs found in [client]. . {color:green}+1{color} There are no new bugs found in [examples]. . {color:green}+1{color} There are no new bugs found in [docs]. . {color:green}+1{color} There are no new bugs found in [server]. {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:red}-1 TESTS{color} .Tests run: 2146 .Tests failed: 0 .Tests errors: 1 .The patch failed the following testcases: .Tests failing with errors: testNewUsingACLs(org.apache.oozie.util.TestZKUtilsWithSecurity) .{color:orange}Tests failed at first run:{color} TestCoordActionsKillXCommand#testActionKillCommandActionNumbers .For the complete list of flaky tests, see TEST-SUMMARY-FULL files. {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . 
https://builds.apache.org/job/PreCommit-OOZIE-Build/602/
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499941#comment-16499941 ] Andras Piros commented on OOZIE-3156: - [~txsing] thanks for the contribution! +1 for patch v5 (pending Jenkins)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499939#comment-16499939 ] Hadoop QA commented on OOZIE-3156: -- PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499871#comment-16499871 ] Andras Piros commented on OOZIE-3156: - [~txsing] good idea! While you're at it, you could modify the signature of the method to be extracted so that it covers the rest of {{TestSshActionExecutor}}. Thanks! Hint: in patch v4, {{DG_SshActionExtension.twiki#31}} is a line longer than 132 chars, violating the {{RAW_PATCH_ANALYSIS}} pre-commit check.
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499711#comment-16499711 ] Hadoop QA commented on OOZIE-3156: -- Testing JIRA OOZIE-3156 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:red}-1{color} the patch contains 1 line(s) longer than 132 characters .{color:green}+1{color} the patch adds/modifies 1 testcase(s) {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s) .{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s) .{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s) {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1{color} There are no new bugs found in total. . {color:green}+1{color} There are no new bugs found in [examples]. . {color:green}+1{color} There are no new bugs found in [webapp]. . {color:green}+1{color} There are no new bugs found in [core]. . {color:green}+1{color} There are no new bugs found in [tools]. . {color:green}+1{color} There are no new bugs found in [server]. . {color:green}+1{color} There are no new bugs found in [docs]. . {color:green}+1{color} There are no new bugs found in [sharelib/hive2]. . {color:green}+1{color} There are no new bugs found in [sharelib/pig]. . {color:green}+1{color} There are no new bugs found in [sharelib/streaming]. . 
{color:green}+1{color} There are no new bugs found in [sharelib/hive]. . {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog]. . {color:green}+1{color} There are no new bugs found in [sharelib/sqoop]. . {color:green}+1{color} There are no new bugs found in [sharelib/oozie]. . {color:green}+1{color} There are no new bugs found in [sharelib/distcp]. . {color:green}+1{color} There are no new bugs found in [sharelib/spark]. . {color:green}+1{color} There are no new bugs found in [client]. {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:green}+1 TESTS{color} .Tests run: 2132 {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . https://builds.apache.org/job/PreCommit-OOZIE-Build/598/
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499671#comment-16499671 ] Hadoop QA commented on OOZIE-3156: -- PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499655#comment-16499655 ] TIAN XING commented on OOZIE-3156: -- [~andras.piros], thanks for the clarification! The reason I didn't extract the lines that create an ssh action bean into another method called {{createSshWorkflowAction()}} is that this kind of action-bean-creation code appears in nearly all test methods of {{TestSshActionExecutor}}. Shall I apply the same extraction to all of them as well?
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497726#comment-16497726 ] Andras Piros commented on OOZIE-3156: - Hi [~txsing], this is only about readability / code maintainability. For clarity I'd extract these lines:
{code:java}
String baseDir = getTestCaseDir();
Path appPath = new Path(getNameNodeUri(), baseDir);

XConfiguration protoConf = new XConfiguration();
protoConf.setStrings(WorkflowAppService.HADOOP_USER, getTestUser());

XConfiguration wfConf = new XConfiguration();
wfConf.set(OozieClient.APP_PATH, appPath.toString());

WorkflowJobBean workflow = new WorkflowJobBean();
workflow.setConf(wfConf.toXmlString());
workflow.setAppPath(wfConf.get(OozieClient.APP_PATH));
workflow.setProtoActionConf(protoConf.toXmlString());
workflow.setId(Services.get().get(UUIDService.class).generateId(ApplicationType.WORKFLOW));

final WorkflowActionBean action = new WorkflowActionBean();
action.setId("actionId");
action.setConf("" + "localhost" + "echo" + "" + "\"prop1=something\"" + "");
action.setName("ssh");
{code}
to a method called {{WorkflowActionBean createSshWorkflowAction()}}. Moreover, I'd extract these lines:
{code:java}
final SshActionExecutor ssh = new SshActionExecutor();
final Context context = new Context(workflow, action);
ssh.start(context, action);

String originTrackerUri = action.getTrackerUri();
action.setTrackerUri("dummy@dummyHost");
{code}
to a method called {{SshActionExecutor createAndStartFailingSshActionExecutor(WorkflowAction base)}}.
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496518#comment-16496518 ] TIAN XING commented on OOZIE-3156: -- Hey [~andras.piros], thanks for the review. In {{TestSshActionExecutor#testSshCheckWithHostConnectFailure()}}, I copied the code from {{TestSshActionExecutor#testJobStart}}, which gives us an example that ends with OK status. In order to create an "SSH connection failure" situation, I changed the action's {{TrackerUri}} from "{{@localhost}}" to "{{dummy@dummyHost}}" during the action status check. An exception is expected to be thrown, whereas before this patch the check method would execute normally and end with OK status. Do you have any better suggestions on how to design such a test case? Thanks!
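The test design discussed above (swap the tracker URI to an unreachable host, then expect the status check to throw) can be sketched without any Oozie classes. Everything below is hypothetical: the {{check}} method, the stubbed probe, and the {{user@localhost}} value are assumptions made for illustration. The point is only that the failure is injected deterministically rather than depending on how a real command behaves.

```java
import java.io.IOException;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of a status check with an injectable probe; not Oozie's actual API. */
final class InjectedFailureSketch {

    /** Runs the probe and rejects ssh's own error status (255) instead of reporting success. */
    static String check(String trackerUri, ToIntFunction<String> probe) throws IOException {
        int exitCode = probe.applyAsInt(trackerUri);
        if (exitCode == 255) {
            throw new IOException("cannot connect to " + trackerUri);
        }
        return exitCode == 0 ? "RUNNING" : "DONE";
    }

    public static void main(String[] args) throws IOException {
        // Stub that fails only for the dummy host, mirroring the tracker URI swap in the test.
        ToIntFunction<String> stub = uri -> uri.equals("dummy@dummyHost") ? 255 : 1;

        System.out.println(check("user@localhost", stub));

        try {
            check("dummy@dummyHost", stub);
            throw new AssertionError("expected a connection failure");
        } catch (IOException expected) {
            System.out.println("failed as expected: " + expected.getMessage());
        }
    }
}
```

Passing the probe in as a parameter is what makes the failure reproducible here; the real test instead relies on the host name {{dummyHost}} being unreachable.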
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496381#comment-16496381 ] Andras Piros commented on OOZIE-3156: - Thanks for the new patch [~txsing]! Here is the next round of comments:
* {{SshActionExecutor#handleRetry()}}: {{sleepBeforeRetryMs /= 2;}} should rather be {{sleepBeforeRetryMs *= 2;}}
* the return value of {{SshActionExecutor#handleRetry()}} is not reused in the caller code, so it doesn't really get an exponential backoff - {{initWaitTime}} will always be reused
* in {{TestSshActionExecutor#testSshCheckWithHostConnectFailure()}} it's unclear to me whether {{echo "prop1=something"}} would always fail the first time. We need to inject failure somehow to be on the safe side, or, if that is already present, extract methods of the test case with appropriate names so that it is clear what's going on
* extending {{DG_SshActionExtension.twiki}} goes in the right direction. Still, we need to introduce {{oozie-default.xml#oozie.action.ssh.check.retries.max}} with the default value {{3}}, and mention it in the docs as well
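The first two review points (double the wait instead of halving it, and carry the updated wait into the next attempt) can be illustrated with a minimal retry loop. This is a sketch under assumptions, not the patch itself: the method name, the 3000 ms initial wait, and the injectable sleeper are hypothetical; only the variable name {{sleepBeforeRetryMs}} and the retry limit of 3 come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch of a retry loop with exponential backoff; not the actual patch. */
final class BackoffSketch {

    /**
     * Calls attempt once, then up to maxRetries more times, waiting before each retry.
     * The wait doubles after every retry and is carried over to the next iteration.
     */
    static boolean retryWithBackoff(BooleanSupplier attempt, int maxRetries,
                                    long initialWaitMs, LongConsumer sleeper) {
        long sleepBeforeRetryMs = initialWaitMs;
        for (int retry = 0; retry <= maxRetries; retry++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (retry < maxRetries) {
                sleeper.accept(sleepBeforeRetryMs);
                sleepBeforeRetryMs *= 2; // "*=" rather than "/=", and the new value is reused
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Long> sleeps = new ArrayList<>();
        // An attempt that always fails, with the sleeper recording waits instead of sleeping.
        boolean ok = retryWithBackoff(() -> false, 3, 3000L, ms -> sleeps.add(ms));
        System.out.println(ok + " " + sleeps); // prints "false [3000, 6000, 12000]"
    }
}
```

Keeping the wait in a local variable inside the loop avoids the problem described in the second bullet, where a helper returns the new wait but the caller keeps passing the initial value.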
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496138#comment-16496138 ] TIAN XING commented on OOZIE-3156: -- [~andras.piros] Test case added.
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496134#comment-16496134 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

Testing JIRA OOZIE-3156

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2132
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:green}*+1 Overall result, good!, no -1s*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/593/

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, OOZIE-3156-v3.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496066#comment-16496066 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

PreCommit-OOZIE-Build started

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, OOZIE-3156-v3.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496039#comment-16496039 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

Testing JIRA OOZIE-3156

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [server].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2132
.{color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
TestCoordActionsKillXCommand#testActionKillCommandDate
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/592/

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495978#comment-16495978 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

PreCommit-OOZIE-Build started

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495409#comment-16495409 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

Testing JIRA OOZIE-3156

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2132
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/590/

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495282#comment-16495282 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

PreCommit-OOZIE-Build started

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: OOZIE-3156-v1.patch, ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494928#comment-16494928 ]

TIAN XING commented on OOZIE-3156:
----------------------------------

[~andras.piros] Will work on it, thanks for the comments.

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493556#comment-16493556 ]

Andras Piros commented on OOZIE-3156:
-------------------------------------

Thanks for the contribution [~txsing]! Can you please update {{TestSshActionExecutor}} with a new test case covering the retry functionality, and extend {{DG_SshActionExtension.twiki}} to document the fix?

Review comments:

* {{SSH_CONNECT_ERROR_CODE}} could be {{final}}
* {{retriesMax}} should be {{retryCount}}
* to give the connection error an actual chance not to reoccur, we should {{Thread#sleep()}} for some time between retries, or use a [*{{ScheduledThreadPoolExecutor}}*|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledThreadPoolExecutor.html] to wait without busy waiting
* the waiting should be based on an exponential backoff like in [*{{OperationRetryHandler#handleRetry()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/util/db/OperationRetryHandler.java#L123-L129]

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
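
As a rough illustration of the {{ScheduledThreadPoolExecutor}} option mentioned in the review comments above, the sketch below schedules each retry with a growing delay instead of sleeping in a loop. It is an assumption-laden example, not code from the patch: {{ScheduledSshCheckSketch}}, {{checkWithScheduledRetries}} and the {{IntSupplier}} standing in for the remote status check are hypothetical names. The calling thread still blocks on {{get()}}, but it does not spin while it waits.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Illustrative sketch only; hypothetical names, not the actual SshActionExecutor code. */
final class ScheduledSshCheckSketch {
    static final int SSH_CONNECT_ERROR_CODE = 255;

    /** Runs the check once, then up to maxRetries more times with doubling delays. */
    static int checkWithScheduledRetries(IntSupplier sshCheck, int maxRetries, long initWaitTimeMs) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        Callable<Integer> task = sshCheck::getAsInt;
        try {
            long delayMs = 0L;                 // the first attempt runs immediately
            long nextDelayMs = initWaitTimeMs; // wait before the first retry
            int exitCode = SSH_CONNECT_ERROR_CODE;
            for (int attempt = 0;
                    attempt <= maxRetries && exitCode == SSH_CONNECT_ERROR_CODE;
                    attempt++) {
                ScheduledFuture<Integer> pending =
                        executor.schedule(task, delayMs, TimeUnit.MILLISECONDS);
                exitCode = pending.get();      // blocks until the delayed check has run
                delayMs = nextDelayMs;
                nextDelayMs *= 2;              // exponential backoff
            }
            return exitCode;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return SSH_CONNECT_ERROR_CODE;     // treated as "could not check"
        } catch (ExecutionException e) {
            throw new IllegalStateException("status check failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A real executor would more likely share one scheduler across actions than create one per check; the per-call executor here only keeps the example self-contained.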
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493534#comment-16493534 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

Testing JIRA OOZIE-3156

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
.{color:green}+1{color} the patch does not seem to introduce new Javadoc error(s)
.{color:red}ERROR{color}: the current HEAD has 2 Javadoc error(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
. {color:green}+1{color} There are no new bugs found in [examples].
. {color:green}+1{color} There are no new bugs found in [webapp].
. {color:green}+1{color} There are no new bugs found in [core].
. {color:green}+1{color} There are no new bugs found in [tools].
. {color:green}+1{color} There are no new bugs found in [server].
. {color:green}+1{color} There are no new bugs found in [docs].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive2].
. {color:green}+1{color} There are no new bugs found in [sharelib/pig].
. {color:green}+1{color} There are no new bugs found in [sharelib/streaming].
. {color:green}+1{color} There are no new bugs found in [sharelib/hive].
. {color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
. {color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
. {color:green}+1{color} There are no new bugs found in [sharelib/oozie].
. {color:green}+1{color} There are no new bugs found in [sharelib/distcp].
. {color:green}+1{color} There are no new bugs found in [sharelib/spark].
. {color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 2132
.{color:orange}Tests failed at first run:{color}
TestCoordMaterializeTriggerService#testCoordMaterializeTriggerService3
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/586/

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493449#comment-16493449 ]

Hadoop QA commented on OOZIE-3156:
----------------------------------

PreCommit-OOZIE-Build started

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.0.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493440#comment-16493440 ]

TIAN XING commented on OOZIE-3156:
----------------------------------

[~andras.piros] hey Andras, any news on this patch?

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Reporter: TIAN XING
> Assignee: TIAN XING
> Priority: Major
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether the pid of the process that ssh action
> started is still there (by checking the returned value of command "{{ssh
> ps -p }}" ) to determine whether ssh action completes or not.
> However, we found cases where oozie fails to connect to host during action
> status check (e.g., the host is under heavy load, or network is bad etc.).
> In such cases, the return value of command "{{ssh ps -p }}"
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OOZIE-3156) SSH action status turns OK wrongly when failed to connect to host
[ https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322500#comment-16322500 ]

Andras Piros commented on OOZIE-3156:
-------------------------------------

[~txsing] added you to the list of contributors, and assigned this JIRA to you. Thanks for the patch!

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
> Key: OOZIE-3156
> URL: https://issues.apache.org/jira/browse/OOZIE-3156
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0
> Reporter: TIAN XING
> Assignee: TIAN XING
> Fix For: 4.3.0
> Attachments: ssh-check-bug.patch
>
> When {{check()}} method of {{SshActionExecutor}} gets invoked, oozie will ssh
> connect to the host and check whether action shell pid is still there (by
> checking the returned value of command {{ssh $hostIp ps -p $pid}} ) to
> determine whether the action is running or not.
> However, there are cases where oozie fails to connect to the host during
> action status check (e.g., the host is under heavy load, or network is bad
> etc.).
> In such cases, the return value of the command {{ssh $hostIp ps -p $pid}}
> will be 255 (ssh command exits with the exit status of the remote command or
> with 255 if an error occurred.).
> According to the current logic of method {{getActionStatus()}} in
> {{SshActionExecutor}}, the action status will be determined as OK which may
> not be correct.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
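
The distinction the issue description asks for can be sketched as a small mapping from the exit status of {{ssh $hostIp ps -p $pid}} to three outcomes. This is a hypothetical illustration, not the logic of {{getActionStatus()}}: the class and enum names are made up, and it assumes the usual {{ps -p}} behaviour of exiting with 0 when the pid exists and with a non-zero status when it does not. Only the value 255 for an ssh-level error comes from the description itself.

```java
/** Illustrative sketch only; hypothetical names, not the actual getActionStatus() logic. */
final class SshExitCodeSketch {
    /** What the exit status of "ssh $hostIp ps -p $pid" tells us. */
    enum PidCheck { RUNNING, NOT_RUNNING, CONNECT_FAILED }

    /** ssh exits with 255 when ssh itself fails, e.g. when it cannot reach the host. */
    static final int SSH_CONNECT_ERROR_CODE = 255;

    static PidCheck interpret(int exitCode) {
        if (exitCode == 0) {
            return PidCheck.RUNNING;          // assumed: ps found the pid
        }
        if (exitCode == SSH_CONNECT_ERROR_CODE) {
            return PidCheck.CONNECT_FAILED;   // nothing is known about the pid
        }
        return PidCheck.NOT_RUNNING;          // assumed: ps ran remotely, pid is gone
    }
}
```

Only {{NOT_RUNNING}} should lead on to deciding whether the action finished OK; {{CONNECT_FAILED}} says nothing about the remote process, which is why treating 255 like any other non-zero status can wrongly end in OK.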