WangMeng created OOZIE-2495:
-------------------------------

             Summary: change action status from ErrorType.NON_TRANSIENT to TRANSIENT when SSH action occurs AUTH_FAILED occasionally
                 Key: OOZIE-2495
                 URL: https://issues.apache.org/jira/browse/OOZIE-2495
             Project: Oozie
          Issue Type: Improvement
          Components: action
    Affects Versions: 4.2.0
            Reporter: WangMeng

The SSH action occasionally fails with the following exception:

AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 u...@xxx.xx.xx.xxx mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ] | EErrorStream: Warning: Permanently added (RSA) to the list of known hosts.

When I ran the same command by hand on the Oozie server host, it worked. Besides incorrect SSH settings, the exception may also be caused by high load on the SSH client at connection time, network jitter, or other intermittent conditions.

Once the connection fails, Oozie sets the error type to ErrorType.NON_TRANSIENT, suspends the action, and does not retry it, even though I have configured a retry count.

In this case I think it would be better to change the error type from ErrorType.NON_TRANSIENT to TRANSIENT. The action would then be retried automatically before it is suspended, which would handle occasional connection errors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
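
The proposed behavior can be sketched roughly as follows. This is only an illustration of the idea, not Oozie's actual code: the class SshRetrySketch, the local ErrorType enum, the SshException type, and the runWithRetry helper are all hypothetical names made up for this example, and the retry count and interval are assumed to come from the user's retry settings.

```java
import java.util.concurrent.Callable;

public class SshRetrySketch {

    // Local stand-in for the two error categories named in this issue
    // (hypothetical; it does not reference the Oozie classes).
    public enum ErrorType { TRANSIENT, NON_TRANSIENT }

    // Hypothetical exception that carries an error code such as "AUTH_FAILED".
    public static class SshException extends Exception {
        public final String errorCode;

        public SshException(String errorCode, String message) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    // Proposed classification: AUTH_FAILED is treated as retryable,
    // every other code keeps the non-retryable category in this sketch.
    public static ErrorType classify(String errorCode) {
        return "AUTH_FAILED".equals(errorCode)
                ? ErrorType.TRANSIENT
                : ErrorType.NON_TRANSIENT;
    }

    // Runs the operation and retries it up to maxRetries times, but only
    // when the failure is classified as TRANSIENT. Any other failure, or a
    // transient one after the retries are used up, is rethrown.
    public static <T> T runWithRetry(Callable<T> op, int maxRetries, long retryIntervalMs)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return op.call();
            } catch (SshException e) {
                if (classify(e.errorCode) != ErrorType.TRANSIENT || retries >= maxRetries) {
                    throw e;
                }
                retries++;
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate an SSH call that fails twice with AUTH_FAILED, then succeeds.
        int[] calls = {0};
        String result = runWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new SshException("AUTH_FAILED", "simulated intermittent failure");
            }
            return "ok";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With this classification, an occasional AUTH_FAILED would use up the configured retries first, and the action would only end up suspended if every retry also failed.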