WangMeng created OOZIE-2495:
-------------------------------

             Summary: Change SSH action error type from ErrorType.NON_TRANSIENT to TRANSIENT when AUTH_FAILED occurs occasionally
                 Key: OOZIE-2495
                 URL: https://issues.apache.org/jira/browse/OOZIE-2495
             Project: Oozie
          Issue Type: Improvement
          Components: action
    Affects Versions: 4.2.0
            Reporter: WangMeng


For SSH actions, the action occasionally fails with the following exception:

    AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no
    -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20
    u...@xxx.xx.xx.xxx mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ]
    | ErrorStream: Warning: Permanently added (RSA) to the list of known hosts.

    When I run the same command by hand on the Oozie server host, it works.
    Apart from incorrect SSH settings, the exception can also be caused by
high load on the SSH client at connection time, network jitter, or other
intermittent problems.
    Once the connection fails, Oozie classifies the error as
ErrorType.NON_TRANSIENT and suspends the action without retrying it, even
though I have configured a retry count.
    When this happens, I think it would be better to classify the error as
ErrorType.TRANSIENT instead of NON_TRANSIENT. This would let the action retry
automatically before it is suspended, which would handle occasional
connection errors.
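The difference between the current and the proposed behavior can be illustrated with a small, self-contained Java sketch. It is not Oozie's code: the class SshRetrySketch, the simplified ErrorType and Outcome enums, and the run/failsFirst helpers are all hypothetical stand-ins. It assumes only that a NON_TRANSIENT error suspends the action immediately, while a TRANSIENT error is retried up to the configured retry count before the action is suspended. A failed SSH attempt is modelled as exit code 255, which ssh returns when the connection itself fails.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch (not Oozie's implementation): shows how classifying an
 * SSH AUTH_FAILED error as TRANSIENT instead of NON_TRANSIENT changes the
 * outcome when the user has configured a retry count.
 */
public class SshRetrySketch {

    // Simplified stand-in for the two error categories named in the issue.
    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    enum Outcome { SUCCEEDED, SUSPENDED }

    // Current behavior maps AUTH_FAILED to NON_TRANSIENT; the proposal maps it to TRANSIENT.
    static ErrorType classifyAuthFailed(boolean proposedBehavior) {
        return proposedBehavior ? ErrorType.TRANSIENT : ErrorType.NON_TRANSIENT;
    }

    /**
     * Runs one SSH attempt (exit code 0 = success, anything else = AUTH_FAILED)
     * and retries only while the error is TRANSIENT and retries remain.
     * Total attempts are at most 1 + maxRetries.
     */
    static Outcome run(IntSupplier sshAttempt, int maxRetries, boolean proposedBehavior) {
        for (int attempt = 0; ; attempt++) {
            if (sshAttempt.getAsInt() == 0) {
                return Outcome.SUCCEEDED;
            }
            ErrorType type = classifyAuthFailed(proposedBehavior);
            if (type == ErrorType.NON_TRANSIENT || attempt >= maxRetries) {
                return Outcome.SUSPENDED;
            }
        }
    }

    // Simulates an intermittent failure: the first `failures` calls return 255, later calls return 0.
    static IntSupplier failsFirst(int failures) {
        int[] calls = {0};
        return () -> calls[0]++ < failures ? 255 : 0;
    }

    public static void main(String[] args) {
        // One transient connection failure, retry count set to 3.
        System.out.println(run(failsFirst(1), 3, false)); // current:  SUSPENDED
        System.out.println(run(failsFirst(1), 3, true));  // proposed: SUCCEEDED
    }
}
```

With the current classification, a single intermittent failure suspends the action even though retries are configured; with the proposed classification, the same failure is absorbed by the first retry. A persistent failure (for example, genuinely wrong SSH settings) would still end in SUSPENDED once the retries are exhausted.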




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
