We use Ansible to deploy code updates across a small fleet (~8 machines). 
At least a few times a week, a network hiccup causes the SSH 
connection to a random EC2 instance to fail, which in turn fails the entire 
playbook run. Sometimes this leaves us with an 
incomplete deploy, which is no fun. In almost all cases we can immediately 
re-launch the playbook and the errant instance is fine the second time 
around. These appear to be very short interruptions, and there's no rhyme 
or reason as to which instance is affected. It's usually only one instance 
out of our fleet at a time.
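
To make the workaround concrete, here is a minimal Python sketch of the 
immediate re-launch we currently do by hand. The function name 
run_with_retries, the playbook name deploy.yml, and the attempt count and 
delay are illustrative assumptions, not our actual setup, and re-running 
the whole playbook like this is only safe if its tasks are idempotent.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=5.0,
                     runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns the exit code of the last run. `runner` and `sleep` are
    injectable so the loop can be exercised without touching the network.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0:
            break  # the run succeeded, no need to retry
        if attempt < attempts:
            sleep(delay)  # give a short network interruption time to clear
    return code


# Hypothetical usage ("deploy.yml" is an assumed playbook name):
#   exit_code = run_with_retries(["ansible-playbook", "deploy.yml"])
```

This only automates the blunt "run it again" approach; it does nothing to 
retry the single failed SSH connection within a run.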

What kinds of strategies is everyone using to deal with these sorts of 
sporadic SSH failures that cause the whole playbook run to fail prematurely?
