We had an outage last night that was rather complex and difficult to debug. Rather than just writing up the bug, I also included the debugging steps we took along the way. I hope other cluster maintainers find it interesting!
https://issues.apache.org/jira/browse/AIRFLOW-5238