The GitHub Actions job "Tests (AMD)" on airflow.git/ssh-remote-job-load-hardening has failed. Run started by GitHub user kaxil (triggered by kaxil).
Head commit for run:
58660a80d27a36b54718877c5dc2517fe0a17851 / Kaxil Naik <[email protected]>

Reduce SSH connection churn in SSHRemoteJobOperator under high fan-out

The operator and trigger opened a new SSH connection for every remote command. A large expand() fan-out against one host drove the connection rate past the remote sshd MaxStartups limit. sshd then drops connections, which surfaces at submit time as "paramiko ... Error reading SSH protocol banner" (an immediate EOF, not a banner timeout). Job directories were also left behind when the cleanup connection was dropped.

Changes:
- Trigger holds one connection for the whole poll loop instead of reconnecting per command, with bounded jittered reconnect on drops and asyncssh.Error treated as reconnectable.
- Operator reuses one connection for OS detection and submission.
- Cleanup retries instead of orphaning the job directory on a dropped connection.
- Configurable conn_retry_attempts (operator/hook) for the submit burst, plus command_timeout and max_reconnect_attempts forwarded to the trigger.
- SSHHookAsync sets a keepalive on the long-lived trigger connection.

Report URL: https://github.com/apache/airflow/actions/runs/27048560316

With regards,
GitHub Actions via GitBox
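The commit message describes the trigger holding one connection for the whole poll loop and reconnecting, with a bound and jitter, only when that connection drops. The asyncio sketch below illustrates that pattern under stated assumptions. `PersistentPoller`, `connect`, `command_fn`, `base_delay` and `max_delay` are hypothetical names, not the provider's actual API. The sketch uses "full jitter" (a random delay between 0 and a capped exponential backoff), which may differ from what the real trigger does. The real trigger also treats asyncssh.Error as reconnectable. Here, the reconnectable exception types are a constructor argument that defaults to OSError, so the sketch does not depend on asyncssh.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Optional, Tuple, Type


class PersistentPoller:
    """Hypothetical helper: reuse one connection across commands, reconnect on drops."""

    def __init__(
        self,
        connect: Callable[[], Awaitable[Any]],  # assumed: coroutine that opens a connection
        max_reconnect_attempts: int = 3,
        base_delay: float = 1.0,
        max_delay: float = 30.0,
        reconnectable: Tuple[Type[BaseException], ...] = (OSError,),
        sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep,
    ) -> None:
        self._connect = connect
        self._max_reconnect_attempts = max_reconnect_attempts
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._reconnectable = reconnectable
        self._sleep = sleep
        self._conn: Optional[Any] = None  # the single long-lived connection

    def _jittered_delay(self, attempt: int) -> float:
        # Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)], so
        # many tasks dropped at the same moment do not reconnect in lockstep.
        return random.uniform(0, min(self._max_delay, self._base_delay * 2 ** attempt))

    async def run(self, command_fn: Callable[[Any], Awaitable[Any]]) -> Any:
        """Run one command, reusing the open connection when there is one."""
        reconnects = 0
        while True:
            try:
                if self._conn is None:
                    self._conn = await self._connect()
                return await command_fn(self._conn)
            except self._reconnectable:
                # Forget the broken connection; the next iteration opens a fresh one.
                self._conn = None
                if reconnects >= self._max_reconnect_attempts:
                    raise  # bounded: give up after max_reconnect_attempts reconnects
                await self._sleep(self._jittered_delay(reconnects))
                reconnects += 1
```

Because the connection is kept between `run` calls, a poll loop that issues many commands opens a new connection only after a drop rather than once per command. The `sleep` argument is injectable so that tests can skip the real backoff delay.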
