The GitHub Actions job "Tests (AMD)" on 
airflow.git/ssh-remote-job-load-hardening has failed.
Run started by GitHub user kaxil (triggered by kaxil).

Head commit for run:
58660a80d27a36b54718877c5dc2517fe0a17851 / Kaxil Naik <[email protected]>
Reduce SSH connection churn in SSHRemoteJobOperator under high fan-out

The operator and trigger opened a new SSH connection for every remote
command. A large expand() fan-out against one host drove the connection
rate past the remote sshd MaxStartups limit, which drops connections and
surfaces as "paramiko ... Error reading SSH protocol banner" (an immediate
EOF, not a banner timeout) at submit time, and left job directories behind
when the cleanup connection was dropped too.

Changes:
- Trigger holds one connection for the whole poll loop instead of
  reconnecting per command, with bounded jittered reconnect on drops and
  asyncssh.Error treated as reconnectable.
- Operator reuses one connection for OS detection and submission.
- Cleanup retries instead of orphaning the job directory on a dropped
  connection.
- Configurable conn_retry_attempts (operator/hook) for the submit burst,
  plus command_timeout and max_reconnect_attempts forwarded to the trigger.
- SSHHookAsync sets a keepalive on the long-lived trigger connection.
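
The "bounded jittered reconnect" idea from the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names backoff_delays and call_with_reconnect, the default delays, and the retry_on tuple are all hypothetical. Only max_reconnect_attempts is a parameter name taken from the commit message, and a real trigger would presumably also list asyncssh.Error among the retryable exceptions.

```python
import random
import time


def backoff_delays(max_attempts, base=0.5, cap=10.0, rng=random):
    """Yield one randomized sleep per reconnect attempt (full jitter).

    The ceiling doubles on each attempt and is capped at `cap`; the actual
    delay is drawn uniformly below it so that many tasks reconnecting at the
    same moment do not hit the remote sshd in lockstep.
    """
    for attempt in range(max_attempts):
        yield rng.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_reconnect(connect, max_reconnect_attempts=3,
                        retry_on=(OSError,), sleep=time.sleep):
    """Call `connect`, retrying at most `max_reconnect_attempts` times.

    `connect` is any zero-argument callable (hypothetical stand-in for
    opening an SSH connection). `retry_on` is an assumed tuple of
    exception types considered reconnectable.
    """
    try:
        return connect()
    except retry_on as exc:
        last_error = exc
    for delay in backoff_delays(max_reconnect_attempts):
        sleep(delay)
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
    # Bounded: give up and surface the last failure instead of looping forever.
    raise last_error
```

Bounding the attempts keeps a permanently unreachable host from stalling the poll loop, while the jitter spreads reconnects from a large fan-out across time rather than adding to the burst that tripped MaxStartups in the first place.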

Report URL: https://github.com/apache/airflow/actions/runs/27048560316

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
