1fanwang opened a new issue, #66795: URL: https://github.com/apache/airflow/issues/66795
### Apache Airflow version

main (3.x)

### Description

When a KubernetesExecutor-managed worker pod terminates with `exit code 1, reason: "Error"` and no `message` set on the container status (the common case for Python import errors at task-runner startup), `AirflowKubernetesScheduler._get_pod_failure_reason()` returns a string like:

```
Pod base reason: Error
```

This is the entire payload that makes it into the scheduler log and the task event_buffer info field. Operators then have to chase the pod logs out-of-band (kubectl, audit logs, log aggregation pipeline) to find the actual traceback.

### Use case / motivation

For "generic" failures (no `container.status.message`), optionally append the last N lines of the pod's logs to the failure reason string.

Two new opt-in config keys on `[kubernetes_executor]`:

- `failure_pod_log_lines` (int, default 0 = disabled, recommended 50-100)
- `failure_log_read_timeout` (int seconds, default 5)

When `failure_pod_log_lines > 0` and the failure is "generic", call `CoreV1Api.read_namespaced_pod_log(..., tail_lines=N, _request_timeout=T)` and append the result. Wrap in try/except so a read-log failure never propagates out of the failure handler.

### Related issues

I have not found a tracking issue for this; happy to be pointed at one if it exists.

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
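To make the proposal under "Use case / motivation" concrete, here is a minimal sketch of the best-effort log read. It is not the Airflow implementation: the helper name `append_log_tail`, its signature, and the "Last N log lines" wording are hypothetical, and it assumes the caller has already decided the failure is "generic". The only real API it relies on is `CoreV1Api.read_namespaced_pod_log` with its `tail_lines` and `_request_timeout` arguments; the client is passed in so the sketch has no import-time dependency on the `kubernetes` package.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


def append_log_tail(
    kube_client: Any,  # assumed to behave like kubernetes.client.CoreV1Api
    pod_name: str,
    namespace: str,
    reason: str,
    *,
    failure_pod_log_lines: int = 0,  # proposed config key; 0 means disabled
    failure_log_read_timeout: int = 5,  # proposed config key, in seconds
) -> str:
    """Return ``reason``, extended with the pod's last log lines when enabled.

    Hypothetical helper: the name and output format are illustrative only.
    """
    if failure_pod_log_lines <= 0:
        # Feature is opt-in; with the default of 0 nothing changes.
        return reason
    try:
        tail = kube_client.read_namespaced_pod_log(
            name=pod_name,
            namespace=namespace,
            tail_lines=failure_pod_log_lines,
            _request_timeout=failure_log_read_timeout,
        )
    except Exception:
        # Best effort: a failed log read must never escape the failure handler.
        log.warning(
            "Could not read logs for pod %s/%s", namespace, pod_name, exc_info=True
        )
        return reason
    if not tail:
        return reason
    return f"{reason}\nLast {failure_pod_log_lines} log lines:\n{tail.rstrip()}"
```

With `failure_pod_log_lines=0` the function returns the original reason without touching the API, which matches the proposed opt-in default. Catching a broad `Exception` is deliberate here, since a timeout, a 404 for an already-deleted pod, or an RBAC denial should all degrade to the existing message.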
