pankajkoti opened a new issue, #66416:
URL: https://github.com/apache/airflow/issues/66416

   ### Under which category would you file this issue?
   
   Task SDK
   
   ### Apache Airflow version
   
   3.2.1
   
   ### What happened and how to reproduce it?
   
   When a task whose operator sets `overwrite_rtif_after_execution = True` raises an exception during `execute()`, the task supervisor/finalize path attempts to update the rendered template fields *after* the failure has already been reported. The SDK then sends a request to the API server that is no longer valid for the TI's state and gets back:
   `AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 404, 'message': 'Not Found', 'detail': {'detail': 'Not Found'}}`
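
The ordering problem can be illustrated with a small, self-contained simulation. This is a hypothetical model, not Task SDK code: `FakeApiServer`, `ApiServerError`, `report_failure`, `set_rendered_fields` and `run_task_naively` are invented stand-ins for the supervisor/API-server exchange. The rule that the server answers 404 once the TI has left `running` is an assumption inferred from the error above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ApiServerError(Exception):
    """Hypothetical stand-in for AirflowRuntimeError(API_SERVER_ERROR, ...)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(f"API_SERVER_ERROR: {status_code} {detail}")
        self.status_code = status_code


@dataclass
class FakeApiServer:
    """Hypothetical server that only accepts RTIF updates for a running TI."""

    state: str = "running"
    rendered_fields: dict[str, str] = field(default_factory=dict)

    def report_failure(self) -> None:
        self.state = "failed"

    def set_rendered_fields(self, fields: dict[str, str]) -> None:
        # Assumption: once the TI has left "running", the endpoint answers 404.
        if self.state != "running":
            raise ApiServerError(404, "Not Found")
        self.rendered_fields.update(fields)


def run_task_naively(server: FakeApiServer) -> list[str]:
    """Mimic the sequence in this issue: fail, report, then push RTIF anyway."""
    errors: list[str] = []
    try:
        raise RuntimeError("Intentional failure to reproduce RTIF finalize bug")
    except RuntimeError as exc:
        errors.append(f"{type(exc).__name__}: {exc}")
        server.report_failure()
    # The finalize-time RTIF push happens after the failure was already reported.
    try:
        server.set_rendered_fields({"message": "hello 2026-01-01"})
    except ApiServerError as exc:
        errors.append(f"{type(exc).__name__}: {exc}")
    return errors
```

Calling `run_task_naively(FakeApiServer())` returns two entries, the intended `RuntimeError` followed by the 404, which mirrors the two stacked errors users see in the task log.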
   
   This surfaces as a top-level error in the task log right after the original `RuntimeError`, so the user sees two stacked tracebacks instead of only the original failure. It also appears to affect remote logging: on retries, users do not see the remote logs for earlier failed attempts, possibly because the upload to remote logging is aborted or never happens.
   
   This was originally reported by Cosmos users in astronomer/astronomer-cosmos#2021 because the Cosmos local execution operator opts in to `overwrite_rtif_after_execution = True` on Airflow 3.x to refresh the rendered `compiled_sql` after the dbt invocation. However, the failure is not Cosmos-specific: any operator that sets this flag and then raises will hit the same path.
   
   #62070 wrapped the `SetRenderedFields` call in `finalize()` with try/except so the original task failure is not masked. #63705 then simplified the error logging to avoid a `RecursionError` in the `structlog` JSON fallback when the error context is logged. Even with both merged, on `3.2.1` we still see `Failed to set rendered fields during finalization` followed by `AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found`. #63719 ("Only update RTIF for terminal task states") is another attempted fix, but it is still in draft.
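
The try/except approach described above can be sketched as follows. This is a hypothetical model, not the code merged in #62070: `finalize`, `push_rendered_fields`, `run_task` and the logger name are assumptions, and the 404 rule is inferred from the error in this issue. It shows why the original failure now propagates while `Failed to set rendered fields during finalization` still appears: the request is still sent and still rejected, and the error is only logged instead of raised.

```python
from __future__ import annotations

import logging

# Hypothetical logger name, chosen for illustration only.
log = logging.getLogger("hypothetical.task_runner")


class ApiServerError(Exception):
    """Hypothetical stand-in for AirflowRuntimeError(API_SERVER_ERROR, ...)."""


def push_rendered_fields(ti_state: str) -> None:
    # Assumption: the server rejects RTIF updates once the TI is no longer running.
    if ti_state != "running":
        raise ApiServerError("API_SERVER_ERROR: 404 Not Found")


def finalize(ti_state: str, overwrite_rtif_after_execution: bool) -> None:
    """Swallow RTIF errors so they cannot mask the original task failure."""
    if not overwrite_rtif_after_execution:
        return
    try:
        push_rendered_fields(ti_state)
    except ApiServerError:
        # The request was still sent and still failed; it is only logged here.
        log.exception("Failed to set rendered fields during finalization")


def run_task() -> None:
    try:
        raise RuntimeError("Intentional failure to reproduce RTIF finalize bug")
    finally:
        # Finalize runs after the failure has already been reported ("failed").
        finalize("failed", overwrite_rtif_after_execution=True)
```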
   
   ### Minimal reproduction (no Cosmos required)
   
```python
# dags/repro_rtif_finalize.py
from __future__ import annotations

import pendulum
from airflow.sdk import DAG
from airflow.sdk.bases.operator import BaseOperator


class FailingOverwriteRTIFOperator(BaseOperator):
    """Minimal operator that triggers the finalize-time RTIF update path."""

    template_fields = ("message",)
    overwrite_rtif_after_execution = True

    def __init__(self, *, message: str = "hello {{ ds }}", **kwargs):
        super().__init__(**kwargs)
        self.message = message

    def execute(self, context):
        # Simulate any runtime failure during execute (DB error, network, etc.)
        raise RuntimeError("Intentional failure to reproduce RTIF finalize bug")


with DAG(
    dag_id="repro_rtif_finalize",
    start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    FailingOverwriteRTIFOperator(task_id="boom")
```
   
   ### How to reproduce
   
   1. Drop the DAG above into a fresh Airflow 3.x environment (no special executor or logging configuration required).
   2. Trigger `repro_rtif_finalize` once.
   3. Look at the task log for the first attempt. You will see the intended `RuntimeError` from `execute()`, followed by:
       - on 3.1.0–3.1.7: top-level error `AirflowRuntimeError: API_SERVER_ERROR: {'status_code': 404, 'message': 'Not Found', 'detail': {'detail': 'Not Found'}}`
       - on 3.1.8+ (with #62070 + #63705 applied): `Failed to set rendered fields during finalization` ... `AirflowRuntimeError: API_SERVER_ERROR: 404 Not Found`
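
When checking a log from the reproduction, a small helper can flag which of the symptoms above is present. This is a hypothetical convenience script, not part of Airflow: `classify_task_log` and `shows_rtif_finalize_bug` are invented names, and they only match the literal strings quoted in the steps, so the patterns may need adjusting for other log formats.

```python
from __future__ import annotations

import re

# Literal markers taken from the symptoms listed in the reproduction steps.
ORIGINAL_FAILURE = "Intentional failure to reproduce RTIF finalize bug"
FINALIZE_WARNING = "Failed to set rendered fields during finalization"
# Matches both the "{'status_code': 404" and the "404 Not Found" spellings.
API_404 = re.compile(r"API_SERVER_ERROR: (?:\{'status_code': )?404")


def classify_task_log(text: str) -> dict[str, bool]:
    """Report which of the expected and unexpected log entries appear."""
    return {
        "original_failure": ORIGINAL_FAILURE in text,
        "finalize_warning": FINALIZE_WARNING in text,
        "api_server_404": bool(API_404.search(text)),
    }


def shows_rtif_finalize_bug(text: str) -> bool:
    """True when the task failed and the finalize-time 404 also surfaced."""
    result = classify_task_log(text)
    return result["original_failure"] and result["api_server_404"]
```

Pass it the text of the first attempt's log; where that file lives depends on your logging setup.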
   
   
   ### What you think should happen instead?
   
   A failing task whose operator declares `overwrite_rtif_after_execution = True` should not produce a finalize-time `AirflowRuntimeError`. Conceptually, rendered template fields should not be re-pushed to the API server once the TI has moved into a failure state for which that endpoint is not valid. This matches the direction of #63719, though there may be a better solution.
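
One way to express the expected behaviour is to gate the finalize-time update on the TI state, so nothing is sent once the failure has been reported. This is a hypothetical sketch, not the change proposed in #63719: the `TIState` enum, the `FAILURE_STATES` set and the `should_push_rendered_fields` helper are assumptions made for illustration.

```python
from __future__ import annotations

from enum import Enum


class TIState(str, Enum):
    """Hypothetical subset of task-instance states used for illustration."""

    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


# Assumption: states in which the RTIF endpoint is no longer valid.
FAILURE_STATES = frozenset({TIState.FAILED, TIState.UP_FOR_RETRY})


def should_push_rendered_fields(
    state: TIState, overwrite_rtif_after_execution: bool
) -> bool:
    """Decide whether finalize() should re-send rendered template fields."""
    if not overwrite_rtif_after_execution:
        return False
    # Skip the request entirely instead of sending it and swallowing a 404.
    return state not in FAILURE_STATES
```

Checking the state before the call avoids both the 404 and the warning, whereas wrapping the call in try/except (as #62070 does) still sends the request and logs the failure.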
   
   ### Operating System
   
   _No response_
   
   ### Deployment
   
   None
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   
   - Original Cosmos report with the full traceback: astronomer/astronomer-cosmos#2021
   - Related PRs: #62070 (merged, 3.1.8), #63705 (merged), #63719 (draft).
   - Cosmos call site that opts in to the flag, for context: `cosmos/operators/local.py` `_override_rtif` (`self.overwrite_rtif_after_execution = True` on Airflow 3.x) -> https://github.com/astronomer/astronomer-cosmos/blob/d33115b69da5573b33123c310a5a7b6fbc02a364/cosmos/operators/local.py#L420.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

