What is the motivation behind storing internal state in a task, instead of
splitting the logic on state boundaries into multiple tasks? That’s what the
task abstraction is meant for, and you wouldn’t need a separate mechanism
for that; regular XCom would just work.

While storing state is a legitimate use case, I feel this particular idea would
do more harm than good by encouraging people to do too many things in one
task. I’d even argue the examples given in the Confluence document already
do so.

TP


> On 14 Nov 2025, at 08:32, Xiaodong Deng <[email protected]> wrote:
> 
> Hi folks!
> 
> We would like to propose a new feature in Airflow, a boolean
> parameter "persist_xcom_through_retry" in all Airflow Operators.
> Our team added this feature in our internal fork a few years back, and it
> has been benefiting our users extensively.
> 
> *I have created an AIP
> at <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333>*.
> Below is a summary (in the complete AIP, we have a more detailed problem
> statement and quite a few interesting use-case examples):
> 
> 
> 
> 
> *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
> each other”. However, XCom also has the capacity and potential to help
> persist and manage task state within a task itself. Currently, Apache
> Airflow automatically clears a task instance’s XCom data when it is
> retried. This behavior, while ensuring clean state for retry attempts,
> creates limitations:*
> 
>   - *Loss of Internal Progress: Tasks that have internal checkpointing or
>   progress tracking lose all intermediate state on retry, forcing restart
>   from the beginning.*
>   - *Resource State Loss: Tasks cannot maintain state about allocated
>   resources (compute instances, downstream job IDs, etc.) across retry
>   attempts, leading to redundant expensive setup operations.*
>   - *No Recovery/Resume Capability: There's no way for tasks to resume
>   from internal checkpoints when transient failures occur during
>   long-running atomic operations.*
>   - *Poor User Experience: users must implement external state management
>   systems to work around this limitation, adding complexity to DAG authoring.*
> 
> 
> *This proposal aims to extend the capacity of XCom by allowing a Task
> Instance’s XCom to persist through its retries, enabling users to
> build more resilient and efficient pipelines. This is particularly useful
> for tasks that are atomic (so one such task cannot be split
> into multiple tasks) and need to manage internal state or checkpoints.*
> 
> 
> We look forward to your feedback and thoughts. Thanks!
> 
> 
> Regards,
> 
> XD

