Re: AIP - Add "persist_xcom_through_retry" Parameter to Airflow Operators

Ash Berlin-Taylor Sun, 16 Nov 2025 00:29:22 -0800

> it's just a relatively minor change to make XCom more powerful & handy.


My point is that if we allow this, people will absolutely use it to manage 
state. It is a minor change at the code level but a big design change. 
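
To make the behaviour under discussion concrete, here is a minimal sketch in plain Python. It is a toy model, not Airflow code: FakeXComStore, run_with_retries and make_chunked_task are hypothetical names invented for illustration, and the only thing taken from the proposal is the "persist_xcom_through_retry" flag and the fact that XCom is cleared when a task instance is retried.

```python
# Toy model only: none of these names are real Airflow APIs.
from typing import Callable, Dict, List


class FakeXComStore:
    """In-memory stand-in for the XCom entries of one task instance."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def push(self, key: str, value: object) -> None:
        self._data[key] = value

    def pull(self, key: str, default: object = None) -> object:
        return self._data.get(key, default)

    def clear(self) -> None:
        self._data.clear()


def run_with_retries(
    task: Callable[[FakeXComStore, int], None],
    store: FakeXComStore,
    retries: int,
    persist_xcom_through_retry: bool = False,
) -> int:
    """Run `task` up to retries + 1 times and return the attempts used."""
    for attempt in range(1, retries + 2):
        # Assumed semantics: the store is wiped before every attempt
        # unless this is a retry and the flag asks to keep it.
        if attempt == 1 or not persist_xcom_through_retry:
            store.clear()
        try:
            task(store, attempt)
            return attempt
        except RuntimeError:
            continue
    raise RuntimeError("task failed after all retries")


def make_chunked_task(processed: List[int], total: int = 5, fail_at: int = 3):
    """Build a task that checkpoints its progress and fails once."""

    def task(store: FakeXComStore, attempt: int) -> None:
        start = store.pull("next_chunk", 0)  # resume point, if any survived
        for chunk in range(start, total):
            if attempt == 1 and chunk == fail_at:
                raise RuntimeError("transient failure")
            processed.append(chunk)
            store.push("next_chunk", chunk + 1)  # checkpoint after each chunk

    return task


if __name__ == "__main__":
    for persist in (False, True):
        done: List[int] = []
        run_with_retries(make_chunked_task(done), FakeXComStore(),
                         retries=1, persist_xcom_through_retry=persist)
        print(persist, done)
```

In this sketch the flag is a single condition, but the value stored under "next_chunk" is state that the task depends on across attempts, which is the design concern.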

> On 16 Nov 2025, at 02:05, Xiaodong Deng <[email protected]> wrote:
> 
> Thanks, Ash and Jarek!
> 
> Actually I would like to purposely avoid proposing this to be an "official" 
> way for supporting state management in Airflow. In the AIP, we have said 
> "state", but only because that's how we may have to summarise it in a 
> formal/academic way.
> 
> The reality is: there are already a lot of people using XCom in various and 
> creative ways. If we really want to enforce XCom to ONLY be for "cross-task 
> communication", it should have been designed in a way that a task cannot pull 
> its own XCom.
> 
> Regarding other conversations about an "official" way to support state 
> management in Airflow, like AIP-93 Ash mentioned: I don't think what we are 
> proposing here interferes with it. Not to mention that what we propose here is 
> quite an easy change to implement before the final state management tool in 
> Airflow becomes available (which may take time).
> 
> IMHO, this "persist_xcom_through_retry" boolean parameter we proposed here is 
> more like a long-pending feature for XCom itself. I would like to highlight 
> again we don't intend to make it the formal way to manage state in Airflow. 
> Instead, it's just a relatively minor change to make XCom more powerful & 
> handy.
> 
> Please let me know your thoughts. Thanks a lot!
> 
> 
> Regards,
> XD
> 
> 
> 
>> On 2025/11/14 10:01:09 Jarek Potiuk wrote:
>> I quite agree. We have discussed state persistence for various use
>> cases at least a few times, and having persistent state storage for
>> Airflow is a good idea. However, using XCom for that seems like a
>> terrible hack.
>> 
>> XCom clears its state for a reason, and XCom is (as the name says)
>> defined for cross-communication between tasks, not to persist state.
>> 
>> If we want to persist state, let's name this entity properly and have
>> it do what it says it does, rather than adding complexity and
>> variations to something that was designed for something different.
>> 
>> J.
>> 
>>> On Fri, Nov 14, 2025 at 10:30 AM Ash Berlin-Taylor <[email protected]> wrote:
>>> 
>>> I wonder if instead of this, some of the existing state persistence 
>>> proposals might be better suited - as part of 
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables
>>>  for instance? Possibly with alterations/extensions to it?
>>> 
>>> (I haven’t really considered it in detail, my brain just played pattern 
>>> recognition.)
>>> 
>>> -ash
>>> 
>>>> On 14 Nov 2025, at 00:32, Xiaodong Deng <[email protected]> wrote:
>>>> 
>>>> Hi folks!
>>>> 
>>>> We would like to propose a new feature in Airflow: a boolean
>>>> parameter, "persist_xcom_through_retry", in all Airflow Operators.
>>>> Our team added this feature in our internal fork a few years back, and it
>>>> has been benefiting our users extensively.
>>>> 
>>>> *I have created an AIP at
>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333>*.
>>>> Below is a summary (in the complete AIP, we have a more detailed problem
>>>> statement and quite a few interesting use-case examples):
>>>> 
>>>> 
>>>> 
>>>> 
>>>> *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
>>>> each other”. However, XCom also has the capacity and potential to help
>>>> persist and manage task state within a task itself. Currently, Apache
>>>> Airflow automatically clears a task instance’s XCom data when it is
>>>> retried. This behavior, while ensuring clean state for retry attempts,
>>>> creates limitations:*
>>>> 
>>>>  - *Loss of Internal Progress: Tasks that have internal checkpointing or
>>>>  progress tracking lose all intermediate state on retry, forcing restart
>>>>  from the beginning.*
>>>>  - *Resource State Loss: Tasks cannot maintain state about allocated
>>>>  resources (compute instances, downstream job IDs, etc.) across retry
>>>>  attempts, leading to redundant expensive setup operations.*
>>>>  - *No Recovery/Resume Capability: There's no way for tasks to resume
>>>>  from internal checkpoints when transient failures occur during
>>>>  long-running atomic operations.*
>>>>  - *Poor User Experience: Users must implement external state management
>>>>  systems to work around this limitation, adding complexity to DAG 
>>>> authoring.*
>>>> 
>>>> 
>>>> *This proposal aims to extend the capacity of XCom by allowing a Task
>>>> Instance’s XCom to persist through its retries, enabling users to
>>>> build more resilient and efficient pipelines. This is particularly useful
>>>> for tasks which are atomic (so one such task cannot be split
>>>> into multiple tasks) and need to manage internal state or checkpoints.*
>>>> 
>>>> 
>>>> We look forward to your feedback and thoughts. Thanks!
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> XD
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
>> 
> 
> 


