Re: AIP - Add "persist_xcom_through_retry" Parameter to Airflow Operators

Jarek Potiuk Fri, 14 Nov 2025 02:02:48 -0800

Quite agree. We have discussed state persistence for various use cases
at least a few times, and having persistent state storage for Airflow
is a good idea. However, using XCom for that seems like a terrible
hack.

XCom clears its state for a reason, and XCom is (as the name says)
designed for cross-communication between tasks, not for persisting
state.

If we want to persist state, let's name this entity properly and have
it do what it says it does, rather than adding complexity and
variations to something that was designed for something different.
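
To make the distinction concrete, here is a toy sketch in plain Python.
It uses no Airflow APIs; StateStore, process_chunks and
run_with_one_retry are made-up names for illustration only. It just
simulates the two behaviours: a store that is wiped before the retry
(as XCom is today) versus a separately named store that is left alone.

```python
class StateStore:
    """Hypothetical key/value store; not an Airflow class."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def clear(self, key):
        self._data.pop(key, None)


CHECKPOINT_KEY = "next_chunk"  # assumed key name, for illustration


def process_chunks(chunks, store, work_log, fail_at=None):
    """Process chunks in order, checkpointing after each one."""
    start = store.get(CHECKPOINT_KEY, 0)
    for i in range(start, len(chunks)):
        if i == fail_at:
            raise RuntimeError(f"simulated transient failure at {i}")
        work_log.append(chunks[i])
        store.set(CHECKPOINT_KEY, i + 1)


def run_with_one_retry(chunks, store, fail_at, clear_on_retry):
    """Run once with a simulated failure, then retry exactly once."""
    work_log = []
    try:
        process_chunks(chunks, store, work_log, fail_at=fail_at)
    except RuntimeError:
        if clear_on_retry:
            # XCom-like behaviour: state is wiped before the retry.
            store.clear(CHECKPOINT_KEY)
        process_chunks(chunks, store, work_log)
    return work_log
```

With clear_on_retry=True the retry starts from chunk 0 again; with
clear_on_retry=False it resumes from the last checkpoint. Either way
the task code is the same; only the contract of the store differs,
which is why it deserves its own name.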

J.

On Fri, Nov 14, 2025 at 10:30 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> I wonder if instead of this, some of the existing state persistence proposals 
> might be better suited - as part of 
> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables
>  for instance? Possibly with alterations/extensions to it?
>
> (I haven’t really considered it in detail, my brain just played pattern 
> recognition.)
>
> -ash
>
> > On 14 Nov 2025, at 00:32, Xiaodong Deng <[email protected]> wrote:
> >
> > Hi folks!
> >
> > We would like to propose a new feature in Airflow: a boolean
> > parameter, "persist_xcom_through_retry", in all Airflow Operators.
> > Our team added this feature in our internal fork a few years back, and it
> > has been benefiting our users extensively.
> >
> > *I have created an AIP at
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333*.
> > Below is a summary (in the complete AIP, we have a more detailed problem
> > statement and quite a few interesting use-case examples):
> >
> > *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
> > each other”. However, XCom also has the capacity and potential to help
> > persist and manage task state within a task itself. Currently, Apache
> > Airflow automatically clears a task instance’s XCom data when it is
> > retried. This behavior, while ensuring clean state for retry attempts,
> > creates limitations:*
> >
> >   - *Loss of Internal Progress: Tasks that have internal checkpointing or
> >   progress tracking lose all intermediate state on retry, forcing restart
> >   from the beginning.*
> >   - *Resource State Loss: Tasks cannot maintain state about allocated
> >   resources (compute instances, downstream job IDs, etc.) across retry
> >   attempts, leading to redundant expensive setup operations.*
> >   - *No Recovery/Resume Capability: There's no way for tasks to resume
> >   from internal checkpoints when transient failures occur during
> >   long-running atomic operations.*
> >   - *Poor User Experience: Users must implement external state management
> >   systems to work around this limitation, adding complexity to DAG
> >   authoring.*
> >
> >
> > *This proposal aims to extend the capacity of XCom by allowing a Task
> > Instance’s XCom to persist through its retries, enabling users to
> > build more resilient and efficient pipelines. This is particularly useful
> > for tasks that are atomic (so one such task cannot be split into
> > multiple tasks) and need to manage internal state or checkpoints.*
> >
> >
> > We look forward to your feedback and thoughts. Thanks!
> >
> >
> > Regards,
> >
> > XD
>
