> it's just a relatively minor change to make XCom more powerful & handy.
My point is that if we allow this, it will absolutely be a way people use to manage state. It is a minor change at the code level but a big design change.

> On 16 Nov 2025, at 02:05, Xiaodong Deng <[email protected]> wrote:
>
> Thanks, Ash and Jarek!
>
> Actually I would like to purposely avoid proposing this to be an "official"
> way for supporting state management in Airflow. In the AIP, we have said
> "state", but only because that's how we may have to summarise it in a
> formal/academic way.
>
> The reality is: there are already a lot of people using XCom in various and
> creative ways. If we really want to enforce XCom to ONLY be for "cross-task
> communication", it should have been designed in a way that a task cannot pull
> its own XCom.
>
> Regarding other conversations about an "official" way to support state
> management in Airflow, like AIP-93 Ash mentioned: I don't think what we are
> proposing here interferes with it. Not to mention that what we propose here is
> quite an easy change to implement before the final state management tool in
> Airflow becomes available (which may take time).
>
> IMHO, this "persist_xcom_through_retry" boolean parameter we proposed here is
> more like a long-pending feature for XCom itself. I would like to highlight
> again we don't intend to make it the formal way to manage state in Airflow.
> Instead, it's just a relatively minor change to make XCom more powerful &
> handy.
>
> Please let me know your thoughts. Thanks a lot!
>
>
> Regards,
> XD
>
>> On 2025/11/14 10:01:09 Jarek Potiuk wrote:
>> Quite agree. We have discussed at least a few times state persistence
>> for various use cases - and having a persistent state storage for
>> Airflow is a good idea, however using XCom for that seems like a
>> terrible hack.
>>
>> XCom clears its state for a reason, and XCom is (as the name says)
>> defined for cross-communication between tasks - not to persist state.
>>
>> If we want to persist state, let's name this entity properly and have
>> it do what it says it does, rather than adding complexity and
>> variations to something that was designed for something different.
>>
>> J.
>>
>>> On Fri, Nov 14, 2025 at 10:30 AM Ash Berlin-Taylor <[email protected]> wrote:
>>>
>>> I wonder if instead of this, some of the existing state persistence
>>> proposals might be better suited - as part of
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+AIP-93+Asset+Watermarks+and+State+Variables
>>> for instance? Possibly with alterations/extensions to it?
>>>
>>> (I haven’t really considered it in detail, my brain just played pattern
>>> recognition.)
>>>
>>> -ash
>>>
>>>> On 14 Nov 2025, at 00:32, Xiaodong Deng <[email protected]> wrote:
>>>>
>>>> Hi folks!
>>>>
>>>> We would like to propose a new feature in Airflow: a boolean
>>>> parameter "persist_xcom_through_retry" in all Airflow Operators.
>>>> Our team added this feature in our internal fork a few years back, and it
>>>> has been benefiting our users extensively.
>>>>
>>>> *I have created an AIP at
>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333>*.
>>>> Below is a summary (in the complete AIP, we have a more detailed problem
>>>> statement and quite a few interesting use-case examples):
>>>>
>>>> *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
>>>> each other”. However, XCom also has the capacity and potential to help
>>>> persist and manage task state within a task itself. Currently, Apache
>>>> Airflow automatically clears a task instance’s XCom data when it is
>>>> retried.
>>>> This behavior, while ensuring clean state for retry attempts,
>>>> creates limitations:*
>>>>
>>>> - *Loss of Internal Progress: Tasks that have internal checkpointing or
>>>> progress tracking lose all intermediate state on retry, forcing restart
>>>> from the beginning.*
>>>> - *Resource State Loss: Tasks cannot maintain state about allocated
>>>> resources (compute instances, downstream job IDs, etc.) across retry
>>>> attempts, leading to redundant expensive setup operations.*
>>>> - *No Recovery/Resume Capability: There's no way for tasks to resume
>>>> from internal checkpoints when transient failures occur during
>>>> long-running atomic operations.*
>>>> - *Poor User Experience: Users must implement external state management
>>>> systems to work around this limitation, adding complexity to DAG
>>>> authoring.*
>>>>
>>>> *This proposal aims at extending the capacity of XCom by allowing
>>>> persisting a Task Instance’s XCom through its retries, enabling users to
>>>> build more resilient and efficient pipelines. This is particularly useful
>>>> for the type of tasks which are atomic (so one such task cannot be split
>>>> into multiple tasks) and need to manage internal state or checkpoints.*
>>>>
>>>> We look forward to your feedback and thoughts. Thanks!
>>>>
>>>> Regards,
>>>>
>>>> XD
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
