In addition, I understand we would like to stick to certain design principles.
However, if they block certain reasonable use cases, either alternative
solutions need to be provided or the "principles" need to be adjusted.
That's what I'm hoping for here.
Thanks again!
Regards,
XD
On 2025/11/18 22:20:36 Xiaodong Deng wrote:
> Thanks for your valuable feedback, folks.
>
> Hi @TP,
>
> There are cases where breaking a task down into multiple tasks is not feasible
> or not the best option. One example is use case 1, which I shared in the
> Confluence doc appendix.
>
> There are also examples where splitting into multiple tasks may seem to make
> sense but may cause side effects. In use cases 2 and 4 in the Confluence
> doc appendix, I explained why we do it in a single task instead of splitting
> it into two tasks.
>
> Some tasks are simply atomic.
>
>
> Hi @Jarek,
>
> I'm glad we are talking about idempotency. That's exactly why we sometimes
> cannot break down some tasks. In the "Problem Examples" section in the
> Confluence doc, I covered that to some extent.
>
> I would love to discuss this more, or learn from you about any alternative
> solutions that could become available to Airflow users in a timely manner.
>
> Many thanks!
>
>
> Regards,
> XD
>
> On 2025/11/16 09:48:10 Jarek Potiuk wrote:
> > I agree with TP wholeheartedly. The basic reason why XCom is deleted when
> > restarting is to maintain idempotency principles. And if we allow XCom to
> > be used to break idempotency (that's basically what state per task is
> > about) - then XCom will stop serving its purpose.
> >
> > And of course - we are in the new "world" where we are not only supporting
> > idempotent tasks. Various optimisations and different kinds of workloads
> > require breaking the "old" idempotency rules we used to have when Airflow
> > was used mainly for ETL. And deletion of XCom state was also questioned
> > back then because people **wanted** to use XCom in other ways. But we held
> > firm and I think that was a good choice.
> >
> > Repurposing XCom to do "something" else might seem like a good idea - even
> > for Apple, because they could internally agree on some convention and use
> > it as a "solution". But when you look at Airflow as a product, repurposing
> > XCom to also do something else (i.e. storing state) seems a bit "lazy" and
> > "short-cut-y".
> >
> > What does it save if you do it this way? A few things:
> >
> > * not having to do a database migration to implement a new feature
> > * avoiding having a clearly defined API where state can be stored for
> > various purposes on different levels (Task Instance, Task, Task Group
> > maybe, Dag, Team eventually)
> > * avoiding thinking about and preparing for all the various use cases that
> > people would really like to use it for
> > * avoiding writing the use-case documentation explaining how you can use
> > state
> > * avoiding writing all the test cases making sure that all those use cases
> > are served well
> > * not thinking too much about performance and security implications of
> > those ("XCom has it already sorted out, I am sure it's going to be fine")
> >
> > Yes, it can be done way faster this way, and I understand some commercial
> > users could have chosen this way as a shortcut to handle a specific use
> > case they had in mind. This is absolutely understandable, and this is what
> > I would even expect a for-profit company to do to increase so-called
> > "time-to-market" and start reaping the benefits of it faster.
> >
> > But should we do it in Airflow the same way? We are not a for-profit
> > company; time-to-market of such a feature is secondary compared to
> > stability, maintainability and having a "product" vision.
> > I consider all the above points as absolutely crucial properties of a
> > "product" - which Airflow is. They might not be needed in a "solution", but
> > having a good "product" absolutely requires all those things.
> >
> > When we switched to Airflow 3, one of the ideas was to remove all the bad
> > "solution-y" decisions we made in the past that slowed us down in general
> > and - more importantly - turned us (as Daniel used to say) into
> > "back-compatibility engineers".
> >
> > Does it mean it will take longer and require more dedication, effort
> > and discussion to agree on the scope? Absolutely. Is this a bad thing? I
> > don't think so.
> >
> > J.
> >
> >
> > On Sun, Nov 16, 2025 at 9:43 AM Tzu-ping Chung via dev <
> > [email protected]> wrote:
> >
> > > What is the motivation behind storing internal state in a task, instead of
> > > splitting the logic on state boundaries into multiple tasks? That’s what
> > > the task abstraction is meant for, and you wouldn’t need a separate
> > > mechanism for that; regular XCom would just work.
> > >
> > > While storing state is a legitimate use case, I feel this particular idea
> > > would have a net negative impact by encouraging people to do too many
> > > things in one task. I’d even argue the examples given in the Confluence
> > > document already do so.
> > >
> > > TP
> > >
> > >
> > > > On 14 Nov 2025, at 08:32, Xiaodong Deng <[email protected]> wrote:
> > > >
> > > > Hi folks!
> > > >
> > > > We would like to propose a new feature in Airflow: a boolean
> > > > parameter, "persist_xcom_through_retry", in all Airflow Operators.
> > > > Our team added this feature in our internal fork a few years back, and
> > > > it has been benefiting our users extensively.
> > > >
> > > > *I have created an AIP at
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399278333*.
> > > > Below is a summary (in the complete AIP, we have a more detailed problem
> > > > statement and quite a few interesting use-case examples):
> > > >
> > > >
> > > >
> > > >
> > > > *Traditionally, XCom is defined as “a mechanism that lets Tasks talk to
> > > > each other”. However, XCom also has the capacity and potential to help
> > > > persist and manage task state within a task itself. Currently, Apache
> > > > Airflow automatically clears a task instance’s XCom data when it is
> > > > retried. This behavior, while ensuring clean state for retry attempts,
> > > > creates limitations:*
> > > >
> > > > - *Loss of Internal Progress: Tasks that have internal checkpointing or
> > > >   progress tracking lose all intermediate state on retry, forcing restart
> > > >   from the beginning.*
> > > > - *Resource State Loss: Tasks cannot maintain state about allocated
> > > >   resources (compute instances, downstream job IDs, etc.) across retry
> > > >   attempts, leading to redundant expensive setup operations.*
> > > > - *No Recovery/Resume Capability: There's no way for tasks to resume
> > > >   from internal checkpoints when transient failures occur during
> > > >   long-running atomic operations.*
> > > > - *Poor User Experience: users must implement external state management
> > > >   systems to work around this limitation, adding complexity to DAG
> > > >   authoring.*
> > > >
> > > >
> > > > *This proposal aims at extending the capacity of XCom by allowing
> > > > persisting a Task Instance’s XCom through its retries, enabling users to
> > > > build more resilient and efficient pipelines. This is particularly useful
> > > > for the type of tasks which are atomic (so one such task cannot be split
> > > > into multiple tasks) and need to manage internal state or checkpoints.*
> > > >
> > > >
> > > > We look forward to your feedback and thoughts. Thanks!
> > > >
> > > >
> > > > Regards,
> > > >
> > > > XD
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
> >
>