Hello Dev List,

The motivation for this proposal is to allow operators to start a long-running
task on an external system (e.g. a Spark job on Dataproc) and reschedule pokes
for its completion, instead of blocking a worker (sketched out in #6210
<https://github.com/apache/airflow/pull/6210>), so that slots are freed up
between pokes. Doing this requires a way to store task state between
reschedules.
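To make the need concrete, here is a minimal sketch in plain Python (no Airflow
imports) of what a reschedule-mode poke could do if it had somewhere to keep
state. `FakeStateStore`, `poke`, `submit_job` and `job_is_done` are
hypothetical names used only for illustration; the dict-backed store stands in
for whatever persisted mechanism (XCom or a new table) ends up holding the
state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FakeStateStore:
    """Hypothetical stand-in for persisted per-task state (e.g. XCom rows).

    In a real deployment this would outlive the worker that released its slot
    between pokes; here it is just an in-memory dict.
    """
    rows: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        return self.rows.get(key)

    def set(self, key: str, value: str) -> None:
        self.rows[key] = value


def poke(store: FakeStateStore,
         submit_job: Callable[[], str],
         job_is_done: Callable[[str], bool]) -> bool:
    """One reschedule-mode poke: submit on the first call, poll afterwards.

    Returns True once the external job has finished, or False to ask for a
    reschedule (freeing the worker slot until the next poke).
    """
    job_id = store.get("job_id")
    if job_id is None:
        # First poke: start the long-running job and remember its id so that
        # later pokes, possibly on a different worker, do not resubmit it.
        job_id = submit_job()
        store.set("job_id", job_id)
    return job_is_done(job_id)


if __name__ == "__main__":
    submissions = []

    def submit() -> str:
        submissions.append("job-1")
        return "job-1"

    statuses = iter([False, False, True])  # pretend the job finishes on poke 3
    store = FakeStateStore()
    print([poke(store, submit, lambda _id: next(statuses)) for _ in range(3)])
    print(submissions)  # submitted once, not once per poke
```

Without the stored `job_id`, every poke would resubmit the job, which is why
some state has to survive the reschedule.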
It's worth noting that a task would maintain state only across reschedules
and would clear its state on retries. In this way, the task is idempotent
before reaching a terminal state [SUCCESS, FAILED, UP_FOR_RETRY]. This raises
the question of how far our commitment to operator idempotency extends. If it
is acceptable for reschedules to maintain some state, then we can free up
workers between pokes.

Because this is very similar to the purpose of XCom, it has been suggested
that we support this behavior in XCom rather than add a new TaskState model
to the db. (Discussion here on which is more appropriate is more than
welcome, though.)

I'd like to put forward a proposal to resurrect the reverted #6370
<https://github.com/apache/airflow/pull/6370>, which changes the lifetime of
XComs under certain conditions. The diagram below illustrates the change
originally proposed in #6370. There was concern that it changed existing
behavior (potentially breaking) and that it made operators stateful. Based on
the review comments and an informal discussion (meeting notes
<https://docs.google.com/document/d/1uuNCPAcwnn0smcDUJPDFMMjrK-z6Z0osesPG7jVZ3oU/edit#>
and #sig-async-operators), I'd like to modify the approach of #6370 to skip
clearing an XCom only if its key is prefixed with
`airflow.models.xcom.DO_NOT_CLEAR_PREFIX = "_STATEFUL_"` or similar.
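A rough sketch of the proposed clearing rule, assuming it is applied where
XComs are cleared before a run and combined with the retry behavior described
earlier (state kept across reschedules, cleared on retries). Only
`DO_NOT_CLEAR_PREFIX` comes from this proposal, and its value may still
change; `clear_xcoms` and the dict representation of XComs are hypothetical
simplifications of the real XCom rows.

```python
from typing import Any, Dict

# Proposed in this thread; the final name/value is still open ("or similar").
DO_NOT_CLEAR_PREFIX = "_STATEFUL_"


def clear_xcoms(xcoms: Dict[str, Any], is_reschedule: bool) -> Dict[str, Any]:
    """Return the XComs that survive the pre-run clear (hypothetical helper).

    - Resuming after a reschedule: keep only keys with DO_NOT_CLEAR_PREFIX;
      all other keys are cleared as they are today.
    - Any other run (first try or retry): clear everything.
    The input dict is not modified.
    """
    if not is_reschedule:
        return {}
    return {k: v for k, v in xcoms.items() if k.startswith(DO_NOT_CLEAR_PREFIX)}


if __name__ == "__main__":
    xcoms = {"_STATEFUL_job_id": "job-1", "return_value": 42}
    print(clear_xcoms(xcoms, is_reschedule=True))   # {'_STATEFUL_job_id': 'job-1'}
    print(clear_xcoms(xcoms, is_reschedule=False))  # {}
```

Because unprefixed keys are still cleared, operators that never use the prefix
keep today's behavior, which addresses the concern about changing existing
behavior in the original #6370.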

[image: image.png]
-- 

*Jacob Ferriero*

Strategic Cloud Engineer: Data Engineering

617-714-2509