Thanks, Jacob, for putting the document together. I think we're on the right track. I've added some comments and clarifications to the document to check that we're looking in the same direction. I would love to get more people's opinions on this.
Cheers, Fokko

On Wed, Jan 8, 2020 at 03:31, Jacob Ferriero <jferri...@google.com.invalid> wrote:

> The image is not working on the dev list; here is a link to the GitHub
> review comment containing it:
> https://github.com/apache/airflow/pull/6370#issuecomment-546582724
>
> On Tue, Jan 7, 2020 at 5:40 PM Jacob Ferriero <jferri...@google.com>
> wrote:
>
>> Hello Dev List,
>>
>> The inspiration for this is to allow operators to start a long-running
>> task on an external system (e.g. a Spark job on Dataproc) and
>> reschedule pokes for completion instead of blocking a worker (sketched
>> out in #6210 <https://github.com/apache/airflow/pull/6210>), freeing up
>> slots between pokes. Doing this requires a method for storing task
>> state between reschedules.
>> It's worth noting that a task would maintain state only across
>> reschedules and would clear state on retries. In this way the task is
>> idempotent before reaching a terminal state [SUCCESS, FAILED,
>> UP_FOR_RETRY]. This raises the question of how far operators are
>> committed to idempotency. If it is deemed acceptable for reschedules
>> to maintain some state, then we can free up workers between pokes.
>>
>> Because this is very similar to the purpose of XCom, it has been
>> suggested that we support this behavior in XCom rather than add a new
>> TaskState model to the db. (Discussion on which is more appropriate is
>> more than welcome.)
>>
>> I'd like to put forward a proposal to resurrect the reverted #6370
>> <https://github.com/apache/airflow/pull/6370> in order to modify the
>> lifetime of XComs under certain conditions. The diagram below
>> illustrates the change originally proposed in #6370. There was concern
>> about changing existing behavior (potentially breaking) and about the
>> fact that this makes operators stateful. Per the review comments and an
>> informal discussion (meeting notes
>> <https://docs.google.com/document/d/1uuNCPAcwnn0smcDUJPDFMMjrK-z6Z0osesPG7jVZ3oU/edit#>
>> and #sig-async-operators), I'd like to modify the approach of #6370 to
>> skip clearing an XCom only if its key is prefixed with
>> `airflow.models.xcom.DO_NOT_CLEAR_PREFIX = "_STATEFUL_"` or similar.
>>
>> [image: image.png]
>> --
>>
>> *Jacob Ferriero*
>>
>> Strategic Cloud Engineer: Data Engineering
>>
>> jferri...@google.com
>>
>> 617-714-2509
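
For readers following the thread, the prefix rule in the quoted proposal can be sketched roughly as below. This is only an illustration under assumptions, not code from #6370: `clear_xcoms` and the `is_reschedule` flag are hypothetical names, a plain dict stands in for the XCom rows of one task instance, and the prefix value is the one floated in the proposal ("or similar"). It also assumes that state survives reschedules only and is cleared on retries, as the proposal describes.

```python
from typing import Any, Dict

# Prefix value as floated in the proposal; the final name and value were still open.
DO_NOT_CLEAR_PREFIX = "_STATEFUL_"


def clear_xcoms(xcoms: Dict[str, Any], is_reschedule: bool) -> Dict[str, Any]:
    """Return the XCom entries that survive the start of a new task run.

    Hypothetical helper, not Airflow's API: `xcoms` is a plain dict standing
    in for the XCom rows belonging to a single task instance.
    """
    if not is_reschedule:
        # Retry or fresh run: drop everything so the task starts from scratch.
        return {}
    # Reschedule: keep only the keys that opt in through the prefix.
    return {
        key: value
        for key, value in xcoms.items()
        if key.startswith(DO_NOT_CLEAR_PREFIX)
    }


xcoms = {"_STATEFUL_job_id": "job-123", "return_value": 42}
after_reschedule = clear_xcoms(xcoms, is_reschedule=True)
after_retry = clear_xcoms(xcoms, is_reschedule=False)
```

Under these assumptions, an operator that stores an external job id under a prefixed key could read it back on the next poke, while unprefixed keys keep today's clear-on-run behavior.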