On Sun, Jan 12, 2020 at 3:59 AM Jacob Ferriero <[email protected]> wrote:
> I am also still in the "solve this in XCom" camp, but agree now that prefix > is an awkward solution and could be difficult / confusing to users. > While I understand the reasons, I am really with Kaxil here on not abusing XCom for that. I still think of the reschedule story as not something user facing. I think that user in their Dag should not be able to interact with the state stored for rescheduling (following the whole discussion about being state-less/ful. I am strongly now in the "no state for the users" camp. I think we do not really want our users to shoot themselves in the foot even if they really want and Airflow being opinionated about it is a GoodThing(TM). Of course whatever we come up with you would be able to run a DB query and read the value if you really, really want but XCom is an official way to access state and it's a bit to easy this way. I think it should be an implementation detail that users should not rely nor use when writing their dags (just use it as intended in PokeReschedule way). The two compelling reasons I have for keeping this in XCom are: > 1) I think a new table for task state would be nearly identical to XCom > Depends. If you want to add just a field I think we have better options. > 2) (bare with me here as this is a bit long and I am going to use the word > "task" informally denoted by quotes) > > *Proposed Changes to XCom*- Add a state column (boolen or nullable string) > - If state column is false /null then clear XCom at beginning of task > (maintain current behavior) > I think if even decide to use XCom (see below) we should really avoid such general "state" naming. It should be feature dedicated to reschedule and any field name should reflect that this is its purpose and the only purpose. Generic name like "state" calls for abuse and introduction of real "state" which we want to avoid. > Another interesting question is should the rescheduled task instances > (pokes) be allowed to mutate this state? Not useful for polling for job > completion but might be useful for other kinds of rescheduled operations > after an initialized state. > Good question. I think that should be allowed to mutate it. Such id could theoretically be changing over time for example it could contain some "last check timestamp" or SHA or UUID of last request or similar. It can be helpful to optimise routes/checks for the service we interact with. Not very likely but I think there is no harm in being able to mutate it. Jarek, > As far as adding a state column to the TaskReschedule table, my > understanding is this table keeps track of task reschedules: it's first > entry is the first reschedule (not the first / originally scheduled task > instance) would the state value be duplicated for each reschedule or always > read from the first reschedule for a given task / dag / execution date? > Having the state associated w/ reschedules but not the original task > instance seems confusing to me (though technically possible to work with). > I suppose I'm just reiterating what Fokko said about XCom granularity > seeming more appropriate. > Agree. It's not best with TaskReschedule. I looked at the code and you are completely right about it. However I think we have one far better place for such new column: TaskInstance. It seems perfect - has the right primary key, there is only one per task instance, we already access it while rescheduling. How about adding new "poke_reschedule_id" column or similar to TaskInstance ? J. > -- > Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>
