On Sun, Jan 12, 2020 at 3:59 AM Jacob Ferriero <jferri...@google.com.invalid>
wrote:

> I am also still in the "solve this in XCom" camp, but agree now that prefix
> is an awkward solution and could be difficult / confusing to users.
>

While I understand the reasons, I am really with Kaxil here on not abusing
XCom for that.
I still think of the reschedule story as not something user facing. I think
that user in their Dag
should not be able to interact with the state stored for rescheduling
(following the whole
discussion about being state-less/ful. I am strongly now in the "no state
for the users" camp.
I think we do not really want our users to shoot themselves in the
foot even if they really want
and Airflow being opinionated about it is a GoodThing(TM). Of course
whatever we come up with
you would be able to run a DB query and read the value if you really,
really want but XCom is an
official way to access state and it's a bit to easy this way. I think it
should be an implementation detail
that users should not rely nor use when writing their dags
(just use it as intended in PokeReschedule way).

The two compelling reasons  I have for keeping this in XCom are:
> 1) I think a new table for task state would be nearly identical to XCom
>

Depends. If you want to add just a field I think we have better options.


> 2) (bare with me here as this is a bit long and I am going to use the word
> "task" informally denoted by quotes)
>
> *Proposed Changes to XCom*- Add a state column (boolen or nullable string)
> - If state column is false /null then clear XCom at beginning of task
> (maintain current behavior)
>

I think if even decide to use XCom (see below) we should really avoid such
general "state"
naming. It should be feature dedicated to reschedule and any field name
should reflect that this
is its purpose and the only purpose. Generic name like "state" calls for
abuse and introduction
of real "state" which we want to avoid.


> Another interesting question is should the rescheduled task instances
> (pokes) be allowed to mutate this state? Not useful for polling for job
> completion but might be useful for other kinds of rescheduled operations
> after an initialized state.
>

Good question. I think that should be allowed to mutate it. Such id could
theoretically be changing over time
for example it could contain some "last check timestamp" or SHA or UUID of
last request or similar. It can be
helpful to optimise routes/checks for the service we interact with. Not
very likely but I think there is no harm in
being able to mutate it.

Jarek,
> As far as adding a state column to the TaskReschedule table, my
> understanding is this table keeps track of task reschedules: it's first
> entry is the first reschedule (not the first / originally scheduled task
> instance) would the state value be duplicated for each reschedule or always
> read from the first reschedule for a given task / dag / execution date?
> Having the state associated w/ reschedules but not the original task
> instance seems confusing to me (though technically possible to work with).
> I suppose I'm just reiterating what Fokko said about XCom granularity
> seeming more appropriate.
>

Agree. It's not best with TaskReschedule. I looked at the code and you are
completely right about it.

However I think we have one far better place for such new column:
TaskInstance.
It seems perfect - has the right primary key, there is only one per task
instance, we already access it while rescheduling.
How about adding new "poke_reschedule_id" column or similar to TaskInstance
?

J.


> --
>
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to