For context, the reason Malthe is proposing something like this, rather
than using the "existing" approach of a BranchOperator or similar, is
optimization: having to spin up a task just to make a decision is, in
many cases, unnecessary, since the scheduler could make this decision
quickly.
(This is along the same lines as why we no longer schedule or actually
run DummyOperator but instead mark it as success directly in the
scheduler.)
AIP-39 is a little unclear on how the new "logical_date" value changes
with the different timetable implementations or if it's simply used
internally for sorting purposes and not meaningful on its own. For
this proposal to work, there has to be a well-defined "execution date"
that we can compare against.
data_interval_start and/or data_interval_end are the dates you should
use for such a purpose.
Please don't use the term "execution date"; it is too overloaded and
confusing.
-ash
On Mon, Oct 18 2021 at 21:17:22 +0000, Malthe wrote:
While AIP-39 provides an interface for more powerful pluggable
scheduling behaviours, there is no such interface to control
task-level scheduling or, more specifically, to control which DAG runs
a task should skip.
Examples:
- Skip task execution on certain days
- Skip task execution on certain hours which could vary from day to
day
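As a rough illustration of the two examples above, here is a minimal sketch of what such a task-level skip rule could look like. The SkipRule protocol, the should_skip method and both rule classes are hypothetical names invented for this sketch; nothing like them exists in Airflow today. Each rule is a pure function of a date, which is what would let the scheduler evaluate it without running a task.

```python
from datetime import datetime
from typing import Dict, FrozenSet, Protocol


class SkipRule(Protocol):
    # Hypothetical interface: decide from a run's date whether to skip the task.
    def should_skip(self, moment: datetime) -> bool: ...


class SkipOnDays:
    """Skip on the given weekdays (Monday == 0 ... Sunday == 6)."""

    def __init__(self, weekdays: FrozenSet[int]) -> None:
        self.weekdays = weekdays

    def should_skip(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays


class SkipOnHoursByDay:
    """Skip on certain hours, where the set of hours depends on the weekday."""

    def __init__(self, hours_by_weekday: Dict[int, FrozenSet[int]]) -> None:
        self.hours_by_weekday = hours_by_weekday

    def should_skip(self, moment: datetime) -> bool:
        # Weekdays without an entry skip no hours at all.
        hours = self.hours_by_weekday.get(moment.weekday(), frozenset())
        return moment.hour in hours


# Skip on weekends; skip 00:00-02:59 on Mondays and 22:00-23:59 on Fridays.
weekend = SkipOnDays(frozenset({5, 6}))
maintenance = SkipOnHoursByDay({0: frozenset({0, 1, 2}), 4: frozenset({22, 23})})
```

Keeping each rule as a small object with one method means rules can be composed or swapped without touching the scheduler logic that calls them.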
Whether or not child tasks would be affected by such task scheduling
depends on the trigger rule configured on those tasks (e.g.
"all_success", "all_done").
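To make the trigger-rule point concrete, here is a deliberately simplified model of how a skipped upstream task interacts with the two rules mentioned. It reflects Airflow's documented behaviour that skips cascade through "all_success", while "all_done" only waits for upstream tasks to finish. It is a sketch, not Airflow's implementation: the function name and the returned outcome strings are made up for illustration, and only these two rules are modelled.

```python
from typing import Iterable

# States treated as "finished" in this simplified model.
TERMINAL = {"success", "failed", "skipped", "upstream_failed"}


def downstream_outcome(trigger_rule: str, upstream_states: Iterable[str]) -> str:
    """Hypothetical helper: what a downstream task would do given its upstreams."""
    states = list(upstream_states)
    if not all(s in TERMINAL for s in states):
        return "wait"  # some upstream task is still pending or running
    if trigger_rule == "all_done":
        return "run"  # any terminal state, including "skipped", is enough
    if trigger_rule == "all_success":
        if any(s in ("failed", "upstream_failed") for s in states):
            return "upstream_failed"
        if "skipped" in states:
            return "skipped"  # the skip cascades downstream
        return "run"
    raise ValueError(f"trigger rule not modelled here: {trigger_rule}")
```

So if a task is skipped by a schedule-level rule, an "all_success" child would be skipped as well, while an "all_done" child would still run.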
The interface might consist of both an include and an exclude
expression: by default, all executions would be included and none
excluded. In both cases, the expression could be a cron expression, but
the interface should again support more powerful behaviours.
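One way to read the include/exclude idea is as two predicates over a run's date, with the defaults described above (include everything, exclude nothing). The sketch below uses plain Python callables rather than cron strings. IncludeExcludeSchedule and should_run are hypothetical names. A cron matcher, for example one built on the third-party croniter package, could presumably be wrapped as such a predicate, but that is an assumption and is not shown here.

```python
from datetime import datetime
from typing import Callable, Optional

# A predicate decides, for a given run date, whether it matches.
Predicate = Callable[[datetime], bool]


class IncludeExcludeSchedule:
    """Hypothetical task-level schedule: run if included and not excluded."""

    def __init__(
        self,
        include: Optional[Predicate] = None,
        exclude: Optional[Predicate] = None,
    ) -> None:
        # Defaults from the proposal: everything included, nothing excluded.
        self.include: Predicate = include or (lambda _: True)
        self.exclude: Predicate = exclude or (lambda _: False)

    def should_run(self, moment: datetime) -> bool:
        return self.include(moment) and not self.exclude(moment)


# Run only during 09:00-16:59, but never on Fridays (weekday() == 4).
office_hours = IncludeExcludeSchedule(
    include=lambda m: 9 <= m.hour < 17,
    exclude=lambda m: m.weekday() == 4,
)
```

Evaluating "exclude" after "include" means an exclusion always wins, which seems the least surprising precedence when both match.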
It should be evident from the task execution details why the task was
skipped; the interface should therefore provide the necessary string
representation functionality.
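For the string representation requirement, a minimal sketch is to give each rule a human-readable __str__ and build the skip reason from it. The SkipOnDays rule and the skip_reason helper are hypothetical; where such a message would actually surface (task instance details, logs) is left open, as in the proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional

WEEKDAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


@dataclass(frozen=True)
class SkipOnDays:
    """Hypothetical rule: skip on the given weekdays (Monday == 0)."""

    weekdays: FrozenSet[int]

    def should_skip(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays

    def __str__(self) -> str:
        # Human-readable form intended for the task's execution details.
        names = ", ".join(WEEKDAY_NAMES[d] for d in sorted(self.weekdays))
        return f"skip on {names}"


def skip_reason(rule: SkipOnDays, moment: datetime) -> Optional[str]:
    """Return an explanation if the rule skips this run, else None."""
    if rule.should_skip(moment):
        return f"Skipped: run date {moment.date().isoformat()} matches rule '{rule}'"
    return None


rule = SkipOnDays(frozenset({6, 5}))
print(skip_reason(rule, datetime(2021, 10, 23)))
```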
AIP-39 is a little unclear on how the new "logical_date" value changes
with the different timetable implementations or if it's simply used
internally for sorting purposes and not meaningful on its own. For
this proposal to work, there has to be a well-defined "execution date"
that we can compare against.
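To show why the comparison date has to be pinned down, here is a small sketch using the data_interval_start / data_interval_end suggestion from the reply above. DataInterval is a hypothetical stand-in, not Airflow's class. For a daily run the two bounds fall on different calendar days, so a rule such as "skip on Mondays" gives opposite answers depending on which bound it is checked against.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class DataInterval(NamedTuple):
    # Hypothetical stand-in for a DAG run's data interval.
    start: datetime
    end: datetime


def is_monday(moment: datetime) -> bool:
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    return moment.weekday() == 0


# A daily run whose data interval covers Monday 2021-10-18.
day = datetime(2021, 10, 18, tzinfo=timezone.utc)
interval = DataInterval(start=day, end=day + timedelta(days=1))

# The same "skip on Mondays" rule disagrees depending on the bound used.
print(is_monday(interval.start))  # True
print(is_monday(interval.end))    # False
```

A task-level skip interface would therefore need to state explicitly which bound (or both) its rules are evaluated against.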