I'm working on implementing AIP-76 (asset partitioning).

In thinking it through and doing a proof of concept, I made some design
decisions that I think warrant formally amending AIP-76 and giving
notice to the community so that people can provide feedback.

I've documented the proposed amendment here:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-76+amendment%3A+broader+partition+awareness

Essentially, the point is that assets always "run" in the context of a
dag.  In fact they don't really "run"; rather, tasks run and emit asset
events.  So there's always a dag, a dag run, a task, etc.

And this remains true for the new-in-3.0 asset decorator, since it's just a
wrapper that generates a dag with one task.

So to "run an asset" for a partition, or to "schedule an asset" in a
partition-driven way, ultimately, we are running a *dag* for a partition,
and "scheduling a dag" in a partition-driven way.

Additionally, we currently allow dags to be scheduled based on assets, and
of course people will want to have this asset-driven scheduling be
partition-aware.  So this is another sense in which dags must be
partition-aware.

Which brings me to the proposed amendment.

The amendment is essentially that we will make DAGs explicitly
partition-aware, so that a DagRun will optionally have a partition_key.
This partition key is how the task (which is updating an asset)
would know which partition of the asset it is updating.  Moreover, we'll
allow DAGs (even those not defined by the asset decorator) to be
partition-driven rather than logical-date-driven, and allow users to do
so in either of two ways: (1) schedule based on a partition scheme, or (2)
schedule using a standard timetable but optionally use partitions instead
of logical date.
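
To make the two directions concrete, here is a rough sketch in plain
Python.  It is not Airflow code and not the proposed API: DagRunSketch,
runs_from_partition_scheme, run_from_timetable, and the daily ISO-date
key format are all hypothetical names and assumptions chosen only for
illustration.  The one thing taken from the proposal is that a run
optionally carries a partition_key.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Iterator, Optional


@dataclass(frozen=True)
class DagRunSketch:
    """Toy stand-in for a dag run; NOT Airflow's real DagRun model."""

    dag_id: str
    logical_date: Optional[datetime] = None
    partition_key: Optional[str] = None  # the proposed optional field


def runs_from_partition_scheme(
    dag_id: str, start: date, days: int
) -> Iterator[DagRunSketch]:
    """Direction (1): the partition scheme itself drives scheduling.

    Assumes a hypothetical daily scheme keyed by ISO date strings.
    """
    for offset in range(days):
        key = (start + timedelta(days=offset)).isoformat()
        yield DagRunSketch(dag_id=dag_id, partition_key=key)


def run_from_timetable(
    dag_id: str, tick: datetime, use_partition: bool = True
) -> DagRunSketch:
    """Direction (2): a standard timetable fires at `tick`.

    If use_partition is set, the run is identified by a partition key
    derived from the tick instead of by a logical date (one possible
    reading of the proposal, assumed here).
    """
    if use_partition:
        return DagRunSketch(dag_id=dag_id, partition_key=tick.date().isoformat())
    return DagRunSketch(dag_id=dag_id, logical_date=tick)


def describe_update(run: DagRunSketch) -> str:
    """What a task could report about the asset partition it updates."""
    if run.partition_key is not None:
        return f"{run.dag_id}: updating partition {run.partition_key}"
    return f"{run.dag_id}: updating asset for logical date {run.logical_date}"
```

In both directions the task only needs to look at the run's
partition_key to know which partition it is updating; the difference is
only in what causes the run to be created.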

I welcome feedback.  The amendment doc,
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-76+amendment%3A+broader+partition+awareness,
is probably the best medium for this.

Thank you.

Daniel
