Some notes and ideas:

*DAG Fingerprinting:* this can be tricky, especially with dynamic DAGs,
where in some cases each parse of the DAG can produce a different
fingerprint. I think the proposal leaves out some DAG and task attributes
that should be part of the fingerprint, like trigger rules or task
start/end datetimes. We should do a full pass over all DAG arguments and
make sure we're not forgetting anything that can change scheduling logic.
Also, let's be careful: something as simple as a dynamic start or end date
on a task could lead to a different version on every parse. I'd recommend
limiting serialization/storage to one version per DAG Run, as opposed to
potentially one every time the DAG is parsed. Once the version for a DAG
run is pinned, the fingerprint is not re-evaluated until the next DAG run
is ready to be created.
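
To make the concern concrete, here is a rough sketch of what I have in
mind. It is not Airflow code and not what the AIP specifies: the dict
layout, `fingerprint`, `make_dag` and `VersionPinner` are all hypothetical
names made up for illustration. It hashes a canonical JSON form of the
scheduling-relevant attributes and pins one fingerprint per DAG run.

```python
import hashlib
import json
from datetime import datetime


def fingerprint(dag: dict) -> str:
    """Hash a canonical JSON form of the scheduling-relevant attributes.

    `dag` is a plain dict standing in for a serialized DAG (hypothetical
    layout, not Airflow's actual serialization format).
    """
    # sort_keys gives a stable key order; default=str renders datetimes.
    canonical = json.dumps(dag, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_dag(start_date: datetime) -> dict:
    """Build a toy two-task DAG description (illustrative only)."""
    return {
        "dag_id": "example",
        "schedule_interval": "@daily",
        "tasks": {
            "extract": {
                "trigger_rule": "all_success",
                "start_date": start_date,
                "upstream": [],
            },
            "load": {
                "trigger_rule": "all_done",
                "start_date": start_date,
                "upstream": ["extract"],
            },
        },
    }


class VersionPinner:
    """Pin one fingerprint per DAG run; later parses do not change it."""

    def __init__(self) -> None:
        self._pinned = {}

    def version_for(self, run_id: str, dag: dict) -> str:
        # Only fingerprint the first time a given run is seen.
        if run_id not in self._pinned:
            self._pinned[run_id] = fingerprint(dag)
        return self._pinned[run_id]


# A static start date yields a stable fingerprint across parses...
stable_a = fingerprint(make_dag(datetime(2020, 7, 1)))
stable_b = fingerprint(make_dag(datetime(2020, 7, 1)))

# ...while a start date that differs between parses yields a new one.
drifted = fingerprint(make_dag(datetime(2020, 7, 2)))

# Pinning keeps the version of a run fixed even if a later parse differs.
pinner = VersionPinner()
first = pinner.version_for("run_1", make_dag(datetime(2020, 7, 1)))
second = pinner.version_for("run_1", make_dag(datetime(2020, 7, 2)))
```

Here `stable_a` equals `stable_b`, `drifted` differs from both, and
`second` equals `first` because the run was already pinned. A real
implementation would still need to decide which attributes go into the
dict, which is the "full pass" mentioned above.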

*Visualizing change in the tree view:* I think this is very complex, and
many things can make this view impossible to render (task dependency
reversal, cycles across versions, ...). Maybe a better visual approach
would be to render an independent tree view for each DAG version, side by
side, making a best-effort attempt to align the tasks across blocks and
"linking" tasks with lines across blocks when necessary.
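
As a rough illustration of the alignment step, here is a sketch that
compares two versions and reports which tasks could be linked across
blocks and which dependencies were reversed. The input shape (task id
mapped to its upstream task ids) and the `diff_versions` name are
assumptions of mine, not anything from the AIP or from Airflow.

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Compare two DAG versions given as {task_id: [upstream task_ids]}."""
    # Represent each dependency as an (upstream, downstream) pair.
    old_edges = {(up, task) for task, ups in old.items() for up in ups}
    new_edges = {(up, task) for task, ups in new.items() for up in ups}
    return {
        # Tasks in both versions: candidates for a line across blocks.
        "common": sorted(old.keys() & new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Old edges whose direction is flipped in the new version.
        "reversed": sorted(
            (up, task) for (up, task) in old_edges if (task, up) in new_edges
        ),
    }


old_version = {"a": [], "b": ["a"], "c": ["b"]}
new_version = {"a": ["b"], "b": [], "d": ["a"]}
result = diff_versions(old_version, new_version)
```

In this example tasks `a` and `b` exist in both versions, `c` was removed,
`d` was added, and the `a -> b` dependency was reversed. A single merged
tree would have to draw both directions of that edge at once, which is
why separate per-version blocks seem easier to render.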

On Fri, Jul 24, 2020 at 12:46 PM Vikram Koka <vik...@astronomer.io> wrote:

> Team,
>
>
>
> We just created 'AIP-36 DAG Versioning' on Confluence and would very much
> appreciate feedback and suggestions from the community.
>
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
>
>
>
> The DAG Versioning concept has been discussed on multiple occasions in the
> past and has been a topic highlighted as part of Airflow 2.0 as well. We at
> Astronomer have heard data engineers at several enterprises ask about this
> feature as well, for easier debugging when changes are made to DAGs as a
> result of evolving business needs.
>
>
> As described in the AIP, we have a proposal focused on ensuring that the
> visibility behaviour of Airflow is correct, without changing the execution
> behaviour. We considered changing the execution behaviour as well, but
> decided that the risks in changing execution behavior were too high as
> compared to the benefits and therefore decided to limit the scope to only
> making sure that the visibility was correct.
>
>
> We would like to attempt this based on our experience running Airflow as a
> service. We believe that this benefits Airflow as a project and the
> development experience of data engineers using Airflow across the world.
>
>
>  Any feedback, suggestions, and comments would be greatly appreciated.
>
>
>
> Best Regards,
>
>
> Kaxil Naik, Ryan Hamilton, Ash Berlin-Taylor, and Vikram Koka
>
