Changing the code view to show the YAML is now "relatively" easy to
achieve, at least from the webserver point of view: since 2.0 it no
longer reads the files from disk, but from the DB.
There are a lot of details, but changing the way these DagCode rows are
written could be achievable whilst still keeping the "there must be a
Python file to generate the DAG" requirement.
-ash
On Fri, Aug 20 2021 at 20:41:55 +0000, "Shaw, Damian P."
<[email protected]> wrote:
I’d personally find this very useful. There’s usually extra
information I have about the DAG, and the current “doc_md” is
usually not nearly sufficient as it’s poorly placed: if I start
adding a lot of info it gets in the way of the regular UI. Also,
last I tested, the markdown formatting didn’t work and neither did
the other formatter options.
But I’m not sure how much other people have demand for this.
Thanks,
Damian
*From:*Collin McNulty <[email protected]>
*Sent:* Friday, August 20, 2021 16:36
*To:* [email protected]
*Subject:* Re: [DISCUSS] Adding better support for parametrized DAGs
and dynamic DAGs using JSON/YAML dataformats
On the topic of pointing the code view to yaml, would we
alternatively consider adding a view on the UI that would allow
arbitrary text content? This could be accomplished by adding an
optional parameter to the dag object that allowed you to pass text
(or a filepath) that would then go through a renderer (e.g.
markdown). It could be a readme, or yaml content or anything the
author wanted.
Collin
On Fri, Aug 20, 2021 at 3:27 PM Shaw, Damian P.
<[email protected]
<mailto:[email protected]>> wrote:
FYI this is what I did on one of my past projects for Airflow.
The users wanted to write their DAGs as YAML files so my “DAG
file” was a Python script that read the YAML files and converted
them to DAGs. It was very easy to do and worked because of the
flexibility of Airflow.
The one thing that would have been nice, though, is if I could have
easily changed the “code view” in Airflow to point to the
relevant YAML file instead of the less useful “DAG file”.
Damian
*From:*Jarek Potiuk <[email protected] <mailto:[email protected]>>
*Sent:* Friday, August 20, 2021 16:21
*To:* [email protected] <mailto:[email protected]>
*Cc:* [email protected] <mailto:[email protected]>
*Subject:* Re: [DISCUSS] Adding better support for parametrized DAGs
and dynamic DAGs using JSON/YAML dataformats
Airflow DAGs are Python code. This is a very basic assumption - which
is not likely to change. Ever.
And we are working on making it even more powerful. Writing DAGs in
YAML/JSON makes them less powerful and less flexible. This is fine
if you want to build on top of Airflow a more declarative way of
defining DAGs and use Airflow to run them under the hood.
If you think there is a group of users who can benefit from that -
cool. You can publish the code to convert those to Airflow DAGs and
submit it to our Ecosystem page. There are plenty of tools like "CWL -
Common Workflow Language" and others:
<https://airflow.apache.org/ecosystem/#tools-integrating-with-airflow>
J.
On Fri, Aug 20, 2021 at 2:48 PM Siddharth VP <[email protected]
<mailto:[email protected]>> wrote:
Have we considered allowing DAGs in JSON/YAML formats before? I
came up with a rather straightforward way to address parametrized
and dynamic DAGs in Airflow, which I think makes dynamic DAGs work
at scale.
*Background / Current limitations:*
1. Dynamic DAG generation using single-file methods
<https://www.astronomer.io/guides/dynamically-generating-dags#single-file-methods>
can cause scalability issues
<https://www.astronomer.io/guides/dynamically-generating-dags#scalability>
where there are too many active DAGs per file. The
dag_file_processor_timeout is applied to the loader file, so /all/
dynamically generated DAGs need to be processed in that time. Sure,
the timeout could be increased, but that may be undesirable (what
if there are other static DAGs in the system on which we really
want to enforce a small timeout?).
2. Parametrizing DAGs in Airflow is difficult. There is no good way
to have multiple workflows that differ only by choices of some
constants. Using TriggerDagRunOperator to trigger a generic DAG
with conf doesn't give a native-ish experience as it creates
DagRuns of the /triggered/ dag rather than /this/ dag - which also
means a single scheduler log file.
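The single-file pattern from limitation 1 looks roughly like this sketch (a stub dict stands in for real `airflow.DAG` construction); every iteration must finish inside the one dag_file_processor_timeout window granted to this file, which is the scaling problem:

```python
# One loader module that emits many DAGs; all of them must be built
# before dag_file_processor_timeout expires for this single file.
def build_dag(customer):
    # Stub: a real version would return an airflow.DAG with tasks.
    return {"dag_id": f"etl_{customer}",
            "tasks": ["extract", "transform", "load"]}

CUSTOMERS = ["acme", "globex", "initech"]  # often fetched from a DB or API

generated = {}
for customer in CUSTOMERS:
    dag = build_dag(customer)
    # In a real DAG file: globals()[dag.dag_id] = dag
    generated[dag["dag_id"]] = dag
```

With hundreds of entries in the source list, the whole loop still shares one parse timeout and one processor log, which is exactly what the proposal below avoids.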
*Suggested approach:*
1. User writes configuration files in JSON/YAML format. The schema
can be arbitrary except for one condition: it must have a
/builder/ parameter with the path to a Python file.
2. User writes the "builder" - a Python file containing a make_dag
method that receives the parsed JSON/YAML and returns a DAG object.
(Just a sample strategy; we could instead say the file should
contain a class that extends an abstract DagBuilder class.)
3. Airflow reads JSON/YAML files from the dags directory as well.
It parses the file, imports the builder Python file, passes the
parsed JSON/YAML to it, and collects the generated DAG into the
DagBag.
*Sample implementation:*
See
<https://github.com/siddharthvp/airflow/commit/47bad51fc4999737e9a300b134c04bbdbd04c88a>;
the only major code change is in dagbag.py.
*Result:*
Dag file processor logs show the YAML/JSON file (instead of the builder
Python file). Each dynamically generated DAG gets its own scheduler
log file.
The configs dag_dir_list_interval, min_file_process_interval,
file_parsing_sort_mode all directly apply to dag config files.
If the JSON/YAML fails to parse, it is registered as an import error.
Would like to know your thoughts on this. Thanks!
Siddharth VP
--
+48 660 796 129
==============================================================================
Please access the attached hyperlink for an important electronic
communications disclaimer:
<http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html>
==============================================================================