Hello, Airflow community,

This email calls for a vote to add the DAG Serialization feature at
https://github.com/apache/airflow/pull/5743.

*AIP*:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler

*Previous Mailing List discussion*:
https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E

*Authors*: Kaxil Naik, Zhou Fang, Ash Berlin-Taylor

*Summary*:

   - DAGs are serialized to JSON and stored in a SerializedDag table.
   - Instead of parsing the DAG files again, the Webserver reads the
   serialized DAGs as JSON, de-serializes them, and builds the DagBag it
   uses to render the UI.
   - Instead of loading the entire DagBag when the Webserver starts, we
   load each DAG on demand from the SerializedDag table. This reduces
   Webserver startup time and memory usage; the reduction is most notable
   when you have a large number of DAGs.
   - A JSON Schema has been defined, and we validate each serialized DAG
   against it before writing it to the database.
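
To make the flow above concrete, here is a rough sketch of the idea, not
the code in the PR. It uses only the Python standard library, and every
name in it (REQUIRED_FIELDS, validate, write_dag, LazyDagBag, the
serialized_dag table layout) is made up for illustration. The simple
required-field check stands in for the real JSON Schema validation.

```python
import json
import sqlite3

# Hypothetical stand-in for the JSON Schema: required keys and their types.
REQUIRED_FIELDS = {"dag_id": str, "fileloc": str, "tasks": list}


def validate(serialized):
    """Reject a serialized DAG dict with a missing or mistyped field."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(serialized.get(name), expected_type):
            raise ValueError(f"invalid or missing field: {name}")


def write_dag(conn, serialized):
    """Validate first, then store the DAG as a JSON string keyed by dag_id."""
    validate(serialized)
    conn.execute(
        "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
        (serialized["dag_id"], json.dumps(serialized, sort_keys=True)),
    )


class LazyDagBag:
    """Loads each DAG from the table on first access, not all at startup."""

    def __init__(self, conn):
        self.conn = conn
        self.dags = {}  # cache of DAGs that have been requested so far

    def get_dag(self, dag_id):
        if dag_id not in self.dags:
            row = self.conn.execute(
                "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
            ).fetchone()
            if row is None:
                return None
            self.dags[dag_id] = json.loads(row[0])
        return self.dags[dag_id]


# Illustrative usage with an in-memory database (assumed table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
write_dag(
    conn,
    {"dag_id": "example", "fileloc": "/dags/example.py",
     "tasks": [{"task_id": "t1"}]},
)
bag = LazyDagBag(conn)
```

Nothing is read from the table until get_dag() is called for a given
dag_id, which is where the startup time and memory savings would come
from in this sketch.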


A PR (https://github.com/apache/airflow/pull/5743) is ready for review by
the committers and the community.

We also have a WIP PR (https://github.com/apache/airflow/pull/5992) to
backport this feature to the 1.10.* branch.

A big thank you to Zhou and Ash for their continuous help in improving this
feature/PR.

This email is formally calling for a vote to accept the AIP and PR. Please
note that we will update the PR / feature to fix bugs if we find any.

Cheers,
Kaxil
