Hi Kaxil,

We are also working on persisting DAGs to the DB as JSON for the Airflow webserver in Google Composer. We aim to minimize changes to the current Airflow code. Happy to sync on this!

Here is our progress:

(1) Serializing DAGs using Pickle to be used in the webserver
It has been launched in Composer. I am working on the PR to upstream it:
https://github.com/apache/airflow/pull/5594
Currently it does not support non-Airflow operators, and we are working on a fix.

(2) Caching pickled DAGs in the DB to be used by the webserver
We have a proof-of-concept implementation and are working on an AIP now.

(3) Using JSON instead of Pickle in (1) and (2)
We decided to use JSON because Pickle is neither secure nor human readable. The serialization approach is very similar to (1).

I will update the PR (https://github.com/apache/airflow/pull/5594) to replace Pickle with JSON, and send our design of (2) as an AIP next week.

We would be glad to check together whether our implementation makes sense and to improve it.

Thanks!
Zhou

On Fri, Jul 26, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Hi all,
>
> We, at Astronomer, are going to spend time working on DAG Serialisation.
> There are 2 AIPs that are somewhat related to what we plan to work on:
>
> - AIP-18 Persist all information from DAG file in DB
>   <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-18+Persist+all+information+from+DAG+file+in+DB>
> - AIP-19 Making the webserver stateless
>   <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless>
>
> We plan to use JSON as the Serialisation format and store it as a blob in
> metadata DB.
>
> *Goals:*
>
> - Make Webserver Stateless
> - Use the same version of the DAG across Webserver & Scheduler
> - Keep backward compatibility and have a flag (globally & at DAG level)
>   to turn this feature on/off
> - Enable DAG Versioning (extended Goal)
>
> We will be preparing a proposal (AIP) after some research and some initial
> work and open it for the suggestions of the community.
>
> We already had some good brain-storming sessions with Twitter folks (DanD &
> Sumit), folks from GoDataDriven (Fokko & Bas) & Alex (from Uber) which will
> be a good starting point for us.
>
> If anyone in the community is interested in it or has some experience about
> the same and want to collaborate please let me know and join
> #dag-serialisation channel on Airflow Slack.
>
> Regards,
> Kaxil
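
For illustration only, the idea in (3) of serializing a DAG to JSON and caching it in the DB might look roughly like the sketch below. This is not the code in the PR and does not use any Airflow API: DagSpec, TaskSpec, the serialized_dag table, and the helper functions are all hypothetical names, and an in-memory SQLite database stands in for the Airflow metadata DB.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class TaskSpec:
    # Hypothetical, simplified stand-in for an operator instance.
    task_id: str
    operator: str
    upstream: list = field(default_factory=list)


@dataclass
class DagSpec:
    # Hypothetical, simplified stand-in for a DAG definition.
    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)


def serialize_dag(dag):
    """Turn the DAG into a human-readable JSON string (no code is embedded)."""
    return json.dumps(asdict(dag), sort_keys=True)


def deserialize_dag(blob):
    """Rebuild the DAG from JSON; unlike unpickling, this only constructs plain data."""
    data = json.loads(blob)
    tasks = [TaskSpec(**task) for task in data["tasks"]]
    return DagSpec(dag_id=data["dag_id"], schedule=data["schedule"], tasks=tasks)


def save_dag(conn, dag):
    """Store the serialized DAG as a text blob keyed by dag_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS serialized_dag "
        "(dag_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
        (dag.dag_id, serialize_dag(dag)),
    )
    conn.commit()


def load_dag(conn, dag_id):
    """Read the DAG back, as a webserver could, without parsing the DAG file."""
    row = conn.execute(
        "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return deserialize_dag(row[0]) if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    dag = DagSpec(
        dag_id="example",
        schedule="@daily",
        tasks=[
            TaskSpec("extract", "BashOperator"),
            TaskSpec("load", "PythonOperator", upstream=["extract"]),
        ],
    )
    save_dag(conn, dag)
    print(load_dag(conn, "example"))
```

In this sketch the writer (a scheduler-like process) and the reader (a webserver-like process) share only the JSON blob, so the reader never has to import or execute the DAG file itself.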