Hi Kaxil,

We are also working on persisting DAGs in the DB as JSON for the Airflow
webserver in Google Composer. We aim to minimize changes to the
current Airflow code. Happy to sync up on this!

Here is our progress:
(1) Serializing DAGs using Pickle to be used in webserver
It has been launched in Composer. I am working on the PR to upstream it:
https://github.com/apache/airflow/pull/5594
Currently it does not support non-Airflow operators and we are working on a
fix.

(2) Caching Pickled DAGs in DB to be used by webserver
We have a proof-of-concept implementation and are working on an AIP now.

(3) Using JSON instead of Pickle in (1) and (2)
We decided to use JSON because Pickle is neither secure nor human-readable.
The serialization approach is very similar to (1).
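
To illustrate the difference, here is a minimal sketch. It is not the code
in the PR: the dict layout and the helper names serialize_dag_json and
deserialize_dag_json are hypothetical, and a plain dict stands in for a real
Airflow DAG object.

```python
import json
import pickle

# Hypothetical, simplified stand-in for a DAG; not Airflow's real DAG class.
dag = {
    "dag_id": "example_dag",
    "schedule_interval": "@daily",
    "tasks": [
        {"task_id": "extract", "downstream": ["load"]},
        {"task_id": "load", "downstream": []},
    ],
}


def serialize_dag_json(dag_dict):
    """Encode the DAG description as a human-readable JSON string."""
    return json.dumps(dag_dict, sort_keys=True)


def deserialize_dag_json(blob):
    """Decode JSON; this only rebuilds plain dicts, lists, and strings."""
    return json.loads(blob)


# JSON yields text that can be inspected directly in the DB.
json_blob = serialize_dag_json(dag)

# Pickle yields opaque bytes; unpickling untrusted bytes can run arbitrary code.
pickle_blob = pickle.dumps(dag)

restored = deserialize_dag_json(json_blob)
```

The trade-off is that JSON only covers basic types, so anything else on a
DAG or operator needs an explicit encoding step.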

I will update the PR (https://github.com/apache/airflow/pull/5594) to
replace Pickle with JSON, and send our design for (2) as an AIP next week.
We would be glad to check together whether our implementation makes sense
and to improve on it.

Thanks!
Zhou


On Fri, Jul 26, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Hi all,
>
> We, at Astronomer, are going to spend time working on DAG Serialisation.
> There are 2 AIPs that are somewhat related to what we plan to work on:
>
>    - AIP-18 Persist all information from DAG file in DB
>    <
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-18+Persist+all+information+from+DAG+file+in+DB
> >
>    - AIP-19 Making the webserver stateless
>    <
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless
> >
>
> We plan to use JSON as the Serialisation format and store it as a blob in
> metadata DB.
>
> *Goals:*
>
>    - Make Webserver Stateless
>    - Use the same version of the DAG across Webserver & Scheduler
>    - Keep backward compatibility and have a flag (globally & at DAG level)
>    to turn this feature on/off
>    - Enable DAG Versioning (extended Goal)
>
>
> We will be preparing a proposal (AIP) after some research and some initial
> work and open it for the suggestions of the community.
>
> We already had some good brain-storming sessions with Twitter folks (DanD &
> Sumit), folks from GoDataDriven (Fokko & Bas) & Alex (from Uber) which will
> be a good starting point for us.
>
> If anyone in the community is interested in it or has some experience about
> the same and want to collaborate please let me know and join
> #dag-serialisation channel on Airflow Slack.
>
> Regards,
> Kaxil
>
