Hi Kevin,

The long DAG parsing time can be addressed by asynchronous DAG loading:
https://github.com/apache/airflow/pull/5594

The idea is that a background process parses DAG files and sends the DAGs to
the webserver process every [webserver] dagbag_sync_interval = 10s.

We have launched it in Composer, so our users can set the webserver worker
restart interval to 1 hour (or longer). The background DAG parsing process
refreshes all DAGs every [webserver] collect_dags_interval = 30s.

If parsing all DAGs takes 15 min, you will see DAGs being gradually refreshed
with this feature.
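
To make the mechanism concrete, here is a minimal Python sketch of the idea.
It is not the code from the PR: the names parse_dag_file, collect_dags and
sync_dagbag are hypothetical, a plain dict stands in for the webserver's
DagBag, and queue.Queue stands in for whatever channel connects the
background process to the webserver worker.

```python
import os
import queue


def parse_dag_file(path):
    """Stand-in for the expensive step of importing one DAG file (hypothetical)."""
    dag_id = os.path.splitext(os.path.basename(path))[0]
    return {"dag_id": dag_id, "fileloc": path}


def collect_dags(paths, out_queue, batch_size=2):
    """Background side: parse files and hand over DAGs in small batches.

    Batches are sent as soon as they are ready, so the consumer does not
    have to wait for the whole parsing pass to finish.
    """
    batch = {}
    for path in paths:
        dag = parse_dag_file(path)
        batch[dag["dag_id"]] = dag
        if len(batch) >= batch_size:
            out_queue.put(batch)
            batch = {}
    if batch:
        out_queue.put(batch)


def sync_dagbag(dagbag, in_queue):
    """Webserver side: merge whatever has arrived; never blocks on parsing."""
    refreshed = 0
    while True:
        try:
            batch = in_queue.get_nowait()
        except queue.Empty:
            return refreshed
        dagbag.update(batch)
        refreshed += len(batch)


if __name__ == "__main__":
    channel = queue.Queue()
    dagbag = {}
    collect_dags(["dags/a.py", "dags/b.py", "dags/c.py"], channel)
    print(sync_dagbag(dagbag, channel), sorted(dagbag))
```

In this sketch the webserver would call sync_dagbag once per
dagbag_sync_interval and the background process would repeat collect_dags
once per collect_dags_interval, which is why DAGs show up gradually rather
than all at once.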

Thanks,
Zhou


On Sat, Jul 27, 2019 at 2:43 AM Kevin Yang <[email protected]> wrote:

> Nice job Zhou!
>
> Really excited, exactly what we wanted for the webserver scaling issue.
> I want to add another big driver that got Airbnb thinking about this
> earlier, in support of the effort: it can not only bring consistency between
> webservers but also bring consistency between webserver and
> scheduler/workers. It may be less of a problem if total DAG parsing time is
> small, but for us the total DAG parsing time is 15+ mins and we had to set
> the webserver (gunicorn subprocesses) restart interval to 20 mins, which
> leads to a worst-case 15+20+15=50 min delay between the scheduler starting
> to schedule things and users seeing their deployed DAGs/changes...
>
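
A small sketch of the sum quoted above. The function name is hypothetical,
and reading the two 15s as two full parse passes around one restart interval
is my assumption about how the terms break down.

```python
def worst_case_delay_minutes(parse_minutes, restart_interval_minutes):
    """Worst-case lag as summed in the quoted message: parse + restart + parse."""
    return parse_minutes + restart_interval_minutes + parse_minutes


if __name__ == "__main__":
    # Figures from the quoted message: 15 min parse, 20 min restart interval.
    print(worst_case_delay_minutes(15, 20))
```
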
> I'm not so sure about the scheduler performance improvement: currently we
> already feed the main scheduler process with SimpleDag through
> DagFileProcessorManager running in a subprocess. In the future we would feed
> it with data from the DB, which is likely slower (though the difference
> should have negligible impact on scheduler performance). In fact, if we keep
> the existing behavior of scheduling only freshly parsed DAGs, then we may
> need to deal with a consistency issue: the DAG processor and the scheduler
> race to update the flag indicating whether the DAG is newly parsed. No big
> deal there, just some thoughts off the top of my head that will hopefully be
> helpful.
>
> And good idea on pre-rendering the template; I believe template rendering was
> the biggest concern in the previous discussion. We've also chosen the
> pre-rendering+JSON approach in our smart sensor API
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
> and it seems to be working fine, a supporting case for your proposal ;)
> There's a WIP PR <https://github.com/apache/airflow/pull/5499> for it in case
> you are interested. Maybe we can even share some logic.
>
> Thumbs-up again for this and please don't hesitate to reach out if you
> want to discuss further with us or need any help from us.
>
>
> Cheers,
> Kevin Y
>
> On Sat, Jul 27, 2019 at 12:54 AM Driesprong, Fokko <[email protected]>
> wrote:
>
> > Looks great Zhou,
> >
> > One thing popped into my mind while reading the AIP: should we keep
> > the caching at the webserver level? As the famous quote goes: *"There are
> > only two hard things in Computer Science: cache invalidation and naming
> > things." -- Phil Karlton*
> >
> > Right now, the fundamental change that is being proposed in the AIP is
> > fetching the DAGs from the database in a serialized format, and not parsing
> > the Python files all the time. This will already give a great performance
> > improvement on the webserver side because it removes a lot of the
> > processing. However, since we're still fetching the DAGs from the database
> > at a regular interval and caching them in the local process, we still have
> > the two issues that Airflow is suffering from right now:
> >
> >    1. No snappy UI because it is still polling the database at a regular
> >    interval.
> >    2. Inconsistency between webservers because they might poll at
> >    different intervals, I think we've all seen this:
> >    https://www.youtube.com/watch?v=sNrBruPS3r4
> >
> > As I also mentioned in the Slack channel, I strongly feel that we should be
> > able to render most views from the tables in the database, so without
> > touching the blob. For specific views, we could just pull the blob from the
> > database. In this case we always have the latest version, and we tackle the
> > second point above.
> >
> > To tackle the first one, I also have an idea. We should change the DAG
> > parser from a loop to something that uses inotify
> > https://pypi.org/project/inotify_simple/. This will change it from polling
> > to an event-driven design, which is much more performant and less resource
> > hungry. But this would be an AIP on its own.
> >
> > Again, great design and a comprehensive AIP, but I would include the
> > caching on the webserver to greatly improve the user experience in the UI.
> > Looking forward to the opinion of others on this.
> >
> > Cheers, Fokko
> >
> > On Sat, Jul 27, 2019 at 01:44, Zhou Fang <[email protected]> wrote:
> >
> > > Hi Kaxil,
> > >
> > > Just sent out the AIP:
> > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > >
> > > Thanks!
> > > Zhou
> > >
> > >
> > > On Fri, Jul 26, 2019 at 1:33 PM Zhou Fang <[email protected]> wrote:
> > >
> > > > Hi Kaxil,
> > > >
> > > > We are also working on persisting DAGs into DB using JSON for Airflow
> > > > webserver in Google Composer. We aim to minimize changes to the
> > > > current Airflow code. Happy to get synced on this!
> > > >
> > > > Here is our progress:
> > > > (1) Serializing DAGs using Pickle to be used in webserver
> > > > It has been launched in Composer. I am working on the PR to upstream it:
> > > > https://github.com/apache/airflow/pull/5594
> > > > Currently it does not support non-Airflow operators and we are working on
> > > > a fix.
> > > >
> > > > (2) Caching Pickled DAGs in DB to be used by webserver
> > > > We have a proof-of-concept implementation, working on an AIP now.
> > > >
> > > > (3) Using JSON instead of Pickle in (1) and (2)
> > > > We decided to use JSON because Pickle is neither secure nor human
> > > > readable. The serialization approach is very similar to (1).
> > > >
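
A minimal sketch of what a JSON form of a DAG could look like. The field
names and the helpers serialize_dag / deserialize_dag are hypothetical and
are not the schema from the PR or the AIP; the point is only that the blob
is readable text and that loading it builds plain data rather than
arbitrary Python objects.

```python
import json


def serialize_dag(dag_id, schedule_interval, downstream):
    """Turn a DAG description into a JSON string (hypothetical layout).

    `downstream` maps each task_id to the task_ids that run after it.
    """
    doc = {
        "dag_id": dag_id,
        "schedule_interval": schedule_interval,
        "tasks": [
            {"task_id": task_id, "downstream_task_ids": sorted(ids)}
            for task_id, ids in sorted(downstream.items())
        ],
    }
    return json.dumps(doc, sort_keys=True, indent=2)


def deserialize_dag(blob):
    """json.loads only yields dicts, lists, strings, numbers, booleans, None."""
    return json.loads(blob)


if __name__ == "__main__":
    blob = serialize_dag("example", "@daily", {"extract": ["load"], "load": []})
    print(blob)
```
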
> > > > I will update the PR (https://github.com/apache/airflow/pull/5594) to
> > > > replace Pickle with JSON, and send our design of (2) as an AIP next week.
> > > > Glad to check together whether our implementation makes sense and to
> > > > improve on it.
> > > >
> > > > Thanks!
> > > > Zhou
> > > >
> > > >
> > > > On Fri, Jul 26, 2019 at 7:37 AM Kaxil Naik <[email protected]> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> We, at Astronomer, are going to spend time working on DAG Serialisation.
> > > >> There are 2 AIPs that are somewhat related to what we plan to work on:
> > > >>
> > > >>    - AIP-18 Persist all information from DAG file in DB
> > > >>    <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-18+Persist+all+information+from+DAG+file+in+DB>
> > > >>    - AIP-19 Making the webserver stateless
> > > >>    <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless>
> > > >>
> > > >> We plan to use JSON as the Serialisation format and store it as a blob
> > > >> in the metadata DB.
> > > >>
> > > >> *Goals:*
> > > >>
> > > >>    - Make Webserver Stateless
> > > >>    - Use the same version of the DAG across Webserver & Scheduler
> > > >>    - Keep backward compatibility and have a flag (globally & at DAG
> > > >>    level) to turn this feature on/off
> > > >>    - Enable DAG Versioning (extended Goal)
> > > >>
> > > >>
> > > >> We will be preparing a proposal (AIP) after some research and some
> > > >> initial work, and will open it up for suggestions from the community.
> > > >>
> > > >> We already had some good brain-storming sessions with Twitter folks
> > > >> (DanD & Sumit), folks from GoDataDriven (Fokko & Bas) & Alex (from
> > > >> Uber) which will be a good starting point for us.
> > > >>
> > > >> If anyone in the community is interested in this or has some
> > > >> experience with it and wants to collaborate, please let me know and
> > > >> join the #dag-serialisation channel on Airflow Slack.
> > > >>
> > > >> Regards,
> > > >> Kaxil
> > > >>
> > > >
> > >
> >
>
