Hi Kevin,

The problem that DAG parsing takes a long time can be solved by Asynchronous DAG loading: https://github.com/apache/airflow/pull/5594
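Asynchronous loading means the webserver never parses DAG files itself; a separate worker parses them and the webserver only merges the results. A minimal sketch of that split follows. It is not the code in the PR: parse_dag_file, collect_dags and sync_dagbag are hypothetical names, a thread stands in for the background process to keep it short, and a DAG is reduced to a small dict.

```python
import os
import queue
import threading


def parse_dag_file(path):
    """Hypothetical stand-in for the (slow) parsing of one DAG file."""
    dag_id = os.path.splitext(os.path.basename(path))[0]
    return dag_id, {"dag_id": dag_id, "fileloc": path}


def collect_dags(paths, out_queue):
    """Background side: parse files one by one and publish each result."""
    for path in paths:
        out_queue.put(parse_dag_file(path))


def sync_dagbag(dagbag, in_queue):
    """Webserver side: merge whatever is ready so far, without blocking."""
    updated = 0
    while True:
        try:
            dag_id, dag = in_queue.get_nowait()
        except queue.Empty:
            return updated
        dagbag[dag_id] = dag
        updated += 1


if __name__ == "__main__":
    parsed = queue.Queue()
    dagbag = {}
    # Assumption: a thread replaces the separate parsing process here.
    worker = threading.Thread(
        target=collect_dags, args=(["dags/a.py", "dags/b.py"], parsed), daemon=True
    )
    worker.start()
    worker.join()
    sync_dagbag(dagbag, parsed)
    print(sorted(dagbag))
```

In a real setup sync_dagbag would be called on a timer in the webserver process, so DAGs show up as they are parsed rather than after the whole pass has finished.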
The idea is that a background process parses DAG files and sends DAGs to the webserver process every [webserver] dagbag_sync_interval = 10s.

We have launched it in Composer, so our users can set the webserver worker restart interval to 1 hour (or longer). The background DAG parsing process refreshes all DAGs every [webserver] collect_dags_interval = 30s.

If parsing all DAGs takes 15 min, you can see DAGs being gradually refreshed with this feature.

Thanks,
Zhou

On Sat, Jul 27, 2019 at 2:43 AM Kevin Yang <[email protected]> wrote:

> Nice job Zhou!
>
> Really excited, exactly what we wanted for the webserver scaling issue.
> Want to add another big drive for Airbnb to start thinking about this
> previously to support the effort: it can not only bring consistency between
> webservers but also bring consistency between webserver and
> scheduler/workers. It may be less of a problem if total DAG parsing time is
> small, but for us the total DAG parsing time is 15+ mins and we had to set
> the webserver (gunicorn subprocesses) restart interval to 20 mins, which
> leads to a worst case 15+20+15=50 mins delay between the scheduler starting
> to schedule things and users seeing their deployed DAGs/changes...
>
> I'm not so sure about the scheduler performance improvement: currently we
> already feed the main scheduler process with SimpleDag through
> DagFileProcessorManager running in a subprocess--in the future we feed it
> with data from DB, which is likely slower (tho the diff should have
> negligible impact on the scheduler performance). In fact if we'd keep the
> existing behavior, try to schedule only freshly parsed DAGs, then we may
> need to deal with some consistency issue--dag processor and the scheduler
> race for updating the flag indicating if the DAG is newly parsed. No big
> deal there but just some thoughts off the top of my head and hopefully can
> be helpful.
>
> And good idea on pre-rendering the template, believe template rendering was
> the biggest concern in the previous discussion. We've also chosen the
> pre-rendering+JSON approach in our smart sensor API
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
> and it seems to be working fine--a supporting case for ur proposal ;)
> There's a WIP PR <https://github.com/apache/airflow/pull/5499> for it just
> in case you are interested--maybe we can even share some logic.
>
> Thumbs-up again for this and please don't hesitate to reach out if you
> want to discuss further with us or need any help from us.
>
>
> Cheers,
> Kevin Y
>
> On Sat, Jul 27, 2019 at 12:54 AM Driesprong, Fokko <[email protected]>
> wrote:
>
> > Looks great Zhou,
> >
> > I have one thing that pops in my mind while reading the AIP; should we
> > keep the caching on the webserver level? As the famous quote goes:
> > *"There are only two hard things in Computer Science: cache invalidation
> > and naming things." -- Phil Karlton*
> >
> > Right now, the fundamental change that is being proposed in the AIP is
> > fetching the DAGs from the database in a serialized format, and not
> > parsing the Python files all the time. This will already give a great
> > performance improvement on the webserver side because it removes a lot
> > of the processing. However, since we're still fetching the DAGs from the
> > database at a regular interval and caching them in the local process, we
> > still have the two issues that Airflow is suffering from right now:
> >
> > 1. No snappy UI because it is still polling the database at a regular
> > interval.
> > 2. Inconsistency between webservers because they might poll at a
> > different interval, I think we've all seen this:
> > https://www.youtube.com/watch?v=sNrBruPS3r4
> >
> > As I also mentioned in the Slack channel, I strongly feel that we should
> > be able to render most views from the tables in the database, so without
> > touching the blob. For specific views, we could just pull the blob from
> > the database. In this case we always have the latest version, and we
> > tackle the second point above.
> >
> > To tackle the first one, I also have an idea. We should change the DAG
> > parser from a loop to something that uses inotify
> > https://pypi.org/project/inotify_simple/. This will change it from
> > polling to an event-driven design, which is much more performant and
> > less resource hungry. But this would be an AIP on its own.
> >
> > Again, great design and a comprehensive AIP, but I would include the
> > caching on the webserver to greatly improve the user experience in the
> > UI. Looking forward to the opinion of others on this.
> >
> > Cheers, Fokko
> >
> > On Sat, Jul 27, 2019 at 01:44, Zhou Fang <[email protected]> wrote:
> >
> > > Hi Kaxil,
> > >
> > > Just sent out the AIP:
> > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > >
> > > Thanks!
> > > Zhou
> > >
> > >
> > > On Fri, Jul 26, 2019 at 1:33 PM Zhou Fang <[email protected]> wrote:
> > >
> > > > Hi Kaxil,
> > > >
> > > > We are also working on persisting DAGs into DB using JSON for Airflow
> > > > webserver in Google Composer. We aim to minimize the change to the
> > > > current Airflow code. Happy to get synced on this!
> > > >
> > > > Here is our progress:
> > > > (1) Serializing DAGs using Pickle to be used in webserver
> > > > It has been launched in Composer.
> > > > I am working on the PR to upstream it:
> > > > https://github.com/apache/airflow/pull/5594
> > > > Currently it does not support non-Airflow operators and we are
> > > > working on a fix.
> > > >
> > > > (2) Caching Pickled DAGs in DB to be used by webserver
> > > > We have a proof-of-concept implementation, working on an AIP now.
> > > >
> > > > (3) Using JSON instead of Pickle in (1) and (2)
> > > > Decided to use JSON because Pickle is neither secure nor human
> > > > readable. The serialization approach is very similar to (1).
> > > >
> > > > I will update the PR (https://github.com/apache/airflow/pull/5594) to
> > > > replace Pickle by JSON, and send our design of (2) as an AIP next
> > > > week. Glad to check together whether our implementation makes sense
> > > > and do improvements on that.
> > > >
> > > > Thanks!
> > > > Zhou
> > > >
> > > >
> > > > On Fri, Jul 26, 2019 at 7:37 AM Kaxil Naik <[email protected]> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> We, at Astronomer, are going to spend time working on DAG
> > > >> Serialisation. There are 2 AIPs that are somewhat related to what we
> > > >> plan to work on:
> > > >>
> > > >> - AIP-18 Persist all information from DAG file in DB
> > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-18+Persist+all+information+from+DAG+file+in+DB>
> > > >> - AIP-19 Making the webserver stateless
> > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless>
> > > >>
> > > >> We plan to use JSON as the Serialisation format and store it as a
> > > >> blob in metadata DB.
> > > >>
> > > >> *Goals:*
> > > >>
> > > >> - Make Webserver Stateless
> > > >> - Use the same version of the DAG across Webserver & Scheduler
> > > >> - Keep backward compatibility and have a flag (globally & at DAG
> > > >> level) to turn this feature on/off
> > > >> - Enable DAG Versioning (extended Goal)
> > > >>
> > > >>
> > > >> We will be preparing a proposal (AIP) after some research and some
> > > >> initial work and open it for the suggestions of the community.
> > > >>
> > > >> We already had some good brain-storming sessions with Twitter folks
> > > >> (DanD & Sumit), folks from GoDataDriven (Fokko & Bas) & Alex (from
> > > >> Uber) which will be a good starting point for us.
> > > >>
> > > >> If anyone in the community is interested in it or has some
> > > >> experience with the same and wants to collaborate, please let me
> > > >> know and join the #dag-serialisation channel on Airflow Slack.
> > > >>
> > > >> Regards,
> > > >> Kaxil
