coufon commented on issue #5594: [AIRFLOW-4924] Loading DAGs asynchronously in Airflow webserver
URL: https://github.com/apache/airflow/pull/5594#issuecomment-511901734

Hi Sumit, thanks for your help with the review. To answer your questions:

> * So each gunicorn worker going to launch a separate thread to parse DAGs asynchronously, and not a single background process going to do the DAG parsing for all the workers.

Yes. Each gunicorn worker now has its own separate process to collect DAGs. A single background process is possible, but it needs more code changes in airflow/www/views.py; we can discuss whether it is needed. One drawback of using a process per gunicorn worker is that the 'collect dags' process consumes most of the memory. Users usually turn on async_dag_loader when they have many DAGs (>1,000), so the webserver is very memory intensive. For that reason we also suggest that users set `[webserver] workers = 1`.

> * The pickled DAG object won't be stored into DB and just used for thread communication.

No, they are not stored in the DB; for now they are kept only in memory. We are working on storing them in the DB (just like DagPickle in Airflow, but we store stringified DAGs instead), so that the UI no longer needs to run DAG code. The scheduler could also use stringified DAGs to accelerate scheduling actions (to be tested out).

> * As of now each gunicorn worker might have diff copy of DAGs (due to changes) and cause confusion on UI, something which isn't going to solve with this change.

Yes. This PR does not address this inconsistency problem. A future PR could solve it by either (1) using a single DAG collecting process for all gunicorn workers, as you mentioned, or (2) reading cached DAGs from the DB in the webserver.

> * How it going to behave if DAG parsing time for each thread is more than the configured `collect_dags_interval` time.

If the process takes longer than `collect_dags_interval`, it immediately starts the next round of collection. There is no sleep between the two rounds of collection.
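To make the timing behavior concrete, here is a minimal sketch of such a collection loop. It is not the code in this PR: the names `seconds_until_next_round` and `run_collection_rounds` are hypothetical, and sleeping for the remainder of the interval when a round finishes early is an assumption. The only behavior taken from the description above is that an overrunning round is followed immediately by the next one.

```python
import time
from typing import Callable, List


def seconds_until_next_round(elapsed: float, interval: float) -> float:
    """Return how long to wait before the next round; never negative.

    Assumption: a round that finishes early waits out the rest of the
    interval. A round that overruns the interval waits 0 seconds.
    """
    return max(0.0, interval - elapsed)


def run_collection_rounds(
    collect_dags: Callable[[], None],
    interval: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[float]:
    """Run `rounds` collection rounds and return the wait after each one.

    `collect_dags` is a hypothetical stand-in for the real DAG collection
    step; `sleep` and `clock` are injectable so the loop can be exercised
    without real delays.
    """
    waits = []
    for _ in range(rounds):
        started = clock()
        collect_dags()
        wait = seconds_until_next_round(clock() - started, interval)
        waits.append(wait)
        if wait > 0:
            sleep(wait)  # skipped entirely when the round overran
    return waits
```

With an interval of 5 seconds, a round that takes 1 second would wait 4 seconds under this sketch, while a round that takes 8 seconds would be followed by the next round right away.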
