GitHub user dimon222 added a comment to the discussion: How to use sharding for schedulers?
Thank you for this suggestion; I missed the memo regarding standalone dag processor support. Now that I think about it, it might be part of a solution to my problem. I noticed that the general Airflow pattern expects recurring reparsing of the DAG directory to refresh the serialized DAGs stored in the DB. Since I control what goes into my DAG folder through my own REST middleware, I could construct on-demand reloading, potentially removing the need to run the dag processor as a recurring process altogether. However, it seems that serialized_dag_expiration_seconds cannot be set to 0/infinite to make my rows never expire, so that I decide exactly when to refresh a serialized entry.

So a potential solution might incorporate:

1. Running the dag processor separately from everything else, on demand and/or triggered via REST, whenever any CRUD on DAG code happens or an Airflow version upgrade needs to happen.
2. Setting serialized_dag_expiration_seconds ideally to infinite, but due to current limitations just to the maximum possible value.
3. Running the scheduler (or multiple of them in random seed mode) without the dag processor enabled.
4. More sophisticated sharding, when schedulers (not dag processors) need to be sharded, will likely require custom sort modes to be implemented, but that might be unnecessary, since most of the pain in the initial problem is the dag processor's heaviness.

GitHub link: https://github.com/apache/airflow/discussions/56294#discussioncomment-14598241
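For steps 1 and 3, a minimal airflow.cfg sketch of what decoupling the dag processor from the scheduler could look like. This assumes Airflow 2.3+, where a `standalone_dag_processor` option exists under `[scheduler]`; verify the option name and section against your Airflow version's configuration reference before relying on it.

```ini
[scheduler]
# Schedulers run without an in-process dag processor (step 3).
# Parsing is then done by a separately launched `airflow dag-processor`,
# which the REST middleware could start on demand after a CRUD change
# to DAG code (step 1), instead of running it as a recurring process.
standalone_dag_processor = True
```

Note this sketch only covers decoupling; it does not change serialized DAG expiration (step 2), which would still need to be set to its maximum supported value.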
