GitHub user dimon222 edited a comment on the discussion: How to use sharding 
for schedulers?

Thank you for this suggestion, I missed the memo about standalone dag 
processor support. Now that I think about it, it might be a step toward 
solving my problem. I noticed that Airflow's usual pattern is to recurringly 
reparse the DAG directory and write serialized DAGs to the DB. Since I 
control what goes into my DAG folder through my own REST middleware, I could 
construct on-demand reloading and potentially remove the need to run the dag 
processor as a recurring process altogether. However, it seems that 
serialized_dag_expiration_seconds cannot be set to 0/infinite, so I cannot 
make my rows never expire and decide exactly when a serialized entry gets 
refreshed.
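As a concrete illustration of that split, here is a minimal, hedged configuration sketch. It assumes Airflow 2.3+ (where the standalone dag processor landed); the key names are from memory, so please verify them against your version's configuration reference:

```shell
# Sketch: stop the scheduler from parsing DAG files itself, so parsing can
# instead be driven by a separately managed dag processor.
export AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True

# Optional: have DAG files parsed in a host-seeded random order when running
# multiple instances (the "random seed mode" referred to here).
export AIRFLOW__SCHEDULER__FILE_PARSING_SORT_MODE=random_seeded_by_host
```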

So a potential solution might incorporate:
1. Running the dag processor separately from everything else, on demand 
and/or triggered via REST, whenever any CRUD on DAG code happens or an 
Airflow version upgrade is needed.
2. serialized_dag_expiration_seconds should ideally be set to infinite, but 
due to current limitations, just to the maximum possible value.
3. The scheduler, or multiple schedulers in random seed mode, should run 
without the dag processor enabled.
4. More sophisticated sharding, if sharding is ever needed for schedulers 
(not dag processors), will likely require custom sort modes, but that might 
be unnecessary, since most of the pain in the initial problem comes from the 
dag processor's heaviness.
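Operationally, steps 1 and 3 might look like the sketch below. The DAG directory path is illustrative, and whether your Airflow version's `airflow dag-processor` command supports a bounded run (e.g. a `--num-runs`-style option) should be checked before relying on it for one-shot, on-demand reparses:

```shell
# Step 1 (sketch): after the REST middleware applies a CRUD change to the
# DAG folder, run a dag processor pass over it (path is illustrative).
airflow dag-processor --subdir /opt/airflow/dags

# Step 3 (sketch): schedulers run without an embedded processor, assuming
# AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True is set in their
# environment.
airflow scheduler
```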

GitHub link: 
https://github.com/apache/airflow/discussions/56294#discussioncomment-14598241
