potiuk commented on issue #17437:
URL: https://github.com/apache/airflow/issues/17437#issuecomment-893348355


   I think that before requesting new features, it is worth checking whether 
the existing features already work well enough for you:
   
   1) Did you try to configure multiple schedulers? Airflow 2 has been 
specifically designed to be able to scale its operations with multiple 
schedulers. Please try to increase the number of schedulers you have and see if 
that improves your experience.
   
   2) There are a number of settings that you can configure to prioritize the 
scheduler and improve its speed. Did you try to fine-tune them? For example, 
"file_parsing_sort_mode" controls the order in which DAG files are parsed: 
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#file-parsing-sort-mode
   
   3) There are also other parameters that control the behaviour of the parsers 
(see the "scheduler" section in the config).
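
   As a rough illustration of points 2) and 3): Airflow reads any config option 
from an environment variable named AIRFLOW__{SECTION}__{KEY}, so scheduler 
settings can be tried out without editing airflow.cfg. The Python sketch below 
only builds those variable names. The helper name `airflow_env_name` is made up 
for this example, and the values are placeholders to experiment with, not 
recommendations. Please check the option names against the "scheduler" section 
of the configuration reference for your Airflow version.

```python
def airflow_env_name(section: str, key: str) -> str:
    """Build the environment variable name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Placeholder values for experimentation only (assumptions, not advice).
scheduler_tuning = {
    "file_parsing_sort_mode": "modified_time",  # parse recently changed files first
    "parsing_processes": "4",                   # number of parallel DAG parsers
    "min_file_process_interval": "60",          # seconds between re-parses of a file
}

env = {
    airflow_env_name("scheduler", key): value
    for key, value in scheduler_tuning.items()
}

# Print shell export lines that could be applied before starting the scheduler.
for name, value in sorted(env.items()):
    print(f"export {name}={value}")
```

   Changing one setting at a time and noting CPU, memory and I/O before and 
after makes it much easier to tell which setting actually helped.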
   
   There are also plenty of materials that you can learn from to fine-tune the 
behaviour of the scheduler:
   
   * Official documentation 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html
   * Astronomer's blog detailing the new features, tunables and scalability of 
the scheduler 
https://www.astronomer.io/blog/airflow-2-scheduler#:~:text=As%20part%20of%20Apache%20Airflow,once%20their%20dependencies%20are%20met.
   * This fantastic talk from @ashb about the scheduler in Airflow 2, how it 
works and how it can be tuned https://www.youtube.com/watch?v=DYC4-xElccE
   
   Please take a look at those resources and try to fine-tune your scheduler 
accordingly. Then please come back with your findings and some more data 
detailing what you have done and how you tried to fine-tune your configuration. 
   
   Ideally, please report back either way. If you manage to improve your 
configuration, let us know what worked and why. If you try all of that and it 
does not help, please report all the observations you made during your trials: 
CPU, memory and I/O usage, what kind of storage you have for DAGs, and whether 
you tried to fine-tune the storage options (for example, we know that you need 
to buy extra I/O when you use EFS as DAG storage, otherwise you are limited by 
the efficiency of the storage). You need to tell us where you saw the 
bottlenecks and how you tried to overcome them.
   
   That will help us to see if there are still some bottlenecks that we were 
not able to foresee when we designed the fine-tuning options for the 
scheduler (but we need more data from you). 
   
   I am closing it for now, until you can provide this data for us to 
investigate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


