I can vote for making it an optional approach for fine-tuning (for advanced users only).
On 12-Jul-2022 at 7:44:35 AM, Jarek Potiuk <ja...@potiuk.com> wrote:
> Not interesting :) ?
>
> On Thu, Jul 7, 2022 at 10:41 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Hello everyone,
>
> We have just published a blog on our medium -
> https://medium.com/apache-airflow/airflows-magic-loop-ec424b05b629 - that
> is a blog of one of our users Itay Bittan (thanks!) who had been inspired
> by our discussion on Slack on how they struggle with delays of loading
> dynamic dags in their K8S.
>
> The problem that they had was that they have dynamic dags that are created
> in a big loop (1000s of DAGs) and that caused ~ 2 minutes delays on
> starting their tasks on K8S, because all DAGs have to be created by the loop.
>
> What I proposed to try (since the DAGs were connected by the loop but
> really isolated from each other) is to skip "all other" DAG creation in the
> loop when it is parsed in the worker. That resulted in cutting the delay to
> ~ 200ms.
>
> His case seems to be general enough to maybe suggest it even as a
> "general" solution - currently it is based on possibly several
> "non-documented" assumptions (that dag_id is passed in a certain way to the
> worker and that you can use it to filter out such a loop).
>
> However maybe that's a good idea to make it documented and convert into
> "best practice" when you have similar Dynamic DAGs.
>
> I can think of several caveats of such an approach - not all DAGs created
> in a loop can be isolated, sometimes there might be side-effects that make
> your dag have different structure if you skip other DAGs, but - I thought
> that if we add some "guidelines" that could be easily replicated by other
> users.
>
> WDYT?
>
> J.
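
For anyone skimming the thread, here is a rough sketch of the loop-skipping idea from the quoted message. It is only an illustration under stated assumptions, not the code from the blog post and not an Airflow API: it assumes the worker process is started as `airflow tasks run <dag_id> <task_id> ...`, so that the dag_id can be read from the command line (one of the "non-documented" assumptions mentioned above). The names `target_dag_id`, `build_dags` and `make_dag` are hypothetical, and `make_dag` stands in for whatever builds a single DAG.

```python
from typing import Callable, Dict, Iterable, List, Optional


def target_dag_id(argv: List[str]) -> Optional[str]:
    """Guess which DAG this process was asked to run.

    ASSUMPTION (not a documented contract): a worker is launched as
    `airflow tasks run <dag_id> <task_id> ...`. For any other command
    line (for example the scheduler parsing the file) return None.
    """
    if len(argv) > 3 and argv[1] == "tasks" and argv[2] == "run":
        return argv[3]
    return None


def build_dags(
    dag_ids: Iterable[str],
    make_dag: Callable[[str], object],  # hypothetical per-DAG factory
    argv: List[str],
) -> Dict[str, object]:
    """Build every DAG, or only the requested one when it is known."""
    wanted = target_dag_id(argv)
    dags: Dict[str, object] = {}
    for dag_id in dag_ids:
        # Skip "all other" DAGs when a single target was identified.
        if wanted is not None and dag_id != wanted:
            continue
        dags[dag_id] = make_dag(dag_id)
    return dags
```

In a DAG file this would presumably be called with `sys.argv`, and the returned objects assigned into the module's globals so they are discovered. It only makes sense when the DAGs in the loop are truly independent; if building one DAG has side effects on another, skipping can change the DAG's structure, as noted in the caveats above.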