I would vote for making it an optional approach for fine-tuning (only for
advanced users).
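
For reference, here is a minimal sketch of the filtering idea as I understand
it. It assumes the worker starts the task as
`airflow tasks run <dag_id> <task_id> ...`, so that the dag_id shows up as
sys.argv[3]. That is exactly the kind of non-documented assumption mentioned
below, so treat this as an illustration and not as a supported API. The helper
names (current_dag_id, dag_ids_to_build) are made up for this example.

```python
import sys
from typing import List, Optional, Sequence


def current_dag_id(argv: Sequence[str]) -> Optional[str]:
    """Return the dag_id when argv looks like `airflow tasks run <dag_id> ...`.

    ASSUMPTION: the position of dag_id on the command line is not a
    documented contract; it may differ between Airflow versions/executors.
    """
    if len(argv) > 3 and list(argv[1:3]) == ["tasks", "run"]:
        return argv[3]
    return None


def dag_ids_to_build(all_dag_ids: Sequence[str], argv: Sequence[str]) -> List[str]:
    """Pick which DAGs the loop should actually create.

    While a single task is being run, only its own DAG is built. In every
    other case (scheduler parsing, unknown dag_id) all DAGs are built, which
    is the safe default.
    """
    target = current_dag_id(argv)
    if target is None or target not in all_dag_ids:
        return list(all_dag_ids)
    return [target]


if __name__ == "__main__":
    all_ids = [f"dag_{i}" for i in range(1000)]
    # In a real DAG file the loop body would create the DAG object here
    # and register it, e.g. in globals(); that part is omitted on purpose.
    for dag_id in dag_ids_to_build(all_ids, sys.argv):
        pass
```

This only works when the DAGs in the loop are truly independent of each
other, which is the main caveat raised in the original message.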

On 12-Jul-2022 at 7:44:35 AM, Jarek Potiuk <ja...@potiuk.com> wrote:

> Not interesting :) ?
>
> On Thu, Jul 7, 2022 at 10:41 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>
> Hello everyone,
>
>
> We have just published a blog post on our Medium -
> https://medium.com/apache-airflow/airflows-magic-loop-ec424b05b629 - by
> one of our users, Itay Bittan (thanks!), who was inspired by our
> discussion on Slack about how they struggle with delays when loading
> dynamic DAGs in their K8S.
>
>
> The problem they had was that their dynamic DAGs are created in a big
> loop (1000s of DAGs), and that caused ~2 minute delays when starting
> their tasks on K8S, because all DAGs have to be created by the loop.
>
>
> What I proposed to try (since the DAGs were connected by the loop but
> really isolated from each other) is to skip "all other" DAG creation in the
> loop when it is parsed in the worker. That resulted in cutting the delay to
> ~ 200ms.
>
>
> His case seems to be general enough to maybe suggest it even as a
> "general" solution - currently it is based on possibly several
> "non-documented" assumptions (that dag_id is passed in a certain way to the
> worker and that you can use it to filter out such a loop).
>
>
> However, maybe it's a good idea to document it and turn it into a
> "best practice" for when you have similar dynamic DAGs.
>
>
> I can think of several caveats of such an approach - not all DAGs created
> in a loop can be isolated, and sometimes there might be side-effects that
> make your DAG have a different structure if you skip other DAGs - but I
> thought that we could add some "guidelines" that could be easily
> replicated by other users.
>
>
> WDYT?
>
>
> J.
>
>
