Hey Vardan,

I also run a system with a large number of DAGs.

Regarding the slowness in the UI: a few fixes went into 1.10.7 that
reduce the number of DAGs Airflow loads when browsing. There are also
a couple more changes going into the next release (I hope!) which
should speed it up further.

Regarding pruning the metadata, there's a repository with some
examples here: https://github.com/teamclairvoyant/airflow-maintenance-dags

Be very careful pruning DagRuns, as the deleted runs can end up getting
re-created and re-scheduled by the scheduler if catchup=True.
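
If I remember right, the cleanup DAGs in that repo boil down to a
scheduled delete against the metadata models. A minimal, untested
sketch of the DagRun case (assuming 1.10.x; the dag_id and the 30-day
cutoff are arbitrary):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.models import DagRun
    from airflow.operators.python_operator import PythonOperator
    from airflow.utils.db import provide_session

    @provide_session
    def prune_old_dag_runs(session=None):
        cutoff = datetime.utcnow() - timedelta(days=30)
        # NB: if a pruned DAG has catchup=True, the scheduler will see
        # a gap and happily re-create (and re-run!) the deleted runs.
        n = (
            session.query(DagRun)
            .filter(DagRun.execution_date < cutoff)
            .delete(synchronize_session=False)
        )
        print("Deleted %d DagRun rows" % n)

    dag = DAG(
        dag_id="airflow_db_cleanup",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )

    prune = PythonOperator(
        task_id="prune_dag_runs",
        python_callable=prune_old_dag_runs,
        dag=dag,
    )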

R

On Thu, 23 Jan 2020 at 12:54, Vardan Gupta <[email protected]> wrote:
>
> Hi Devs,
>
>
>
> Just wanted to share a production scenario that our team is trying to solve
> here in our organization.
>
>
>
> Just a little background: we have 20+ Airflow (v1.10.2, with Kubernetes
> Executor and MySQL as meta-store) clusters in our organization, with 10k
> active workflows and 100k daily runs. Each solution schedules its
> workflows differently: some run them on a schedule, while others trigger
> them on demand. The minimum scheduling interval one can configure is a
> minute, so such a workflow can accumulate up to 1440 runs/day and
> 43200/month. If such workflows are not deleted for 2-3 months, the
> details of all previous runs remain in the meta-store, and the problem
> is amplified by ad-hoc triggers, where the number of runs can grow even
> larger. That's where we start hitting performance issues in the Airflow
> UI, and perhaps in scheduling too, because of slow result retrieval
> from MySQL (there are a few bad queries which get formulated to use an
> IN clause).
>
>
>
> We were thinking of exposing a policy at the workflow/cluster level
> which can restrict the number of runs preserved in the meta-store for a
> workflow. A couple of things will be required for this:
>
>
>
>    1. *Define what counts as an older run* – Should it be time-bound (by
>    exposing a variable like maxOlderRunsInDays), a cap on the number of
>    runs (by exposing a variable like keepMaxTotalRuns, after which each
>    new dag_run creation archives the oldest run; see the sketch below),
>    or a function of both? I guess the most important control is the
>    number of runs, but the other could also be sensible depending on the
>    use case.
>    2. *Archive older runs' data*, maybe in the same meta-store, with
>    archival tables having the same schema as the models but more
>    flexibility in constraints.
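>
> A rough sketch of what the count-based policy could look like against
> the 1.10 models (keepMaxTotalRuns is a proposed knob, not an existing
> Airflow setting):
>
>     from airflow.models import DagRun
>     from airflow.utils.db import provide_session
>
>     @provide_session
>     def runs_to_archive(dag_id, keep_max_total_runs=100, session=None):
>         """Return the DagRuns beyond the newest keep_max_total_runs."""
>         return (
>             session.query(DagRun)
>             .filter(DagRun.dag_id == dag_id)
>             .order_by(DagRun.execution_date.desc())
>             .offset(keep_max_total_runs)
>             .all()
>         )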
>
>
>
> For time-bound archival, say a 30-day history policy, we would need a
> process within Airflow or outside it (maybe a DAG with periodic runs
> which archives older runs; see the sketch below). If we instead
> restrict on the number of runs, it will perhaps be easier to add a
> provision in the Airflow code to handle it in the dag_run creation
> block.
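>
> A minimal copy-then-delete step for the 30-day case (dag_run_archive
> is a hypothetical table with the same schema as dag_run, per point 2
> above):
>
>     from datetime import datetime, timedelta
>
>     from airflow.utils.db import provide_session
>
>     @provide_session
>     def archive_old_runs(max_older_runs_in_days=30, session=None):
>         cutoff = datetime.utcnow() - timedelta(days=max_older_runs_in_days)
>         # Copy the old rows first, then delete them in the same
>         # transaction, so a failure in between cannot lose history.
>         session.execute(
>             "INSERT INTO dag_run_archive "
>             "SELECT * FROM dag_run WHERE execution_date < :cutoff",
>             {"cutoff": cutoff},
>         )
>         session.execute(
>             "DELETE FROM dag_run WHERE execution_date < :cutoff",
>             {"cutoff": cutoff},
>         )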
>
>
>
> We haven't done formal benchmarking of the observed slowness yet, but
> we plan to, as it will help us determine the limits we want to apply to
> the system.
>
>
>
> Would be happy to hear back from the community about how they feel
> about this problem.
>
>
> Regards,
>
> Vardan Gupta
