Historically, we have added indexes as needed for the performance of Airflow
itself, not for the REST API.

Lately we've observed more usage of the task instances list endpoint,
specifically filtering on end_date, start_date, and/or execution_date.
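To make the question concrete, here is a minimal sketch of what an index on
end_date changes for that kind of filter. It uses an in-memory SQLite database
and a heavily simplified, hypothetical task_instance table (not Airflow's real
schema or migrations); the index name ti_end_date is made up for illustration,
and Postgres/MySQL planners will make their own choices.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified task_instance table (NOT Airflow's real schema).
conn.execute("""
    CREATE TABLE task_instance (
        dag_id     TEXT NOT NULL,
        run_id     TEXT NOT NULL,
        task_id    TEXT NOT NULL,
        start_date TEXT,
        end_date   TEXT,
        PRIMARY KEY (dag_id, run_id, task_id)
    )
""")

# The kind of filter the REST API list endpoint receives.
query = "SELECT * FROM task_instance WHERE end_date >= ? AND end_date < ?"
params = ("2022-01-01", "2022-02-01")

before = plan(conn, query, params)  # no usable index: full table scan
conn.execute("CREATE INDEX ti_end_date ON task_instance (end_date)")  # hypothetical index
after = plan(conn, query, params)   # range search via ti_end_date

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

The read side gets cheaper, but the index must be maintained on every insert and
every update of end_date, which for task instances happens whenever a task finishes.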

One line of argument goes that every possible filter in the REST API should
have an index.  But not all users use the API at all, let alone every
parameter in it.  And what about *all possible combinations?!*

Another argument would be that "this should be the responsibility of the
cluster maintainer", since it's not part of the core operation of Airflow
(scheduler / tasks / webserver) and is highly dependent on the specific use case.

Another thing to consider is that sometimes there's already an efficient way
to get the TIs you're looking for with a slight refactor, e.g. get the DAG
runs first, then fetch the TIs using the PK.
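A rough sketch of that two-step pattern, again with in-memory SQLite and
hypothetical, simplified dag_run and task_instance tables. The assumption here
is that the TI key starts with (dag_id, run_id), so step 2 can be served by the
primary-key index; Airflow's actual keys and indexes differ, and the function
and table contents below are made up for illustration. A single JOIN is
included only to show the two approaches return the same rows.

```python
import sqlite3
from datetime import datetime, timedelta


def build_db() -> sqlite3.Connection:
    """Create hypothetical simplified tables with 30 daily runs of 2 tasks each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dag_run (
            dag_id TEXT NOT NULL,
            run_id TEXT NOT NULL,
            execution_date TEXT NOT NULL,
            PRIMARY KEY (dag_id, run_id),
            UNIQUE (dag_id, execution_date)
        );
        CREATE TABLE task_instance (
            dag_id TEXT NOT NULL,
            run_id TEXT NOT NULL,
            task_id TEXT NOT NULL,
            state TEXT,
            PRIMARY KEY (dag_id, run_id, task_id)  -- assumed key order
        );
    """)
    base = datetime(2022, 1, 1)
    for day in range(30):
        run_id = f"scheduled__{day:02d}"
        conn.execute("INSERT INTO dag_run VALUES (?, ?, ?)",
                     ("example_dag", run_id, (base + timedelta(days=day)).isoformat()))
        for task in ("extract", "load"):
            conn.execute("INSERT INTO task_instance VALUES (?, ?, ?, ?)",
                         ("example_dag", run_id, task, "success"))
    return conn


def tis_two_step(conn, dag_id, start, end):
    # Step 1: find matching runs; served by the UNIQUE (dag_id, execution_date) index.
    run_ids = [row[0] for row in conn.execute(
        "SELECT run_id FROM dag_run "
        "WHERE dag_id = ? AND execution_date >= ? AND execution_date < ?",
        (dag_id, start, end))]
    if not run_ids:
        return []
    # Step 2: fetch TIs by the leading key columns (dag_id, run_id).
    placeholders = ",".join("?" * len(run_ids))
    return conn.execute(
        "SELECT dag_id, run_id, task_id, state FROM task_instance "
        f"WHERE dag_id = ? AND run_id IN ({placeholders}) ORDER BY run_id, task_id",
        (dag_id, *run_ids)).fetchall()


def tis_join(conn, dag_id, start, end):
    # Single-query equivalent, for comparison.
    return conn.execute(
        "SELECT ti.dag_id, ti.run_id, ti.task_id, ti.state FROM task_instance ti "
        "JOIN dag_run dr ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id "
        "WHERE dr.dag_id = ? AND dr.execution_date >= ? AND dr.execution_date < ? "
        "ORDER BY ti.run_id, ti.task_id",
        (dag_id, start, end)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    tis = tis_two_step(conn, "example_dag", "2022-01-01T00:00:00", "2022-01-08T00:00:00")
    print(len(tis))  # 14 (7 runs x 2 tasks)
```

Neither query touches a TI date column, so no new TI index is needed for this case.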

Of course, adding indexes is a trade-off: storage space, the performance
overhead of maintaining them on writes, etc., balanced against reasonable
usage of the REST API.

I'm curious what folks think about this, and whether and when we should add
indexes in OSS that help REST API queries.
