The API is public, so it **should** behave well regardless of local
customizations. We have automated DB maintenance, we "promise" our users
that it will work, and we explicitly tell them "do not touch the Airflow DB,
as you might break things".

If we now change the narrative and tell them "if you want to use a
particular API, you will have to touch the DB and potentially break things",
we turn that completely upside down and open the floodgates to all the
things people would like to modify in their databases.

It would be an absolute nightmare to support our users if we said "yeah,
please modify the DB as you think is right". Most of them will not know
what modifications were made. In plenty of cases the people who made them
will already be gone from the project, and there will be absolutely no
trace of what they did (because they just applied an index on a live DB one
day). This means that migrations, failovers, switching clouds, etc. will
all start breaking for such users. For me this is pretty much a no-go.

If anything, I think we should carefully choose which filtering criteria we
add to the API. We could introduce a rule that whenever we add something
there, it requires a serious analysis of:

* whether this filtering is really needed
* whether it needs to be indexed for speed (maybe we can get away without an
index if it's a secondary or tertiary criterion, and maybe we can change the
filtering criteria to make it so)
* what impact such an index will have (size, rebalancing the trees, slowing
down inserts, etc.)

We could even partly automate this: raise a flag for any such change and
make sure those questions are answered before we merge such a PR.
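
To make the "raise a flag" idea concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: the function name `flag_filter_changes`, the idea that API code lives under a directory whose name starts with `api`, and the heuristic that a new filter shows up as an added line mentioning `filter` or `order_by`. It is not an existing Airflow CI check.

```python
import re

# The three questions from the list above, to be posted on a flagged PR.
CHECKLIST = (
    "Is this filtering really needed?",
    "Does it need to be indexed for speed?",
    "What impact would the index have (size, tree rebalancing, slower inserts)?",
)

# Assumption: API code lives under a directory whose name starts with "api".
API_PATH = re.compile(r"(^|/)api[^/]*/")
# Assumption: new filtering shows up as added lines mentioning these words.
FILTER_HINT = re.compile(r"\b(filter\w*|order_by)\b", re.IGNORECASE)


def flag_filter_changes(diff_text: str) -> list[str]:
    """Return files in a unified diff that add filter-related lines under an API path."""
    flagged = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            # git prefixes the new-file path with "b/"; strip it if present.
            current = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and current and API_PATH.search(current):
            if FILTER_HINT.search(line[1:]) and current not in flagged:
                flagged.append(current)
    return flagged
```

A CI job could feed the PR diff into this function and, if the result is non-empty, post `CHECKLIST` as a comment and block the merge until someone answers it. A text heuristic like this will produce false positives, which seems acceptable for a flag that only asks a human to look.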

I'd rather gate such changes until the answers to all those questions are
known than give users the freedom to modify the Airflow metadata DB.

It also boils down to the performance "benchmark/baseline" and reproducible
performance tests discussion. This is the goal of AIP-59 (
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-59).
If we had it, we could make sure any API change introduces only acceptable
performance overhead (disk size, memory, CPU, etc. should all be included).
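
As a rough illustration of what "acceptable overhead against a baseline" could mean in practice, here is a small sketch. It is not taken from AIP-59; the function name `over_budget`, the metric names, and the 5% default threshold are all hypothetical.

```python
def over_budget(baseline: dict, measured: dict, max_overhead: float = 0.05) -> dict:
    """Return metrics whose relative increase over the baseline exceeds max_overhead."""
    regressions = {}
    for name, base in baseline.items():
        if base <= 0 or name not in measured:
            continue  # skip metrics we cannot compare
        overhead = (measured[name] - base) / base
        if overhead > max_overhead:
            regressions[name] = round(overhead, 3)
    return regressions


# Hypothetical numbers: disk usage grew by 20%, memory by 1%, CPU unchanged.
baseline = {"disk_mb": 100.0, "memory_mb": 200.0, "cpu_s": 10.0}
measured = {"disk_mb": 120.0, "memory_mb": 202.0, "cpu_s": 10.0}
print(over_budget(baseline, measured))  # {'disk_mb': 0.2}
```

The hard part is of course producing reproducible `baseline` and `measured` numbers in the first place; the comparison itself is trivial once those exist.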

J.
