Hello here,

Since we have now started even more isolation and separation of the code
and libraries, I think it might be a good time to maybe improve a bit the
way our tests "package" hierarchy is done - in providers and elsewhere.

This question came up when i started to add pre-commit checks after our
"shared code" PR have been merged - in
https://github.com/apache/airflow/pull/53697

One problem with having tests in monorepo, is that if all "tests" folders
are added to PYTHONPATH, we cannot really put tests in the "top" level of
the tests folder (that is pretty natural whe you do not have multiple of
"tests" folders. There are two reasons for that:

* it's likely module names or package names will have the same names in
separate distributions
* quite often - if we add tests in a sub-package of tests directly they
might clash with the same names - for example from stdlib or another
package we installed. Good examples of it (and something that happened in
the past were `tests/smtp/test_*` and `tests/kubernetes/test_*`. Depending
on how your IDE and env was configured you could have ended up in "import
kubernetes" not behaving as you expected.

So far in providers we have the "{unit/system/integration}/<PROVIDER>/"
convention that solved the issue. But even me as author of it, I must admit
`from unit.amazon` when you want to cross-import between different amazon
or different providers looks rather stupid and baffling. Even recently Ash
commented in one PR "that must be wrong" - and yeah, it does look wrong....

So maybe it's a good time to improve it and come up with a convention that
we can apply also to shared folders? Such change will be very little
disruptive and can be largely automated and applied in a single PR - and
modern git and IDE integration will follow such changes nicely - so that
you can see history of changes and it resolves most of conflicts
automatically - so it could be done with very little disruption (only
currently opened PRs will have to be rebased).

My proposal would be to add a "parent" folder in the "tests" directory to
indicate where the test is coming from. That might sound like a
duplication, but I think it's a natural consequence of having code in
monorepo, and a nice way to improve the organisation of tests. Also it has
an immediate notion or where the imports come from.

airflow-core\
             src\
                 airflow\
             tests\
                   airflow_core_tests\
                                      unit\
                                      system\
                                      integration\
airflow-task-sdk\
                 src\
                     airflow\
                             sdk
                 tests\
                       task_sdk_tests\
                                     # here all tests are unit so no need
for sub-packages

providers/amazon/tests\
                       provider_tests\
                                      unit\
                                           amazon
                                      integration\
                                                 amazon


providers/cncf/kubernetes/tests\
                                provider_tests\
                                              unit\
                                                  cncf\kubernetes\
                                              integration\
                                                          cncf\kubernetes\


I also considered swapping "unit" / "provider" (provider_tests\amazon\unit)
- but that would make it a bit more complex when we choose type of tests to
run or examples to have and I think it's somewhat better to have the
unit/system/integration distinction right after provider_tests because
essentially those test types are VERY different.

Again - that sounds like a lot of duplication in the path, but I think it's
worth it (and our pre-commits already make sure that there are no typos and
keep it in order and they can be updated to keep this new structure).

The main benefit of it hat when you import (wherever) between different
distributions - the imports will **look** better and be more obvious:

from provider_tests.unit.amazon import some_amazon_test_code
from task_sdk_tests import apis

Note that this does not yet tackle the other "special" test types we have
-> kubernetes-tests, docker-tests, task-sdk-tests (right now being merged
by Amogh), especially the last one seem to be clashing with
`task_sdk_tests`. But I think (and this would be separate discussion) -
this also adds an opportunity (next step) to move those "special" test
types we have. Eventually we could move those tests to be "yet another test
type" under "tests":

airflow_core_tests\
                   unit
                   system
                   integration
                   docker
                   kubernetes

task_sdk_tests\
              unit
              integration

But I would approach it as the next step after we reorganize imports for
the "regular" tests we have.

WDYT? Good idea? Other proposals?

J.

Reply via email to