> If I understand correctly, using `pytest -k` might be less work and more
> generalized than a swag of custom markers, unless it entails a lot of
> re-naming things. The work to add markers might be easier if they can be
> applied to entire classes of tests, although what I've mostly seen with
> `pytest` is a functional pattern rather than classes in tests. For more
> about that, see the note about using pytest fixtures vs. class
> setup/teardown at https://docs.pytest.org/en/latest/xunit_setup.html
I think `pytest -k` is great for ad-hoc/manual execution of only what we want. But for automation around running tests (which should be repeatable and reproducible by anyone), I think it makes much more sense to keep markers in the code. It's really just a matter of where we keep the information about how we group tests into the common categories we use for test execution:

1. With `pytest -k`, we would have to keep the "grouping" as different sets of -k parameters in the CI test scripts. This requires following naming conventions for modules, classes or tests - similar to what Kamil described earlier in the thread: we already use the *_system.py module + SystemTest class naming in the GCP tests.

2. With markers, the grouping is kept in the source code of the tests instead. This is "meta" information that does not force any naming convention on the tests.

I strongly prefer 2. over 1. for test automation. Some reasoning:

- It makes it easier to reproduce the grouping locally, without having to look up the selection criteria/naming conventions.

- It's easier to build automation around it. For example, in the case of integrations we can easily select the tests where the "integration" taken from the environment matches the integration marker - the Cassandra integration will be matched by the integration("cassandra") marker. With a naming convention we would have to record somewhere (in the custom -k command) that the "cassandra" integration matches (for example) all tests in the "tests.cassandra" package, or all tests named TestCassandra, or something even more complex. Defining a custom marker seems much more obvious and easier to follow.

- Naming conventions are sometimes not obvious when you look at the code, whereas markers are quite obvious to follow when you add new tests of the same "category".

- Last but not least - you can combine different markers. For example we can have Cassandra (integration) + MySql (backend) tests. Markers are "labels" and you can apply several of them to the same test. A naming convention makes it difficult (or impossible) to combine different categories - you would have to have non-overlapping conventions, and as we add more categories it might become impossible. For example, if you look at my proposal below, we will likely have a number of tests marked with both system("gcp") and backend("postgres") - the system tests for Postgres to BigQuery.

For me, the last reason from the list above is a deal-breaker. I can very easily imagine overlapping categories of tests coming up, and markers give us great flexibility here.
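To make the "labels" point concrete, here is a rough sketch of what I have in mind - nothing here is implemented yet, and the file names, marker names and the environment variable are only illustrative. It shows two markers sitting on one test, plus a small conftest.py hook that matches the integration marker argument against the integrations enabled in the environment:

    # test_cassandra_to_mysql.py - illustration only
    import pytest

    # Two independent "labels" on the same test: the integration it needs
    # and the backend it runs against. A single naming convention could not
    # express both at once.
    @pytest.mark.integration("cassandra")
    @pytest.mark.backend("mysql")
    def test_cassandra_to_mysql_transfer():
        ...

    # conftest.py - sketch of matching the marker argument against the environment
    import os
    import pytest

    def pytest_runtest_setup(item):
        enabled = os.environ.get("ENABLED_INTEGRATIONS", "").split(",")
        for marker in item.iter_markers(name="integration"):
            if marker.args and marker.args[0] not in enabled:
                pytest.skip("integration %s is not enabled" % marker.args[0])

With something like this, `ENABLED_INTEGRATIONS=cassandra pytest tests/` runs the Cassandra-marked tests and skips the other integrations, while a plain `pytest -m "not integration"` keeps all of them out of the default suite - no renaming of modules or classes needed.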
I think "Slow" suggest that there is a "fast" somewhere and that we need to know how slow is slow. As an inspiration - I really like the distinction introduced by Martin Fowler: https://www.martinfowler.com/articles/mocksArentStubs.html#ClassicalAndMockistTesting - where he distinguishes between different types of "test doubles" (dummy, fake, stub, spy, mock). Unfortunately, this terminology is not universally accepted, but for the sake of this discussion - assume we follow it, then I think the "fast" tests use "stubs, mocks or spies" where the "slow" tests you mention use "fakes" (your scripts are really "fakes"). The "fake" tests are usually much slower. But the "fake" marker might not be good name though because it's not universally agreed. But maybe we can come up with something that indicates the tests that are using "fakes" rather than "mocks/stubs/spies" ? That's much easier to decide when to apply such marker. Any idea? Maybe "nostub" or maybe "heavy" or something like that ? Or maybe we can start using "fake" terminology in those tests and use "fake" maker for those and simply introduce this term in our project. If we come up with a good proposal here this might be fairly consistent in terms of when to run the tests: - most tests where everything is mocked/stubbed/spied -> *no marker* - non-integration-dependent tests which are using fakes -> *"fake"* or *"heavy"* or smth else - integration-dependent tests: use *integration("<INTEGRATION>")* markers - for example integration("cassandra") - system tests (future, pending AIP-4) -> use *system("<SYSTEM>")* markers - for tests that require external services/credentials to connect to them, for example system("gcp") or system("aws") That would be super friendly for both automation and manual execution of the tests. J. -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>