The link to
https://docs.pytest.org/en/latest/example/markers.html#custom-marker-and-command-line-option-to-control-test-runs
helps to clarify some of the customization required to add CLI options that
select test sets based on markers. +1 for a common default with *no marker*.

(It's hard to guess how many test sets are required, how many extra lines of
"marker code" are needed for each category, and how the Venn diagrams work
out. I don't want to get into that, as I'm not familiar with all of it, but
my first intuition is that markers will provide granularity at the expense
of a lot more "marker code", unless there is always a common default
test-env and extra tests are only required for the exceptions to the
defaults.)
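For anyone following along, here is a rough sketch of the kind of "marker
code" that pytest docs page describes, adapted to the vocabulary of this
thread (the docs use an -E option and an "env" marker; the --integration
option and integration marker below are placeholders for illustration, not
anything agreed for Airflow):

    # conftest.py -- sketch of a CLI option that selects tests by marker,
    # adapted from the pytest docs example linked above
    import pytest

    def pytest_addoption(parser):
        # hypothetical option name, for illustration only
        parser.addoption(
            "--integration",
            action="store",
            default=None,
            help="only run tests marked integration(<name>) for this name",
        )

    def pytest_configure(config):
        # register the marker so --strict-markers does not complain
        config.addinivalue_line(
            "markers", "integration(name): test requires the named integration"
        )

    def pytest_runtest_setup(item):
        wanted = item.config.getoption("--integration")
        # names passed to @pytest.mark.integration(...) on this test
        marked = [mark.args[0] for mark in item.iter_markers(name="integration")]
        if marked and wanted not in marked:
            pytest.skip("test requires --integration %s" % marked)

So it's roughly a dozen lines of conftest.py per marker "axis", which is
part of the cost/benefit question above.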
How would the proposed marker scheme categorise a test that uses mocked
infrastructure for AWS batch services? Consider how much AWS infrastructure
is mocked in a moto server to test batch services, i.e. see [1,2]. In a
real sense, the moto library provides a server with a container runtime;
it's "mocked infrastructure" that helps to "fake integration" tests.

+1 for a common vocabulary (semantics) for tests and markers; I'm not a
test expert by a long shot, so what is the best practice for a test
vocabulary, and how does it translate into markers? Does the Apache
Foundation have any kind of manifesto about such things?

[1] https://github.com/spulec/moto/blob/master/tests/test_batch/test_batch.py
[2] https://github.com/spulec/moto/blob/master/moto/batch/models.py

On Sun, Dec 29, 2019 at 7:48 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> > If I understand correctly, using `pytest -k` might be less work and more
> > generalized than a swag of custom markers, unless it entails a lot of
> > re-naming things. The work to add markers might be easier if they can be
> > applied to entire classes of tests, although what I've mostly seen with
> > `pytest` is a functional pattern rather than classes in tests. For more
> > about that, see the note about using pytest fixtures vs. class
> > setup/teardown at https://docs.pytest.org/en/latest/xunit_setup.html
>
> I think `pytest -k` is great for ad-hoc/manual execution of only what we
> want. But for automation around running tests (which should be repeatable
> and reproducible by anyone), I think it makes much more sense to keep
> markers in the code.
>
> It's really just a matter of where we keep the information about how we
> group tests into the common categories that we use for test execution:
>
> 1. With pytest -k, we would have to keep the "grouping" as different sets
>    of -k parameters in CI test scripts. This requires following naming
>    conventions for modules or classes or tests, similar to what Kamil
>    described earlier in the thread: we already use the *_system.py module
>    + SystemTest class naming in GCP tests.
> 2. With markers, the grouping is kept in the source code of the tests
>    instead. This is "meta" information that does not force any naming
>    convention on the tests.
>
> I strongly prefer 2. over 1. for test automation.
>
> Some reasoning:
>
> - It makes it easier to reproduce grouping locally without having to look
>   up the selection criteria/naming conventions.
> - It's easier to build automation around it. For example, in the case of
>   integrations we can easily select cases where the "integration" from the
>   environment matches the integration marker: the cassandra integration
>   will be matched by the integration("cassandra") marker. With a naming
>   convention we would have to record somewhere (in the custom -k command)
>   that the "cassandra" integration matches (for example) all tests in the
>   "tests.cassandra" package, or all tests named TestCassandra, or
>   something even more complex. Defining a custom marker seems much more
>   obvious and easy to follow.
> - Naming conventions are sometimes not obvious when you look at the code,
>   whereas markers are quite obvious to follow in the code when you add new
>   tests of the same "category".
> - Last but not least, you can combine different markers together. For
>   example, we can have Cassandra (integration) + MySQL (backend) tests.
>   Markers are "labels" and you can apply more of them to the same test.
>   Naming conventions make it difficult (or impossible) to combine
>   different categories: you would have to have non-overlapping
>   conventions, and as we add more categories it might become impossible.
>   For example, if you look at my proposal below, we will likely have a
>   number of system("gcp") and backend("postgres") markers on the system
>   tests for Postgres to BigQuery.
>
> For me, the last reason on the list above is a deal-breaker. I can very
> easily imagine overlapping categories of tests coming up, and markers
> give us great flexibility here.
>
> > With regard to "slow" and https://github.com/apache/airflow/pull/6876,
> > it was motivated by one test that uses moto mocking for AWS batch
> > services. In particular, it has a mock batch job that actually runs a
> > container, and the user of the mock has no control over how the job
> > transitions through the various job states (with associated status).
> > For example, the `pytest` durations are an order of magnitude longer
> > for this test than for all the others (see below stdout from a PR
> > branch of mine). So, during dev-test cycles, once this test is coded
> > and working as expected, it helps to either temporarily mark it with
> > `pytest.mark.skip` or to permanently mark it with a custom marker
> > (e.g. `pytest.mark.slow`) and then use `pytest -m 'not slow'` to run
> > all the faster tests. It's no big deal, I can live without it, it's
> > just a convenience.
>
> With regard to "slow" tests, maybe the right approach here will be to
> have a different marker. I think "slow" suggests that there is a "fast"
> somewhere, and that we would need to know how slow is slow.
>
> As an inspiration, I really like the distinction introduced by Martin
> Fowler:
> https://www.martinfowler.com/articles/mocksArentStubs.html#ClassicalAndMockistTesting
> where he distinguishes between different types of "test doubles" (dummy,
> fake, stub, spy, mock). Unfortunately, this terminology is not
> universally accepted, but for the sake of this discussion, assume we
> follow it: then I think the "fast" tests use stubs, mocks, or spies,
> where the "slow" tests you mention use "fakes" (your scripts are really
> "fakes"). The "fake" tests are usually much slower. But "fake" might not
> be a good marker name, because it's not universally agreed.
>
> But maybe we can come up with something that indicates the tests that
> are using "fakes" rather than "mocks/stubs/spies"? That would make it
> much easier to decide when to apply such a marker. Any ideas? Maybe
> "nostub", or maybe "heavy", or something like that? Or maybe we can
> start using the "fake" terminology in those tests, use a "fake" marker
> for them, and simply introduce this term in our project.
>
> If we come up with a good proposal here, this might be fairly consistent
> in terms of when to run the tests:
>
> - most tests, where everything is mocked/stubbed/spied -> *no marker*
> - non-integration-dependent tests which use fakes -> *"fake"* or
>   *"heavy"* or something else
> - integration-dependent tests -> use *integration("<INTEGRATION>")*
>   markers, for example integration("cassandra")
> - system tests (future, pending AIP-4) -> use *system("<SYSTEM>")*
>   markers for tests that require external services/credentials to
>   connect to them, for example system("gcp") or system("aws")
>
> That would be super friendly for both automation and manual execution of
> the tests.
>
> J.
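To make the proposed scheme above concrete, a short sketch of how those
markers might look on tests (the marker names follow Jarek's proposal; the
test names and bodies are made up for illustration):

    # illustrative only -- marker names follow the proposal above,
    # the tests themselves are placeholders
    import pytest

    @pytest.mark.integration("cassandra")
    def test_cassandra_hook_reads_rows():
        ...  # requires a running cassandra integration

    @pytest.mark.integration("cassandra")
    @pytest.mark.backend("mysql")
    def test_cassandra_to_mysql_transfer():
        ...  # markers are labels, so categories combine freely

    @pytest.mark.system("gcp")
    def test_postgres_to_bigquery_copy():
        ...  # needs real GCP credentials (AIP-4 territory)

Selecting by marker name is then just `pytest -m integration` or
`pytest -m 'not system'`; selecting by the marker *argument* (e.g. only the
cassandra tests) needs the kind of custom CLI option sketched earlier in
this mail.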
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> M: +48 660 796 129

--
Darren L. Weber, Ph.D.
http://psdlw.users.sourceforge.net/
http://psdlw.users.sourceforge.net/wordpress/