potiuk commented on code in PR #35160: URL: https://github.com/apache/airflow/pull/35160#discussion_r1376539399
########## tests/system/providers/amazon/aws/utils/__init__.py: ##########

```diff
@@ -106,6 +106,8 @@ def _fetch_from_ssm(key: str, test_name: str | None = None) -> str:
     # Since a default value after the SSM check is allowed, these exceptions should not stop execution.
     except NoCredentialsError as e:
         log.info("No boto credentials found: %s", e)
+    except ClientError as e:
```

Review Comment:

Likely because previously, when this test was run, some other tests had already initialised credentials. This is one of those typical problems where we have implicit and unwanted side effects between tests that run in the same, predictable sequence.

This is the test that had the problem - it's a DB test:

```python
@pytest.mark.db_test
@pytest.mark.parametrize("example", example_not_suspended_dags(), ids=relative_path)
def test_should_be_importable(example):
    dagbag = DagBag(
        dag_folder=example,
        include_examples=False,
    )
    assert len(dagbag.import_errors) == 0, f"import_errors={str(dagbag.import_errors)}"
    assert len(dagbag.dag_ids) >= 1
```

This test (the error appears in the DB `Always` test that imports all system tests) previously ran in a suite mixing DB and non-DB tests, and one of the earlier non-DB tests likely left an "attempt" at authentication setup behind - not enough to authenticate (it had no auth information), but enough to change the error message when the "import all system tests" test ran. Now this test runs in the "DB" suite only, so the test that used to run predictably before it and leave the side effect is now likely in the "non-DB" suite. Or maybe some of the other changes I made to other tests stopped leaving the side effect.

This is one example of how some tests could have impacted the behaviour of other tests - even if they do not share the DB. A locally cached authentication configuration (even from a failed attempt) is enough.
Just the usual case when you heavily change the sequence and "neighbours" of the tests.

BTW, after this one gets merged, I am planning to experiment with randomization of test execution. Running tests in random order is a recommended practice precisely because it detects side effects like this one. It does, however, require quite a lot of discipline and multiple runs to reach a stable state if it has not been used by everyone from the very beginning. I am very interested to see what it uncovers when I complete the PR. It could be that we will be able to enable randomisation for some of the test types but not for others. But I will definitely attempt to do so.

BTW, there was a very nice podcast about taming flaky tests on "Talk Python To Me" https://talkpython.fm/episodes/show/429/taming-flaky-tests which I listened to. I think it reinvigorated me to try some of these things, knowing that what I have in mind is something others also think makes sense, and hearing how they deal with it :). I knew about test randomization and why it is good and wanted to try it - but hearing that it **can** work even in a large installation was encouraging enough to think "ok, let's try it".

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
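As a footnote to the randomization idea above: the core mechanism can be prototyped in a few lines of `conftest.py` using standard pytest hooks. This is only a hypothetical sketch (the `--shuffle-seed` option name is made up); the `pytest-randomly` plugin does the same thing more robustly, including reseeding other RNGs:

```python
# conftest.py - shuffle the collected test order with a reported seed, so a
# failure caused by inter-test side effects can be reproduced by rerunning
# with the same seed.
import random


def pytest_addoption(parser):
    # Hypothetical option name; pass a fixed seed to replay a failing order.
    parser.addoption("--shuffle-seed", action="store", type=int, default=None)


def pytest_collection_modifyitems(config, items):
    seed = config.getoption("--shuffle-seed")
    if seed is None:
        seed = random.randrange(2**32)
    print(f"shuffling test order with seed {seed}")
    # A seeded Random instance makes the shuffle deterministic per seed.
    random.Random(seed).shuffle(items)
```

Reporting the seed is the important part: without it, a randomized failure that depends on a particular neighbour ordering is itself a flaky test you cannot debug.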