potiuk commented on code in PR #35160:
URL: https://github.com/apache/airflow/pull/35160#discussion_r1376539399


##########
tests/system/providers/amazon/aws/utils/__init__.py:
##########
@@ -106,6 +106,8 @@ def _fetch_from_ssm(key: str, test_name: str | None = None) -> str:
     # Since a default value after the SSM check is allowed, these exceptions should not stop execution.
     except NoCredentialsError as e:
         log.info("No boto credentials found: %s", e)
+    except ClientError as e:

Review Comment:
   Likely because previously, when this test was run, some other tests had already initialised credentials. This is one of those typical problems where tests run in the same, predictable sequence have implicit and unwanted side effects on each other.  
   
   This is the test that had the problem - it's a DB test:
   
   ```
   @pytest.mark.db_test
   @pytest.mark.parametrize("example", example_not_suspended_dags(), ids=relative_path)
   def test_should_be_importable(example):
       dagbag = DagBag(
           dag_folder=example,
           include_examples=False,
       )
       assert len(dagbag.import_errors) == 0, f"import_errors={str(dagbag.import_errors)}"
       assert len(dagbag.dag_ids) >= 1
   ```
   
   This test (the error appears in a non-DB `Always` test that imports all system tests) was previously run in a suite mixing DB and non-DB tests, and some earlier non-DB test likely left an authentication-setup "attempt" behind: not enough to authenticate (because it had no auth information), but enough to change the error message raised when the "import all system tests" check ran. 
   
   Now this test runs in the "DB" suite only, so the test that used to predictably run before it and leave the side effect probably runs in the "Non-DB" suite. Or maybe some of the other changes I made to other tests stopped leaving the side effect. 
   
   This is one example of how some tests can impact other tests' behaviour even when they do not share the DB: a locally cached authentication configuration (even from a failed attempt) is enough.
   
   Just the usual thing that happens when you heavily change the sequence and "neighbours" of the tests. 
   
   BTW, after this one gets merged, I am planning to experiment with randomization of test execution. Running tests in random order is a recommended practice precisely because it detects side effects like this one. It does, however, require quite a lot of discipline and multiple runs to reach a stable state if it has not been used by everyone from the very beginning. I am very interested to see the results when I complete the PR. It could be that we will be able to enable randomization for some of the test types but not for others, but I will definitely attempt to do so.
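   As a rough illustration, the shuffling itself does not need much machinery; a few lines in `conftest.py` are enough (this is just a sketch, not necessarily what the PR will use; plugins like `pytest-randomly` do this more robustly, including reseeding per-test RNGs, and `TEST_ORDER_SEED` is a made-up variable name):

```python
# conftest.py (sketch): shuffle the collected tests so that ordering side
# effects surface.  The seed is printed so a failing order can be reproduced
# by exporting TEST_ORDER_SEED before the next run.
import os
import random


def pytest_collection_modifyitems(session, config, items):
    seed = int(os.environ.get("TEST_ORDER_SEED", random.randrange(1_000_000)))
    print(f"shuffling {len(items)} collected tests with seed {seed}")
    # Shuffle in place; pytest runs the tests in the resulting order.
    random.Random(seed).shuffle(items)
```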
   
   BTW, there was a very nice podcast about taming flaky tests on "Talk Python To Me" which I listened to: https://talkpython.fm/episodes/show/429/taming-flaky-tests. I think it reinvigorated me to try some of those things, knowing that what I have in mind is something others also think makes sense, and hearing how they deal with it :). I knew about test randomization and why it is good and wanted to try it, but hearing that it **can** work even in a large installation was encouraging enough to think "ok, let's try it".
   
   


