On Tue, Jul 27, 2021 at 9:48 AM Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> On Tue, 27 Jul 2021 at 14:24, Cleber Rosa <cr...@redhat.com> wrote:
> > Yes, I've spent quite some time with some flaky behavior while running
> > the replay tests as well.  But in the end, the test remained unchanged
> > because we found the issues in the actual code under test (one time
> > the recording of the replay file would sometimes be corrupted when
> > using >=1 CPUs, but 100% of the time when using a single CPU).
> >
> > This time, it was failing 100% of the time in my experience, and now,
> > after the fix in df3a2de51a07089a4a729fe1f792f658df9dade4, it's
> > passing 100% of the time.  So I guess even tests with some observed
> > flakiness can have their value.
>
> To me they have very little value, because once I notice a test
> is flaky I simply start to ignore whether it is passing or failing,
> and then it might as well not be there at all.
> (This is happening currently with the gitlab CI tests, which have
> been failing for a week.)
>
> -- PMM
I hear you... and I acknowledge that we currently don't have a good
solution for keeping track of test results data, and thus for going
beyond any one person's perceived value of a test.

It's not something for the short term, but I do plan to work on a
"confidence" tracker for tests.  There is some seed work in the CKI
data warehouse project[1], but it's still at a very early stage.

- Cleber.

[1] - https://gitlab.com/cki-project/datawarehouse/-/blob/main/datawarehouse/views.py#L158
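P.S.: to make the idea a bit more concrete, here is one minimal sketch
of what such a confidence score could look like, computed from a
recorded history of pass/fail results.  This is purely illustrative,
not the actual design and not the CKI data warehouse API; all names
below are hypothetical.  It uses an exponentially weighted pass rate,
so a test that used to be flaky but has stabilized regains confidence
over time:

    # Hypothetical per-test "confidence" score: an exponentially
    # weighted pass rate over a chronological history of results.
    # All names are illustrative, not part of any existing API.

    from dataclasses import dataclass


    @dataclass
    class TestRun:
        passed: bool


    def confidence(history: list[TestRun], decay: float = 0.9) -> float:
        """Return a score in [0.0, 1.0].

        `history` is ordered oldest to newest; a `decay` below 1.0
        makes older results count for progressively less.
        """
        if not history:
            return 0.0  # no data, no confidence
        weight, total, passed = 1.0, 0.0, 0.0
        for run in reversed(history):  # iterate newest first
            total += weight
            if run.passed:
                passed += weight
            weight *= decay
        return passed / total


    # Example: a test that failed a while back but passes recently
    # scores higher than its raw pass rate (4/6 ~= 0.67) suggests.
    runs = [TestRun(passed=p) for p in (True, False, False, True, True, True)]
    print(f"confidence: {confidence(runs):.2f}")

The point of weighting by recency is exactly the scenario from this
thread: a test that was failing 100% of the time before a fix, and
passing 100% of the time after it, should see its confidence recover
instead of being permanently tainted by its flaky past.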