Absolutely +1! Great idea! Happy to coordinate - and I hope others would like
to join in as well :)

On Mon, Apr 20, 2020 at 12:04 PM Tomasz Urbaszek <
[email protected]> wrote:

> Got it!
>
> What would you say to organizing a more coordinated effort to improve
> our test suite, something like "Fridays with tests"? In a few weeks,
> this should result in a much better test suite and probably fewer
> problems with CI. This is also a nice way to take a look at Airflow
> internals :)
>
> Tomek
>
>
> On Mon, Apr 20, 2020 at 10:18 AM Jarek Potiuk <[email protected]>
> wrote:
> >
> > Both - depending on the tests. I think I've been a bit over-cautious for
> > now, and after merging, once we have observed a few runs in production (and
> > other people's PRs), we might quickly bring down the number of quarantined
> > tests.
> >
> > I think most of the problematic tests are really "long-running" and pretty
> > stand-alone ones. I think part of the process should be that if we find
> > that they require some side effects, we will be able to fix that, and
> > eventually we will only have a few quarantined "single tests" rather than
> > "whole classes".
> >
> > On Mon, Apr 20, 2020 at 7:42 AM Tomasz Urbaszek <[email protected]>
> > wrote:
> >
> > > Thank you Jarek for your work!
> > > +1 for the idea of quarantined tests. Just one question: are we marking
> > > single tests or whole classes? This question is mostly related to
> > > tests that require some side effects from previous tests.
> > >
> > > Tomek
> > >
> > >
> > > On Mon, Apr 20, 2020 at 2:38 AM Jarek Potiuk <[email protected]>
> > > wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > I have a proposal - very much COVID-19-inspired - on how to fix our CI
> > > > tests...
> > > >
> > > > After the recent problems with CI, together with Daniel and Tomek we
> > > > decided to make an emergency migration to GitHub Actions. So we did.
> > > >
> > > > I think overall it was a good move, but we had some problems with it.
> > > > It turns out that while we were blaming Travis for everything wrong
> > > > that happened in our builds, it was not always Travis' fault. We have
> > > > some tests that are also failing in GitHub Actions and I think it's
> > > > high time we fixed them.
> > > >
> > > > I spent the better part of the weekend trying different things and
> > > > bringing numerous optimizations back to our CI configuration (a
> > > > lot of those were lost during the emergency move).
> > > >
> > > > While running it I had many issues and I think I found a good way to
> > > > handle our flaky tests. I would love to hear what others think about it.
> > > >
> > > > Those interested - please take a look at the PR "Bring back CI
> > > > optimisations" https://github.com/apache/airflow/pull/8393
> > > > Corresponding GitHub Actions run here:
> > > > https://github.com/apache/airflow/actions/runs/82410109
> > > >
> > > > I implemented a lot of optimizations in this PR (some of them will
> > > > only take effect after we merge to master) but most of all I wanted to
> > > > introduce a concept of "quarantined tests" (good name, isn't it? :) )
> > > >
> > > > Here is the idea:
> > > >
> > > >  - tests that are marked as @pytest.mark.quarantined are skipped in
> > > >    regular runs (I identified 58 potential candidates - not all of them
> > > >    are flaky but I wanted to be safe)
> > > >  - there is one dedicated "Quarantine" job that runs only quarantined
> > > >    tests (it's Postgres 9.6 with Python 3.6 for now)
> > > >  - those "quarantined" tests are run with a 90 s timeout each and rerun
> > > >    up to 3 times if they fail
> > > >  - failure of any of the Quarantine tests does not fail the whole CI
> > > >  - I plan to create GitHub issues for groups of those tests
> > > >    (MoveOutOfQuarantine NNNN)
> > > >  - I think it's best if we split them between committers
> > > >  - the job of the committers will be to observe the stability of those
> > > >    tests
> > > >  - once we fix the tests and observe that they are "stable", we move them
> > > >    out of Quarantine back to regular tests (by removing
> > > >    @pytest.mark.quarantined)
> > > >  - the goal is to move all our tests out of Quarantine
> > > >  - in the future we can move any flaky test to Quarantine (by adding
> > > >    @pytest.mark.quarantined) and it will give us time to observe it and
> > > >    fix any flakiness.
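
To make the marker mechanics in the list above concrete, here is a minimal conftest.py sketch. It is not the code from the PR: the --only-quarantined flag is a hypothetical name, and the sketch deselects tests instead of reporting them as skipped, which is one possible way to keep quarantined tests out of regular runs. It uses only standard pytest hooks.

```python
# conftest.py (sketch). The --only-quarantined flag is hypothetical, not from the PR.
QUARANTINE_MARKER = "quarantined"  # marker name taken from the proposal


def pytest_addoption(parser):
    """Add a flag that the dedicated Quarantine job would pass."""
    parser.addoption(
        "--only-quarantined",
        action="store_true",
        default=False,
        help="run only tests marked with @pytest.mark.quarantined",
    )


def pytest_configure(config):
    """Register the marker so pytest does not warn about an unknown mark."""
    config.addinivalue_line(
        "markers",
        f"{QUARANTINE_MARKER}: flaky test that is kept out of regular runs",
    )


def pytest_collection_modifyitems(config, items):
    """Keep quarantined tests only in the Quarantine job, and nowhere else."""
    only_quarantined = config.getoption("--only-quarantined")
    selected, deselected = [], []
    for item in items:
        is_quarantined = item.get_closest_marker(QUARANTINE_MARKER) is not None
        if is_quarantined == only_quarantined:
            selected.append(item)
        else:
            deselected.append(item)
    if deselected:
        # Report the dropped tests as "deselected" and shrink the run in place.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

With this in place, a regular job would run plain `pytest` and the Quarantine job would run `pytest --only-quarantined`. Because get_closest_marker also sees marks applied to a class, the same hook covers both single tests and whole classes. The 90 s timeout and the reruns are not covered here; they would need separate plugins or CI settings, and which ones the PR uses is not assumed.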
> > > >
> > > > Let me know what you think of it.
> > > >
> > > > J.
> > > >
> > > > --
> > > > Jarek Potiuk
> > > > Polidea | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129
> > >
> > >
> > >
> > > --
> > >
> > > Tomasz Urbaszek
> > > Polidea | Software Engineer
> > >
> > > M: +48 505 628 493
> > > E: [email protected]
> > >
> > > Unique Tech
> > > Check out our projects!
> > >
> >
> >
>
>
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
