While stabilising the v1-10-test branch in preparation for 1.10.8 we've managed to fix a number of flaky tests, including the kerberos-related flakiness, so I expect a lot more stability - we still had quite a few flaky tests last week, but I hope as of today it will be a LOT better.
We have now got rid of the hardware entropy dependencies, fixed some "random" tests (literally, they predictably failed about 1 in 10 runs because of randomness) and we have a robust mechanism to make sure that all the integrations are up and running before tests are started (a rough sketch of that check is further down this thread). This should really help with CI stability.

Ah - and we've also sped up the CI tests. We split out the pylint checks, which were the longest-running part of the static tests, moved doc building to the "test" phase, and we now have more - but smaller - jobs. It seems we also have more parallel workers available on the Apache side, so by utilising parallel runs we shaved at least 10 minutes of elapsed time off the average CI pipeline execution. More improvements will come after we move to GitHub Actions (which is next in line).

I think this thread can be closed :).

J.

On Wed, Jan 15, 2020 at 11:38 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Merged now - please rebase to latest master, this should work around the intermittent failures.
>
> J.
>
> On Wed, Jan 15, 2020 at 11:15 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
>> Hello everyone,
>>
>> I think - thanks to the diagnostics we added - I found the root cause of this.
>>
>> The most probable reason we had the problem is that something changed on Travis CI regarding entropy sources. Our integrations (cassandra, mysql, kerberos) need enough entropy (a source of random data) on the servers to generate certificates/keys etc. For security reasons you usually need a reliable (hardware) source of random data for applications that use TLS/SSL and generate their own certificates. It seems that right now on Travis the source of entropy is shared between multiple running jobs (docker containers) and that slows down startup time a lot (tens of seconds rather than hundreds of milliseconds). So if we have a lot of parallel jobs using entropy running on the same hardware, startup time for those jobs might be very slow.
>>
>> I've opened an issue in the community section of Travis for that:
>> https://travis-ci.community/t/not-enough-entropy-during-builds-likely/6878
>>
>> In the meantime we have a workaround (waiting until all integrations/DBs start before we run tests) that I will merge soon: https://github.com/apache/airflow/pull/7172 (waiting for the Travis build to complete).
>>
>> Possibly later we can speed it up by using a software source of entropy (we do not need hardware entropy for CI tests) but this might take a bit more time.
>>
>> J.
>>
>> On Tue, Jan 14, 2020 at 5:24 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>
>>> We have other kerberos-related failures. I temporarily disabled the "kerberos-specific" build until I add some more diagnostics and test it.
>>>
>>> Please rebase to latest master.
>>>
>>> J.
>>>
>>> On Tue, Jan 14, 2020 at 7:24 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>
>>>> Seems the tests are stable - but the kerberos problem is happening often enough to take a look. I will see what I can do to make it stable. It seems that might be a race between kerberos initialising and the tests starting to run.
>>>>
>>>> On Mon, Jan 13, 2020 at 8:58 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>
>>>>> Just merged the change with integration separation/slimming down the tests on CI: https://github.com/apache/airflow/pull/7091
>>>>>
>>>>> It looks like it is far more stable; I just had one failure with kerberos not starting (which also happened sometimes with the old tests).
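A minimal sketch of the "wait until all integrations are up before tests start" idea referenced earlier in this thread - the service names, ports and entropy threshold are illustrative placeholders, not the actual script merged in https://github.com/apache/airflow/pull/7172:

#!/usr/bin/env python3
"""Sketch: block until the integration services accept connections, and warn
when the kernel entropy pool is low (slow TLS/kerberos startups were traced
back to entropy starvation). Hosts, ports and threshold are illustrative."""
import socket
import sys
import time

SERVICES = {"mysql": 3306, "cassandra": 9042, "kerberos": 88}  # illustrative list
MAX_WAIT_SECONDS = 120


def wait_for_port(host: str, port: int) -> None:
    """Retry a TCP connection until it succeeds or the timeout expires."""
    deadline = time.monotonic() + MAX_WAIT_SECONDS
    while True:
        try:
            with socket.create_connection((host, port), timeout=2):
                print(f"{host}:{port} is up")
                return
        except OSError:
            if time.monotonic() > deadline:
                sys.exit(f"Timed out waiting for {host}:{port}")
            time.sleep(2)


def warn_if_low_entropy(threshold: int = 1000) -> None:
    """Only meaningful on Linux; silently skip elsewhere."""
    try:
        with open("/proc/sys/kernel/random/entropy_avail") as f:
            if int(f.read().strip()) < threshold:
                print("Warning: low entropy, service startup may be slow", file=sys.stderr)
    except OSError:
        pass


if __name__ == "__main__":
    warn_if_low_entropy()
    for host, port in SERVICES.items():
        wait_for_port(host, port)

Run as the first step of a CI job, a check like this fails fast with a clear message instead of letting tests hit a half-started Cassandra or KDC.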
>>>>> We will look in the future at some of the "xfailed/xpassed" tests - those that we know are problematic. We have 8 of them now.
>>>>>
>>>>> Also Breeze is now much more enjoyable to use. Pls. take a look at the docs.
>>>>>
>>>>> J.
>>>>>
>>>>> On Wed, Jan 8, 2020 at 2:23 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>
>>>>>>> I like what you've done with the separate integrations, and that coupled with pytest markers and better "import error" handling in the tests would make it easier to run a sub-set of the tests without having to install everything (for instance not having to install mysql client libs.
>>>>>>
>>>>>> Cool. That's exactly what I am working on in https://github.com/apache/airflow/pull/7091 -> I want to run all the tests in an integration-less CI job, select all those that fail and treat them appropriately.
>>>>>>
>>>>>>> Admittedly less of a worry with breeze/docker, but still would be nice to skip/deselect tests when deps aren't there)
>>>>>>
>>>>>> Yeah. For me it's the same. We recently had a few discussions with first-time users who have difficulty contributing because they do not know how to reproduce failing CI reliably locally. I think the resource requirements of the Breeze environment for simple tests were a big blocker/difficulty for some users, so slimming it down and making it integration-less by default will be really helpful. I will also make it the "default" way of reproducing tests - I will remove the separate bash scripts, which were an intermediate step. This is part of the same work, especially since I use the same mechanism, and ... well - it will be far easier for me to have integration-specific cases working in CI if I also have Breeze support for them (eating my own dog food).
>>>>>>
>>>>>>> Most of these PRs are merged now, I've glanced over #7091 and like the look of it, good work! You'll let us know when we should take a deeper look?
>>>>>>
>>>>>> Yep, I will. I hope today/tomorrow - most of it is ready. I also managed to VASTLY simplify running kubernetes kind (one less docker image, everything runs in the same docker engine as airflow-testing itself) in https://github.com/apache/airflow/pull/6516, which is a prerequisite for #7091 - so both will need to be reviewed. I marke
>>>>>>
>>>>>>> For the cassandra tests specifically, I'm not sure there is a huge amount of value in actually running the tests against cassandra -- we are using the official python module for it, and the test is basically running these queries - DROP TABLE IF EXISTS, CREATE TABLE, INSERT INTO TABLE - and then running hook.record_exists -- that seems like it's testing cassandra itself, when I think all we should do is test that hook.record_exists calls the execute method on the connection with the right string. I'll knock up a PR for this. Do we think it's worth keeping the non-mocked/integration tests too?
>>>>>>
>>>>>> I would not remove them just yet. Let's see how it works when I separate it out. I have a feeling that we have a very small number of those integration tests overall, so maybe it will be stable and fast enough when we only run those in a separate job.
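The mocked test described above - asserting that hook.record_exists issues the expected CQL rather than exercising a live Cassandra - could look roughly like the following sketch. The stand-in hook class and the exact query string are illustrative assumptions; the real Airflow CassandraHook and its tests may differ:

from unittest import mock


class SimpleCassandraHook:
    """Stand-in for illustration: builds a CQL query and runs it on the
    session returned by get_conn(), mirroring the behaviour described above."""

    def get_conn(self):
        raise NotImplementedError("patched out in tests")

    def record_exists(self, table, keys):
        where = " AND ".join(f"{key}=%({key})s" for key in keys)
        query = f"SELECT * FROM {table} WHERE {where}"
        result = self.get_conn().execute(query, keys)
        return result.one() is not None


def test_record_exists_sends_expected_cql():
    hook = SimpleCassandraHook()
    with mock.patch.object(hook, "get_conn") as get_conn:
        session = get_conn.return_value
        # Pretend the query returned a row, so record_exists() reports True.
        session.execute.return_value.one.return_value = {"pk": "foo"}

        assert hook.record_exists("ks.tbl", {"pk": "foo"})

        # The assertion of interest: the CQL sent to the session,
        # not Cassandra's own behaviour.
        session.execute.assert_called_once_with(
            "SELECT * FROM ks.tbl WHERE pk=%(pk)s", {"pk": "foo"}
        )

This keeps the unit test fast and deterministic, while the real round-trip against Cassandra stays in the separate integration job discussed in this thread.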
>>>>>> I think it's good to have different levels of tests (unit/integration/system) as they find different types of problems. As long as we can keep integration/system tests clearly separated, stable and easy to disable/enable, I am all for having different types of tests. There is this old and well-established concept of the Test Pyramid - https://martinfowler.com/bliki/TestPyramid.html - which applies very accurately to our case. By adding markers/categorising the tests and seeing how many of those tests we have, how stable they are, how long they take and (eventually) how much they cost us, we can make better decisions.
>>>>>>
>>>>>> J.

--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129
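For completeness, the marker-based selection mentioned in this thread - marking integration tests and skipping them unless the matching integration is enabled - can be wired up in a conftest.py roughly like the sketch below. The marker name and the INTEGRATION_* environment variables are illustrative assumptions, not necessarily what Airflow's conftest.py ended up using:

# conftest.py (sketch)
import os

import pytest


def pytest_configure(config):
    # Register the marker so --strict-markers does not reject it.
    config.addinivalue_line(
        "markers",
        "integration(name): test requires the named integration (e.g. cassandra)",
    )


def pytest_collection_modifyitems(config, items):
    for item in items:
        for marker in item.iter_markers(name="integration"):
            integration = marker.args[0]
            env_var = f"INTEGRATION_{integration.upper()}"  # illustrative convention
            if os.environ.get(env_var) != "true":
                item.add_marker(
                    pytest.mark.skip(
                        reason=f"{integration} integration not enabled ({env_var} != true)"
                    )
                )


# A test then opts in like this and is skipped unless the integration is up:
#
#     @pytest.mark.integration("cassandra")
#     def test_record_exists_against_real_cassandra():
#         ...

This gives the unit/integration split the Test Pyramid argues for: the plain test run stays slim, and the integration-marked tests only run in the job that actually starts the backing services.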