Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-26 Thread Tomasz Urbaszek
Great job Jarek! 🚀

T.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-26 Thread Jarek Potiuk
While fixing the v1-10-test branch in preparation for 1.10.8 we've managed to
fix a number of flaky tests, including the kerberos-related flakiness, so I
expect a lot more stability - we still had quite a few flaky tests last week,
but I hope that as of today it will be a LOT better.

We have now got rid of the hardware entropy dependencies, fixed some "random"
tests (they literally, and predictably, failed 1 in 10 runs because of
randomness) and we have a robust mechanism to make sure that all the
integrations are up and running before the tests are started. This should
really help with CI stability.
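
As an illustration only (not necessarily how the actual fixes were done), a
deterministic seed is one way to turn such a "fails 1 in 10 runs" test into a
reproducible one:

    import random

    import pytest


    @pytest.fixture(autouse=True)
    def _seed_random():
        # Hypothetical fixture: pin the random seed for every test so that
        # "randomly" failing assertions become deterministic and debuggable.
        random.seed(42)
        yield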

Ah - and we've also sped up the CI tests. We split out the pylint checks,
which were the longest-running part of the static tests, moved doc building to
the "test" phase, and we now have more - but smaller - jobs. It seems that we
also have more parallel workers available on the Apache side, so by running
jobs in parallel we shaved at least 10 minutes of elapsed time off the average
CI pipeline run.

More improvements will come after we move to GitHub Actions (which is next
in line).

I think this thread can be closed :).

J.



Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-15 Thread Jarek Potiuk
Merged now - please rebase to the latest master; this should work around the
intermittent failures.

J.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-15 Thread Jarek Potiuk
Hello everyone,

I think that, thanks to the diagnostics we added, I have found the root cause.

The most probable reason for the problem is that something changed on Travis
CI regarding entropy sources. Our integrations (cassandra, mysql, kerberos)
need enough entropy (a source of random data) on the servers to generate
certificates/keys etc. For security reasons you usually need a reliable
(hardware) source of random data for applications that use TLS/SSL and
generate their own certificates. It seems that right now on Travis the source
of entropy is shared between multiple running jobs (Docker containers), which
slows down startup time a lot (tens of seconds rather than hundreds of
milliseconds). So if we have a lot of parallel jobs using entropy running on
the same hardware, startup time for those jobs might be very slow.
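
A small diagnostic sketch (Linux-only, and not part of the linked changes) of
how low entropy can be detected on such a worker before the slow startups bite:

    from pathlib import Path

    # The kernel reports its current entropy estimate (in bits) here; very low
    # values mean that blocking reads from /dev/random - and key/certificate
    # generation built on top of them - will stall.
    ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


    def entropy_bits() -> int:
        return int(ENTROPY_FILE.read_text().strip())


    if __name__ == "__main__":
        bits = entropy_bits()
        print(f"available entropy: {bits} bits")
        if bits < 1000:  # threshold chosen arbitrarily for illustration
            print("warning: low entropy - integration startup may be very slow")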

I've opened an issue in the community section of Travis for that:
https://travis-ci.community/t/not-enough-entropy-during-builds-likely/6878

In the meantime we have a workaround (waiting until all integrations/DBs start
before we run the tests) that I will merge soon:
https://github.com/apache/airflow/pull/7172 (waiting for the Travis build to
complete).
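
The actual workaround is in the PR above; as a rough sketch of the idea, it
boils down to polling each integration's port until it accepts connections
before the tests start (the host/port pairs below are assumptions, not the
exact values used in our scripts):

    import socket
    import time

    # Assumed host/port pairs for the integrations mentioned above.
    INTEGRATIONS = {
        "cassandra": ("cassandra", 9042),
        "mysql": ("mysql", 3306),
        "kerberos": ("kerberos", 88),
    }


    def wait_for(host: str, port: int, timeout: float = 120.0) -> None:
        # Poll until the TCP port accepts connections or the timeout expires.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=2):
                    return
            except OSError:
                time.sleep(1)
        raise TimeoutError(f"{host}:{port} did not come up within {timeout}s")


    if __name__ == "__main__":
        for name, (host, port) in INTEGRATIONS.items():
            wait_for(host, port)
            print(f"{name} is up")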

Possibly later we can speed it up by using a software source of entropy (we do
not need hardware entropy for CI tests), but this might take a bit more time.

J.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-14 Thread Jarek Potiuk
We have other kerberos-related failures. I have temporarily disabled the
"kerberos-specific" build until I add some more diagnostics and test it.

Please rebase to latest master.

J.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-13 Thread Jarek Potiuk
The tests seem stable - but the kerberos problem is happening often enough to
warrant a look. I will see what I can do to make it stable; it seems it might
be a race between kerberos initialising and the tests starting to run.



Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-13 Thread Jarek Potiuk
Just merged the change with integration separation/slimming down the tests
on CI. https://github.com/apache/airflow/pull/7091

It looks far more stable; I just had one failure with kerberos not starting
(which also happened sometimes with the old tests). In the future we will look
at some of the "xfailed/xpassed" tests - those that we know are problematic.
We have 8 of them now.
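
For reference, this is roughly what marking such a known-problematic test
looks like with pytest (the test name and reason below are made up for
illustration):

    import random

    import pytest


    # A failure is reported as XFAIL and does not break the build; with
    # strict=False an unexpected pass is reported as XPASS rather than an error.
    @pytest.mark.xfail(reason="flaky: depends on kerberos startup timing", strict=False)
    def test_example_flaky_behaviour():
        assert random.random() > 0.1  # fails roughly 1 in 10 runs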

Also, Breeze is now much more enjoyable to use. Please take a look at the docs.

J.



Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-08 Thread Jarek Potiuk
>
> I like what you've done with the separate integrations, and that coupled
> with pytest markers and better "import error" handling in the tests would
> make it easier to run a sub-set of the tests without having to install
> everything (for instance not having to install mysql client libs.


Cool. That's exactly what I am working on in
https://github.com/apache/airflow/pull/7091 -> I want to get all the tests to
run in an integration-less CI, select all those that fail and treat them
appropriately.


> Admittedly less of a worry with breeze/docker, but still would be nice to
> skip/deselect tests when deps aren't there)
>

Yeah. For me it's the same. I think we recently had a few discussions with
first-time users who have difficulty contributing because they do not know how
to reproduce failing CI reliably locally. I think the resource usage of the
Breeze environment for simple tests was a big blocker/difficulty for some
users, so slimming it down and making it integration-less by default will be
really helpful. I will also make it the "default" way of reproducing tests - I
will remove the separate bash scripts which were an intermediate step. This is
the same work, especially since I use the same mechanism and ... well - it
will be far easier for me to have integration-specific cases working in CI if
I also have Breeze to support it (eating my own dog food).


> Most of these PRs are merged now, I've glanced over #7091 and like the
> look of it, good work! You'll let us know when we should take a deeper look?
>

Yep I will. I hope today/tomorrow - most of it is ready. I also managed to
VASTLY simplify running Kubernetes kind (one less Docker image, everything
runs in the same Docker engine as airflow-testing itself) in
https://github.com/apache/airflow/pull/6516 which is a prerequisite for
#7091 - so both will need to be reviewed. I marke


> For cassandra tests specifically I'm not sure there is a huge amount of
> value in actually running the tests against cassandra -- we are using the
> official python module for it, and the test is basically running these
> queries - DROP TABLE IF EXISTS, CREATE TABLE, INSERT INTO TABLE, and then
> running hook.record_exists -- that seems like it's testing cassandra
> itself, when I think all we should do is test that hook.record_exists calls
> the execute method on the connection with the right string. I'll knock up a
> PR for this.
> Do we think it's worth keeping the non-mocked/integration tests too?
>

I would not remove them just yet. Let's see how it works when I separate them
out. I have a feeling that we have a very small number of those integration
tests overall, so maybe it will be stable and fast enough when we only run
them in a separate job. I think it's good to have different levels of tests
(unit/integration/system) as they find different types of problems. As long as
we can have integration/system tests clearly separated, stable and easy to
disable/enable - I am all for having different types of tests. There is the
old and well-established concept of the Test Pyramid
https://martinfowler.com/bliki/TestPyramid.html which applies very accurately
to our case. By adding markers/categorising the tests and seeing how many of
those tests we have, how stable they are, how long they take and (eventually)
how much they cost us - we can make better decisions.

J.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-08 Thread Kaxil Naik
In that case we should just remove the non-mocked tests.


Re: Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-08 Thread Ash Berlin-Taylor
Hi Jarek,

I like what you've done with the separate integrations, and that coupled with
pytest markers and better "import error" handling in the tests would make it
easier to run a subset of the tests without having to install everything (for
instance not having to install mysql client libs. Admittedly less of a worry
with breeze/docker, but still it would be nice to skip/deselect tests when
deps aren't there)
Most of these PRs are merged now, I've glanced over #7091 and like the look of 
it, good work! You'll let us know when we should take a deeper look?
For the cassandra tests specifically I'm not sure there is a huge amount of
value in actually running the tests against cassandra -- we are using the
official python module for it, and the test is basically running these queries
- DROP TABLE IF EXISTS, CREATE TABLE, INSERT INTO TABLE - and then running
hook.record_exists -- that seems like it's testing cassandra itself, when I
think all we should do is test that hook.record_exists calls the execute
method on the connection with the right string. I'll knock up a PR for this.
Do we think it's worth keeping the non-mocked/integration tests too?
-ash
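
A minimal sketch of the mocked approach described above. The hook class here
is an illustrative stand-in, not the real Airflow CassandraHook, but the test
shows the pattern: assert that record_exists sends the expected CQL string to
the connection's execute method instead of round-tripping through a live
cassandra:

    from unittest import mock


    class CassandraHookLike:
        """Illustrative stand-in for the real hook."""

        def __init__(self, conn):
            self._conn = conn

        def record_exists(self, table: str, keys: dict) -> bool:
            clause = " AND ".join(f"{k}=%({k})s" for k in keys)
            cql = f"SELECT * FROM {table} WHERE {clause}"
            return bool(self._conn.execute(cql, keys).one())


    def test_record_exists_builds_expected_query():
        conn = mock.MagicMock()
        conn.execute.return_value.one.return_value = {"id": 1}  # pretend a row exists

        hook = CassandraHookLike(conn)
        assert hook.record_exists("t", {"pk": "foo"}) is True

        # The assertion that matters: the right CQL reached the connection.
        conn.execute.assert_called_once_with(
            "SELECT * FROM t WHERE pk=%(pk)s", {"pk": "foo"}
        )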


Often failing tests in CI (and a way to fix them quickly and future-proof)

2020-01-07 Thread Jarek Potiuk
*TL;DR; I have a proposal for how we can remedy the often-failing CI tests,
and I have a kind request to other committers to help me fix it in a good way.*

As we have all noticed, we have recently had some often (far too often)
failing tests in Travis. The situation is not very good and we have to remedy
it fairly quickly.

I think we can do this without compromising quality and without temporarily
disabling some of the tests.

*Root cause of the problem*

The root cause of the problem seems to be the memory used during tests. After
adding instafail we know that the tests often fail because there is not enough
memory on the Travis machines. This is a combination of the way our virtual
machines are allocated in the Travis infrastructure, the growing number of
tests we have, the fact that our tests require a lot of "integrations"
(running as separate images - cassandra, rabbitmq, postgres/mysql databases
etc.) and the fact that we run them with pytest (pytest apparently uses more
memory).
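
As a rough illustration only (this is not what our CI currently does), a
per-test memory log - here using the third-party psutil package - can help
pinpoint which tests push a worker over the edge:

    import psutil
    import pytest


    @pytest.fixture(autouse=True)
    def _log_available_memory(request):
        # Hypothetical diagnostic: record available memory around each test so
        # out-of-memory failures on CI workers can be traced to a culprit.
        before = psutil.virtual_memory().available
        yield
        after = psutil.virtual_memory().available
        print(f"{request.node.nodeid}: available memory changed by {after - before} bytes")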

*Proposal*

One of the proposals on slack was to get rid of the Cassandra tests and
disable cassandra temporarily - but I think we can do better, and I can get it
merged in a day or two and get it sorted out for now (and in a way that is
good for the future).

I already wrote an integration test proposal recently (I will resurrect that
thread now) describing how we can split our integration tests using pytest
markers and get support from Breeze and our CI testing framework for separate
integrations. I already have working code for that (it is a result of my
resumed work on the Production Image) and most of the code is already green in
Travis; it needs a review from other committers. At the end of this message I
copy the excerpt from the docs describing how this will work.
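
To give a flavour of the marker-based split (a sketch only - the marker name
and the INTEGRATION_* environment variables are assumptions, not the exact
names used in the PRs), the selection logic can live in a conftest.py:

    import os

    import pytest


    def pytest_configure(config):
        config.addinivalue_line(
            "markers", "integration(name): test requires the named integration"
        )


    def pytest_collection_modifyitems(config, items):
        # Skip integration-marked tests unless the matching integration was
        # started (signalled here via an environment variable).
        for item in items:
            for marker in item.iter_markers(name="integration"):
                required = marker.args[0]
                if os.environ.get(f"INTEGRATION_{required.upper()}") != "true":
                    item.add_marker(
                        pytest.mark.skip(reason=f"{required} integration not started")
                    )


    # Usage in a test module:
    # @pytest.mark.integration("cassandra")
    # def test_cassandra_hook_against_a_real_server(): ...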

Once we have that in, we will have a very easy (and maintainable) way of
working that helps with CI resources and also makes Breeze far more usable
(and less resource-hungry):

   - add pytest markers so that we know which tests are "integration" ones.
   - start Breeze locally without any external integrations (right now only
   Kerberos is needed) - most of the tests work there. Far less resource usage.
   - start Breeze easily with *--integration mongo --integration cassandra*
   etc. whenever we need to run tests for that integration.
   - run all the "non-integration" tests in CI without the integrations started.
   - run only the "integration-related" tests in CI with the integrations
   started.
   - we will have more jobs in CI but they should run much more reliably and
   faster in general.
   - also, one of the changes improves the way we build kind/kubernetes tests
   in order to unblock the migration to GitHub Actions that Tomek works on -
   that might be our ultimate speedup/stabilisation.
   - for those curious, there is updated documentation in my PR: "Launching
   Breeze integrations"
   <https://github.com/PolideaInternal/airflow/blob/separate-integrations/BREEZE.rst#launching-breeze-integrations>
   and "Running tests with Kubernetes in Breeze"
   <https://github.com/PolideaInternal/airflow/blob/separate-integrations/BREEZE.rst#running-tests-with-kubernetes-in-breeze>.

*PRs - kind request to other committers*

I have a series of PRs that already implement almost all of this (I needed it
in order to implement Production Image support). They depend on each other - I
added unit test support for Bash scripts, several improvements, and some
simplifications:

   - [AIRFLOW-6489] Add BATS support for Bash unit testing [ready for review] -
   needed to get more control over other changes.
   - [AIRFLOW-6475] Remove duplication of volume mount specs in Breeze [ready
   for review] - improves the consistency of how we run Breeze/CI.
   - [AIRFLOW-6491] Improve parameter handling in Breeze [ready for review] -
   tested and improved the way we handle --options in Breeze (needed for the
   Kubernetes improvements).
   - [AIRFLOW-5704] Improve Kind Kubernetes scripts for local testing
   [testing] - improves handling of Kubernetes Kind testing (fixes issues with
   loading images / upgrades kind to the latest version).
   - [AIRFLOW-6489] Separate integrations [WIP] - this is the change
   introducing different integrations - it already works for Breeze and has
   support for deciding which integrations should be started; I just need to
   separate out the "Integration tests" into separate jobs.

I have a kind request to other committers - can you please take a look at
those and help merge them quickly?
