Hi All,

(Welcome to the new MacBook Pro that has a send “button” on the touch bar)

Jeremiah and I have been looking into optimising the time that is spent on
tests. The reason for this is that Travis runs are taking more and more time
and we are being throttled by Travis. As part of that we enabled color coding
of test outcomes and timing of tests. The results are kind of surprising.

This is the top 20 tests where we spend the most time. MySQL (remember,
concurrent access enabled):
https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt

tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes, we spend a whopping 5 minutes on executing all examples. Another
interesting one is “tests.CoreTest.test_scheduler_job”. This test just checks
whether certain directories are created as part of logging. This could have
been covered by a real unit test covering just the functionality of the
function that creates the files - now it takes 35s.
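
To make that concrete, here is a minimal sketch of the kind of fast unit test
I mean; the ensure_log_directory helper is hypothetical, a stand-in for
whatever function actually creates the log directories:

    import os
    import shutil
    import tempfile
    import unittest

    def ensure_log_directory(base, dag_id):
        # Hypothetical stand-in for the real function that creates the
        # per-dag log directories.
        path = os.path.join(base, dag_id)
        os.makedirs(path)
        return path

    class EnsureLogDirectoryTest(unittest.TestCase):
        def setUp(self):
            self.base = tempfile.mkdtemp()

        def tearDown(self):
            shutil.rmtree(self.base)

        def test_creates_directory(self):
            # Runs in milliseconds: no scheduler, no database, no example dags.
            path = ensure_log_directory(self.base, 'example_dag')
            self.assertTrue(os.path.isdir(path))

A test like this runs in milliseconds instead of 35s because it never touches
the scheduler, the database or the example DAGs.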

We discussed several strategies for reducing time, apart from rewriting some of
the tests (that would be a Herculean job!). The most optimal approach seems to be:

1. Run the scheduler tests apart from all other tests.
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately.
4. Run API tests separately.

This creates the following build matrix (warning ASCII art):

+----------+-----------+-----------+----+-----+
|          | Scheduler | Operators | UI | API |
+----------+-----------+-----------+----+-----+
| Python 2 |     x     |     x     | x  |  x  |
| Python 3 |     x     |     x     | x  |  x  |
| Kerberos |           |           | x  |  x  |
| Ldap     |           |           | x  |     |
| Hive     |           |     x     | x  |  x  |
| SSH      |           |     x     |    |     |
| Postgres |     x     |     x     | x  |  x  |
| MySQL    |     x     |     x     | x  |  x  |
| SQLite   |     x     |     x     | x  |  x  |
+----------+-----------+-----------+----+-----+


From this build matrix one can deduce that Postgres and MySQL are generic
services that will be present in every build. In addition, all builds will use
Python 2 and Python 3, and I propose using Python 3.4 and Python 3.5. The
matrix can be expressed by environment variables; see .travis.yml for the
current build matrix.
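
As a rough sketch of how such an env-based matrix could look in .travis.yml
(the SUITE variable name and the version pins are illustrative assumptions on
my part, not the agreed configuration):

    language: python
    python:
      - "2.7"
      - "3.4"
    env:
      # Hypothetical suite switch; Travis expands this into the cross
      # product: 2 Python versions x 4 suites = 8 jobs. Each job starts
      # only the backing services its suite needs (Postgres, MySQL and
      # SQLite are always present).
      - SUITE=scheduler
      - SUITE=operators
      - SUITE=ui
      - SUITE=api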

Furthermore, I would like us to label our tests correctly, e.g. as unit tests
or integration tests. This can be done with a comment or by introducing
decorators such as @unittest and @integrationtest. This is to help reviewers
and maintainers find out whether new functionality is correctly covered. At a
minimum, a unit test is required for new functionality.
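
A minimal sketch of what such decorators could look like; the names and the
RUN_INTEGRATION_TESTS switch are assumptions, not a worked-out design (note
that @unittest would shadow the stdlib module, so something like
@integration_test is probably safer):

    import os
    import unittest

    def integration_test(test_item):
        # Label a test as an integration test and skip it unless the run
        # explicitly opts in, so plain unit-test runs stay fast.
        test_item.integration = True
        if os.environ.get('RUN_INTEGRATION_TESTS') != 'true':
            return unittest.skip('integration tests disabled')(test_item)
        return test_item

    def unit_test(test_item):
        # Label a test as a unit test (purely documentation for now).
        test_item.integration = False
        return test_item

    class TransferTests(unittest.TestCase):
        @integration_test
        def test_mysql_to_hive(self):
            # Needs live MySQL and Hive, so it only runs when opted in.
            pass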

What is a unit test (thanks, Stack Overflow): A unit test is a test written by
the programmer to verify that a relatively small piece of code is doing what it 
is intended to do. They are narrow in scope, they should be easy to write and 
execute, and their effectiveness depends on what the programmer considers to be 
useful. Part of being a unit test is the implication that things outside the 
code under test are mocked or stubbed out. Unit tests shouldn't have 
dependencies on outside systems. They test internal consistency as opposed to 
proving that they play nicely with some outside system. 

An integration test is done to demonstrate that different pieces of the system 
work together. Integration tests cover whole applications, and they require 
much more effort to put together. They usually require resources like database 
instances and hardware to be allocated for them. The integration tests do a 
more convincing job of demonstrating the system works (especially to 
non-programmers) than a set of unit tests can, at least to the extent the 
integration test environment resembles production.

Lastly, I would like us to use the “mirror the file you are testing”
convention, i.e. tests/models.py tests models.py, etc. This means we should
stop adding tests to core.py and migrate away from it.

I will create a couple of Jiras to track this.

Any thoughts?

Bolke
