Hi All,

(Welcome to the new MacBook Pro that has a send “button” on the touch bar)

Jeremiah and I have been looking into optimising the time that is spent on
tests. The reason for this is that Travis runs are taking more and more time
and we are being throttled by Travis. As part of that we enabled color coding
of test outcomes and timing of tests. The results are kind of surprising.

This is the top 20 tests where we spend the most time. MySQL (remember,
concurrent access enabled):
https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt

tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes, we spend a whopping 5 minutes on executing all examples. Another
interesting one is “tests.CoreTest.test_scheduler_job”. This test just checks
whether certain directories are created as part of logging. This could have
been covered by a real unit test covering just the functionality of the
function that creates the files - now it takes 35s.
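
To make that concrete, here is a minimal sketch of the kind of fast unit test
I mean; the ensure_log_directory helper is hypothetical, a stand-in for
whatever function actually creates the log directories:

    import os
    import shutil
    import tempfile
    import unittest

    def ensure_log_directory(base, dag_id):
        # Hypothetical stand-in for the real function that creates the
        # per-dag log directories.
        path = os.path.join(base, dag_id)
        os.makedirs(path)
        return path

    class EnsureLogDirectoryTest(unittest.TestCase):
        def setUp(self):
            self.base = tempfile.mkdtemp()

        def tearDown(self):
            shutil.rmtree(self.base)

        def test_creates_directory(self):
            # Runs in milliseconds: no scheduler, no database, no example dags.
            path = ensure_log_directory(self.base, 'example_dag')
            self.assertTrue(os.path.isdir(path))

A test like this runs in milliseconds instead of 35s because it never touches
the scheduler, the database or the example DAGs.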

We discussed several strategies for reducing time, apart from rewriting some of
the tests (that would be a Herculean job!). The most optimal approach seems to be:

1. Run the scheduler tests apart from all other tests.
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately.
4. Run API tests separately.

This creates the following build matrix (warning ASCII art):

+----------+-----------+-----------+----+-----+
|          | Scheduler | Operators | UI | API |
+----------+-----------+-----------+----+-----+
| Python 2 |     x     |     x     | x  |  x  |
| Python 3 |     x     |     x     | x  |  x  |
| Kerberos |           |           | x  |  x  |
| Ldap     |           |           | x  |     |
| Hive     |           |     x     | x  |  x  |
| SSH      |           |     x     |    |     |
| Postgres |     x     |     x     | x  |  x  |
| MySQL    |     x     |     x     | x  |  x  |
| SQLite   |     x     |     x     | x  |  x  |
+----------+-----------+-----------+----+-----+


From this build matrix one can deduce that Postgres and MySQL are generic
services that will be present in every build. In addition, all builds will use
Python 2 and Python 3, and I propose using Python 3.4 and Python 3.5. The
matrix can be expressed by environment variables; see .travis.yml for the
current build matrix.
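
As a rough sketch of how such an env-based matrix could look in .travis.yml
(the SUITE variable name and the version pins are illustrative assumptions on
my part, not the agreed configuration):

    language: python
    python:
      - "2.7"
      - "3.4"
    env:
      # Hypothetical suite switch; Travis expands this into the cross
      # product: 2 Python versions x 4 suites = 8 jobs. Each job starts
      # only the backing services its suite needs (Postgres, MySQL and
      # SQLite are always present).
      - SUITE=scheduler
      - SUITE=operators
      - SUITE=ui
      - SUITE=api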

Furthermore, I would like us to label our tests correctly, e.g. as unit tests
or integration tests. This can be done with a comment or by introducing
decorators such as @unittest and @integrationtest. This is to help reviewers
and maintainers find out whether new functionality is correctly covered. At a
minimum, a unit test is required for new functionality.
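
A minimal sketch of what such decorators could look like; the names and the
RUN_INTEGRATION_TESTS switch are assumptions, not a worked-out design (note
that @unittest would shadow the stdlib module, so something like
@integration_test is probably safer):

    import os
    import unittest

    def integration_test(test_item):
        # Label a test as an integration test and skip it unless the run
        # explicitly opts in, so plain unit-test runs stay fast.
        test_item.integration = True
        if os.environ.get('RUN_INTEGRATION_TESTS') != 'true':
            return unittest.skip('integration tests disabled')(test_item)
        return test_item

    def unit_test(test_item):
        # Label a test as a unit test (purely documentation for now).
        test_item.integration = False
        return test_item

    class TransferTests(unittest.TestCase):
        @integration_test
        def test_mysql_to_hive(self):
            # Needs live MySQL and Hive, so it only runs when opted in.
            pass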

What is a unit test (thanks, Stack Overflow): A unit test is a test written by
the programmer to verify that a relatively small piece of code is doing what it 
is intended to do. They are narrow in scope, they should be easy to write and 
execute, and their effectiveness depends on what the programmer considers to be 
useful. Part of being a unit test is the implication that things outside the 
code under test are mocked or stubbed out. Unit tests shouldn't have 
dependencies on outside systems. They test internal consistency as opposed to 
proving that they play nicely with some outside system. 

An integration test is done to demonstrate that different pieces of the system 
work together. Integration tests cover whole applications, and they require 
much more effort to put together. They usually require resources like database 
instances and hardware to be allocated for them. The integration tests do a 
more convincing job of demonstrating the system works (especially to 
non-programmers) than a set of unit tests can, at least to the extent the 
integration test environment resembles production.

Lastly, I would like us to use the “mirror the file you are testing”
convention, i.e. tests/models.py tests models.py, etc. This means we should
stop adding tests to core.py and migrate away from it.

I will create a couple of Jiras to track this.

Any thoughts?

Bolke
