I have been investigating what takes time in Django's test runner and
whether there is anything to do about it. The short answer is: yes,
there is a lot of room for improvement. I managed to reduce the run
time of the test suite (on sqlite3) from 1700 seconds to around 500
seconds. On postgresql I reduced the run time from 5000 to 2500
seconds.

The things I did are listed below. Note that I haven't implemented any
of these (except #4) in a way that would be acceptable for core
inclusion. I implemented them in the quickest way possible to see what
the effect is.
  1. Use MD5 (or SHA1) passwords in testing. This is just a change to
the default test_sqlite.py file, plus documentation (see the first
sketch after this list). Result: around 150 seconds saved from this
alone. One password check takes 0.3 seconds on default settings, and
there is no reason to run with the highest-security password hashing
when testing.
  2. Do not validate the whole project for each test case. Django uses
call_command to flush the database, load fixtures and so on between
tests, and these commands run validation of the whole project. For
every test. That is not necessary. In the patch I just disabled
validation (sketched after this list), which is not the correct
approach. I think this is responsible for around 100-200 seconds.
  3. Fix fixture loading. Fixtures are reloaded for every test case,
and the check for a fixture file is implemented in a way that results
in around 2.5 million "file exists" checks during the test suite: for
every combination of fixture directory, compression type and fixture
file format, Django checks whether a file with that name exists. I
hacked this into somewhat better shape (from a performance point of
view; see the sketch after this list), but it could be much better,
both in code quality and in speed. Around 200 seconds saved here.
  4. I applied the deepcopy removal patch from ticket #16759. I think
this is responsible for around 100-200 seconds. This would be
important to get into Django regardless of test runner speed.
  5. Track "model data changed". This way, Django can skip flushing
most tables (most notably contenttypes and permissions) between each
transactional test case. This data is mostly static, and the test
suite has really a lot of models, and thus a lot of contenttypes and
permissions. This results in massive reloads between each test case.
In the patch I track changes to models. If there are none, no need to
flush & reload data. Now, the implementation is far from perfect. And
there is the question if this is too much magic. However, I think this
alone is responsible for 1000-1500 seconds when running under
postgresql. I bet for mysql & oracle there is still more gain from
this.
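
To illustrate point 1, a minimal sketch of what the test_sqlite.py
change could look like, assuming the PASSWORD_HASHERS setting from
the new password hashers framework in trunk is available:

    # test_sqlite.py: use the weakest (and fastest) hasher for test
    # runs. MD5 being insecure is irrelevant for throwaway test data.
    PASSWORD_HASHERS = (
        'django.contrib.auth.hashers.MD5PasswordHasher',
    )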
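
For point 2, the quick-and-dirty version of the hack amounts to
no-opping the per-command validation step. This is only a sketch of
the idea, not the proper fix (which would validate once at test runner
startup):

    # BaseCommand.execute() calls self.validate() for commands such as
    # flush and loaddata; stubbing it out skips the repeated
    # whole-project validation between tests.
    from django.core.management.base import BaseCommand

    def _skip_validation(self, app=None, display_num_errors=False):
        pass  # assume the project was validated once, up front

    BaseCommand.validate = _skip_validation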
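
For point 3, the general idea (not the actual code on the branch) is
to list each fixture directory once and cache the names, so the
per-combination check becomes a set lookup instead of a filesystem
stat. fixture_exists() below is a hypothetical helper, not loaddata's
real API:

    import os

    _dir_cache = {}

    def fixture_exists(dirname, filename):
        # One listdir() per directory instead of one exists() call per
        # name/compression/format combination.
        if dirname not in _dir_cache:
            try:
                _dir_cache[dirname] = set(os.listdir(dirname))
            except OSError:
                _dir_cache[dirname] = set()
        return filename in _dir_cache[dirname]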
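
For point 5, the tracking itself can be as simple as a dirty flag set
from the model signals. The names below are made up for the example,
and the flush/reload wiring in the test runner is left out:

    from django.db.models.signals import post_save, post_delete

    _db_dirty = False

    def _mark_dirty(**kwargs):
        # Any model write marks the database as needing a flush.
        global _db_dirty
        _db_dirty = True

    post_save.connect(_mark_dirty)
    post_delete.connect(_mark_dirty)

    # The per-test teardown can then skip the flush and the
    # contenttypes/permissions reload while _db_dirty is False, and
    # reset the flag after an actual flush.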

After all of the above, there aren't any really easy gains left, or at
least I can't find them. Of the points above, 1-4 should be somewhat
easy to do. I would like 5 included as well, but I can see some
counterarguments to it. It would also be useful for static data in
testing: I work in the medical sector, and ICD-10 alone is 10000+
codes. From the application's perspective this is static data, and I
really do not wish to reload 10000 rows for every test.

There is a github branch (https://github.com/akaariai/django/compare/
fast_tests) with the things I did. Note that there really isn't
anything ready for 1, 2, 3 or 5 there; those commits are just there to
show the pain points. 4 is ready and is tracked by ticket #16759.

There is also a profile file from after the above fixes.
https://github.com/akaariai/django/commit/fedc5c1d10960cefe845fa91f3692cab953253fd

Unfortunately I no longer have a before-profile available, and
generating another one would take more than an hour.

 - Anssi
