tl;dr: I've written an alternative TestCase base class which makes 
fixture-using tests much more I/O-efficient on transactional DBs, and I'd like 
to upstream it.

Greetings, all! This is my first django-dev post, so please be gentle. :-) I 
hack on support.mozilla.com, a fairly large Django site with about 1000 tests. 
Those tests make heavy use of fixtures and, as a result, used to take over 5 
minutes to run. So, I spent a few days seeing if I could cut the amount of DB 
I/O needed. Ultimately, I got the run down to just over 1 minute, and almost 
all of those gains carry over to any Django site running against a 
transactional DB. No changes to the apps themselves are needed. I'd love to 
push some of this work upstream, if there's interest (or even lack of 
opposition ;-)).

The speedups came from 3 main optimizations:

1. Class-level fixture setup

Given a transactional DB, there's no reason to reload fixtures via dozens of 
SQL statements before every test. I made use of setUpClass() and 
tearDownClass() (yay, unittest2!) to change the flow for TestCase-using tests 
to this:
    a. Load the fixtures at the top of the class, and commit.
    b. Run a test.
    c. Roll back, returning to pristine fixtures. Go back to step b.
    d. At class teardown, figure out which tables the fixtures loaded into, and 
expressly clear out what was added.
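To make steps a-d concrete, here is a minimal sketch. It is not the test-utils code: it assumes Python's sqlite3 module in place of a Django test database and hand-written fixture rows in place of loaddata, and the names CONN, FIXTURES, ClassLevelFixtureTestCase and FruitTests are hypothetical.

```python
import sqlite3
import unittest

# Hypothetical stand-in for the test DB. isolation_level=None puts sqlite3 in
# autocommit mode, so the BEGIN/COMMIT/ROLLBACK statements below are explicit.
CONN = sqlite3.connect(":memory:", isolation_level=None)
CONN.execute("CREATE TABLE fruit (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical fixture data: table name -> list of rows (primary key first).
FIXTURES = {"fruit": [(1, "apple"), (2, "banana")]}


class ClassLevelFixtureTestCase(unittest.TestCase):
    fixtures = FIXTURES

    @classmethod
    def setUpClass(cls):
        # a. Load the fixtures once for the whole class, and commit.
        CONN.execute("BEGIN")
        for table, rows in cls.fixtures.items():
            placeholders = ", ".join("?" * len(rows[0]))
            # Table names come from trusted fixture definitions, not user input.
            CONN.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        CONN.execute("COMMIT")

    def setUp(self):
        # b. Each test runs inside its own transaction...
        CONN.execute("BEGIN")

    def tearDown(self):
        # c. ...which is rolled back, leaving the committed fixtures pristine.
        CONN.execute("ROLLBACK")

    @classmethod
    def tearDownClass(cls):
        # d. Explicitly delete the rows the fixtures added, table by table.
        CONN.execute("BEGIN")
        for table, rows in cls.fixtures.items():
            CONN.executemany(f"DELETE FROM {table} WHERE id = ?",
                             [(row[0],) for row in rows])
        CONN.execute("COMMIT")


class FruitTests(ClassLevelFixtureTestCase):
    def test_delete_is_rolled_back(self):
        CONN.execute("DELETE FROM fruit")
        count = CONN.execute("SELECT COUNT(*) FROM fruit").fetchone()[0]
        self.assertEqual(count, 0)

    def test_fixtures_present(self):
        count = CONN.execute("SELECT COUNT(*) FROM fruit").fetchone()[0]
        self.assertEqual(count, 2)
```

unittest runs test methods in alphabetical order, so test_delete_is_rolled_back runs first, and test_fixtures_present only passes if the rollback in step c really restored the fixtures without reloading them.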

Before this optimization: 302s to run the suite
After: 97s

Before: 37,583 queries
After: 4,116 queries

On top of that, an additional 4s was saved by reusing a single connection 
rather than repeatedly opening and closing new ones, bringing the final number 
down to 93s. (We can get away with this because we commit any setup done at 
cursor initialization, whereas the old TestCase rolled it back.)
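A rough sketch of the connection-reuse idea, assuming sqlite3 and a hypothetical get_connection() helper (Django's own connection handling is more involved): the connection, along with its per-connection setup, is created once and then handed out to every test.

```python
import sqlite3

_shared_connection = None  # hypothetical module-level cache


def get_connection(path=":memory:"):
    """Return one long-lived connection instead of opening a fresh one each time."""
    global _shared_connection
    if _shared_connection is None:
        _shared_connection = sqlite3.connect(path)
        # Per-connection setup runs only once, right after connecting and
        # outside any test's transaction, so no per-test rollback undoes it.
        # (SQLite ignores this PRAGMA inside an open transaction.)
        _shared_connection.execute("PRAGMA foreign_keys = ON")
    return _shared_connection
```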

Here's the code: 
https://github.com/erikrose/test-utils/blob/master/test_utils/__init__.py#L121. 
I'd love to generalize it a bit (to fall back to the old behavior with 
non-transactional backends, for example) and offer it as a patch to Django 
proper, replacing TestCase. Thoughts?

(If you notice that copy-and-paste of loaddata sitting off to the side in 
another module, don't fret; in the patch, that would turn into a refactoring of 
loaddata to make the computation of the fixture-referenced tables separately 
reusable.)
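The "which tables did the fixtures touch?" computation can be sketched for JSON fixtures as below. This is an illustration, not the loaddata refactoring itself: tables_for_fixture() and its table_overrides parameter are hypothetical, and it guesses table names from Django's default <app_label>_<modelname> convention, whereas a real implementation would ask each model class for its _meta.db_table. It also ignores many-to-many join tables, which loaddata can populate as well.

```python
import json


def tables_for_fixture(fixture_json, table_overrides=None):
    """Return the set of table names a JSON fixture loads rows into.

    Assumes Django's default table naming (<app_label>_<modelname>) unless
    the model label appears in table_overrides, e.g. for a custom db_table.
    """
    overrides = table_overrides or {}
    tables = set()
    for obj in json.loads(fixture_json):
        label = obj["model"].lower()  # e.g. "auth.user"
        app_label, model_name = label.split(".")
        tables.add(overrides.get(label, f"{app_label}_{model_name}"))
    return tables
```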


2. Fixture grouping

I next observed that many test classes reused the same sets of fixtures, often 
via subclassing. After the previous optimization, our tests still loaded 
fixtures 114 times, even though there were only 11 distinct sets of them. So, I 
thought: why not write a custom test runner that buckets the classes by fixture 
set and tells each class to skip setting up the fixtures unless it's the first 
in its bucket, and to skip tearing them down unless it's the last? This took 
the form of a custom nose plugin (we use nose for all our Django stuff), and it 
took roughly another quarter off the test run:
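The bucketing decision might look like the following sketch. It is not the nose plugin's code; the function name plan_fixture_buckets() and its (class, should_set_up, should_tear_down) output format are hypothetical.

```python
def plan_fixture_buckets(test_classes):
    """Group test classes by fixture set and decide who loads/unloads fixtures.

    Returns (cls, should_set_up, should_tear_down) tuples in run order: only
    the first class in a bucket loads the fixtures, and only the last one
    clears them out.
    """
    buckets = {}  # dicts preserve insertion order (Python 3.7+)
    for cls in test_classes:
        key = frozenset(getattr(cls, "fixtures", None) or ())
        buckets.setdefault(key, []).append(cls)

    plan = []
    for classes in buckets.values():
        for i, cls in enumerate(classes):
            plan.append((cls, i == 0, i == len(classes) - 1))
    return plan
```

Keying on a frozenset treats ["a.json", "b.json"] and ["b.json", "a.json"] as the same set; if load order matters for your fixtures, a tuple key would be the safer choice.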

Before: 97s
After: 74s

Of course, test independence is still preserved. We're just factoring out 
pointlessly repeated setup.

I don't really have plans to upstream this unless someone calls for it, but 
I'll be making it available soon, likely as part of django-nose.


3. Startup optimizations

At this point, it was bothering me that, just to run a single test, I had to 
wait through 15s of DB initialization (mostly auth_permission and 
django_content_type population), even though all of it was already perfectly 
valid from the previous test run. So, building on some work we had already 
done in this direction, I decided to skip the teardown of the test DB and, 
symmetrically, the setup on future runs. If you make schema changes, just set 
an env var, and it wipes and remakes the DB as usual. I could see pushing this 
into django-nose as well, but it has the hackiest implementation and can 
potentially confuse users. I mention it for completeness.

Before: startup time 15s
After: 3s (There's quite a wide variance due to I/O caching luck.)

Code: https://github.com/erikrose/test-utils/commit/b95a1b7


If you've read this far, you get a cookie! I welcome your feedback on merging 
optimization #1 into core, as well as any accusations of insanity re: #2 and 
#3. FWIW, everything works great, without touching any of the tests, on 3 of 
our Django sites totaling over 2000 tests.

Best regards and wishes for a happy weekend,
Erik Rose
support.mozilla.com
