Re: Test optimizations (2-5x as fast)

2011-06-08 Thread Ned Batchelder

On 6/6/2011 10:19 PM, Ramiro Morales wrote:
> On Sun, Jun 5, 2011 at 5:18 PM, Ned Batchelder  wrote:
>> When I try this on a PostgreSQL database, I have problems relating to
>> violated uniqueness constraints, sometimes from tests themselves, sometimes
>> from setUpClass, sometimes from tearDownClass.  In the latter two cases,
>> it's the sites table involved.  Is this something others have dealt with, or
>> am I on my own? :)
>>
>> I tried adding a PostgreSQL "disable constraints" statement here:
>> https://github.com/jbalogh/test-utils/blob/master/test_utils/__init__.py#L109
>>
>>    cursor.execute('SET CONSTRAINT ALL DEFERRED')
>>
>> It didn't help.
>
> This might be related to ticket [1]#11665, a known issue in the TestCase
> handling of constraints with pg. The suggestion there is to use
> SET CONSTRAINTS ALL IMMEDIATE
> before the rollback.
>
> HTH

Thanks for the idea, but it did not help, which makes sense to me.  The 
ticket in question is complaining that TestCase doesn't throw enough 
constraint violation exceptions, and suggests SET CONSTRAINTS ALL 
IMMEDIATE as a way to force problems to the surface.  I'm trying to do 
the opposite...


--Ned.

--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Test optimizations (2-5x as fast)

2011-06-06 Thread Ramiro Morales
On Sun, Jun 5, 2011 at 5:18 PM, Ned Batchelder  wrote:
> When I try this on a PostgreSQL database, I have problems relating to
> violated uniqueness constraints, sometimes from tests themselves, sometimes
> from setUpClass, sometimes from tearDownClass.  In the latter two cases,
> it's the sites table involved.  Is this something others have dealt with, or
> am I on my own? :)
>
> I tried adding a PostgreSQL "disable constraints" statement here:
> https://github.com/jbalogh/test-utils/blob/master/test_utils/__init__.py#L109
>
>    cursor.execute('SET CONSTRAINT ALL DEFERRED')
>
> It didn't help.

This might be related to ticket [1]#11665, a known issue in the TestCase
handling of constraints with pg. The suggestion there is to use
SET CONSTRAINTS ALL IMMEDIATE
before the rollback.
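
A minimal sketch of that suggestion, assuming a DB-API 2.0 connection to PostgreSQL. The helper name is hypothetical and this is not Django's own TestCase code; it only shows the ordering of the two steps:

```python
def rollback_with_constraint_check(connection):
    """Check deferred constraints, then roll back.

    `connection` is assumed to be a DB-API 2.0 connection to PostgreSQL.
    The function name is made up for this illustration.
    """
    cursor = connection.cursor()
    try:
        # Makes PostgreSQL check any deferred constraints right now, so a
        # violation raises here instead of vanishing with the rollback.
        cursor.execute("SET CONSTRAINTS ALL IMMEDIATE")
    finally:
        # The test's changes are discarded whether or not the check failed.
        connection.rollback()
```

Note that this surfaces more constraint errors, not fewer.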

HTH

-- 
Ramiro Morales

1. https://code.djangoproject.com/ticket/11665




Re: Test optimizations (2-5x as fast)

2011-06-06 Thread Ramiro Morales
On Fri, May 13, 2011 at 11:57 PM, Erik Rose  wrote:
> tl;dr: I've written an alternative TestCase base class which makes 
> fixture-using tests much more I/O efficient on transactional DBs, and I'd 
> like to upstream it.
> [...]
> 1. Class-level fixture setup
> [...]
> 2. Fixture grouping
> [...]

Only wanted to point out that optimization of database data fixture loading
in tests is the topic of ticket [1]#9449 in the Django bug tracker.

The two contributors that participated in the discussion there so far
took the fixture caching path, but I think the ticket can be considered
associated with the general problem and so is open for other solution
strategies.

-- 
Ramiro Morales

1. https://code.djangoproject.com/ticket/9449




Re: Test optimizations (2-5x as fast)

2011-06-06 Thread Erik Rose
On Jun 5, 2011, at 10:18 AM, Ned Batchelder wrote:

> When I try this on a PostgreSQL database, I have problems relating to 
> violated uniqueness constraints, sometimes from tests themselves, sometimes 
> from setUpClass, sometimes from tearDownClass.  In the latter two cases, it's 
> the sites table involved.  Is this something others have dealt with, or am I 
> on my own? :)

Do you perchance have anything getting inserted into tables other than the ones 
explicitly mentioned in your fixtures, perhaps by a post_save signal? The code 
I have up at the moment doesn't deal with that. A cheap way to make it work 
would be to truncate *all* tables on teardown_fixtures (slow, ick).
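
A rough sketch of the "clear every table on teardown" fallback, using the standard-library sqlite3 module so it runs anywhere. SQLite has no TRUNCATE, so DELETE stands in for it; the function name is hypothetical and this is not the test-utils code:

```python
import sqlite3

def clear_all_tables(conn):
    """Delete every row from every user table and return the table names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Names come from the catalog rather than user input.
        conn.execute('DELETE FROM "%s"' % table)
    conn.commit()
    return tables
```

This also removes rows that something like a post_save handler wrote into tables the fixtures never mention, at the cost of touching every table.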

I will be sure to give it a swing with Postgres; I hope to find time to work on 
the Django patch sometime this week.

Erik




Re: Test optimizations (2-5x as fast)

2011-06-05 Thread Ned Batchelder

On 5/17/2011 2:28 PM, Erik Rose wrote:
>> I would be very happy to test this against an Oracle database to see how
>> much the patch improves speed, since previously running tests against Oracle
>> has been a real pain, especially as all the db recreate stuff took a long,
>> long time.
>
> Great! I'll post again to this thread when the patch is ready. Or, if you'd 
> like to try it now, you can download https://github.com/jbalogh/test-utils and 
> make your test classes subclass FastFixtureTestCase rather than TestCase.

When I try this on a PostgreSQL database, I have problems relating to 
violated uniqueness constraints, sometimes from tests themselves, 
sometimes from setUpClass, sometimes from tearDownClass.  In the latter 
two cases, it's the sites table involved.  Is this something others have 
dealt with, or am I on my own? :)


I tried adding a PostgreSQL "disable constraints" statement here: 
https://github.com/jbalogh/test-utils/blob/master/test_utils/__init__.py#L109


cursor.execute('SET CONSTRAINT ALL DEFERRED')

It didn't help.

Thanks,

--Ned.




Re: Test optimizations (2-5x as fast)

2011-05-20 Thread David Cramer
Here's my proposal, assuming it can be done:

1. Create default database.

2. Run a test

3. If a test has fixtures, check whether a copy exists and, if not, copy
the base table to ``name_``.

4. Start transaction

5. Run Tests

6. Roll back

I think that pretty much would solve all cases, and assuming you reuse
tons of fixtures it should be a huge benefit.

Also if the begin/rollback aren't good enough, and we can already
"copy" a database, then we could just continually copy databases each
time (assuming it's fast).
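
A sketch of the copy-on-demand step, assuming SQLite in-memory databases stand in for the real backend. The names `fixture_key`, `database_for` and the `load_fixture` callback are made up for illustration, and hashing the sorted fixture names is only one possible way to key the copies:

```python
import hashlib
import sqlite3

_copies = {}  # fixture-set key -> connection holding that fixture state

def fixture_key(fixtures):
    """Stable short hash for a set of fixture names, independent of order."""
    joined = "\n".join(sorted(set(fixtures)))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]

def database_for(base, fixtures, load_fixture):
    """Return a copy of `base` with `fixtures` loaded, building it only once."""
    key = fixture_key(fixtures)
    if key not in _copies:
        copy = sqlite3.connect(":memory:")
        base.backup(copy)  # copy the pristine base database
        for name in sorted(set(fixtures)):
            load_fixture(copy, name)
        copy.commit()
        _copies[key] = copy
    return _copies[key]
```

Each test would then run against the returned connection inside a transaction that is rolled back afterwards, as in steps 4 to 6.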

On May 19, 6:12 am, Hanne Moa  wrote:
> On 18 May 2011 01:46, Erik Rose  wrote:
>
> >> Is there a sensible way to "copy" databases in SQL?
>
> > SQL 2003 introduced CREATE TABLE x LIKE y for cloning the schema of a 
> > table. It's supported in MySQL at least. You could then do a bunch of 
> > INSERT INTO ... SELECTs if you deferred foreign key checks first.
>
> Sometimes, in order to rescue data from an overfull table (because the
> cleanup-job had died and a DELETE would take too long) I've done the
> following:
>
> - start transaction
> - rename bad table
> - recreate the table (CREATE TABLE x LIKE would work)
> -  INSERT INTO ... SELECT good data into the recreated table from the
> renamed table
> - drop renamed table
> - end transaction
>
> This works even when the system is up and running, on production servers.
>
> HM




Re: Test optimizations (2-5x as fast)

2011-05-19 Thread Hanne Moa
On 18 May 2011 01:46, Erik Rose  wrote:
>> Is there a sensible way to "copy" databases in SQL?
>
> SQL 2003 introduced CREATE TABLE x LIKE y for cloning the schema of a table. 
> It's supported in MySQL at least. You could then do a bunch of INSERT INTO 
> ... SELECTs if you deferred foreign key checks first.

Sometimes, in order to rescue data from an overfull table (because the
cleanup-job had died and a DELETE would take too long) I've done the
following:

- start transaction
- rename bad table
- recreate the table (CREATE TABLE x LIKE would work)
-  INSERT INTO ... SELECT good data into the recreated table from the
renamed table
- drop renamed table
- end transaction

This works even when the system is up and running, on production servers.
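
The steps above can be sketched with the standard-library sqlite3 module. SQLite has no CREATE TABLE ... LIKE, so the caller passes the CREATE statement explicitly; the function name and the `_old` suffix are assumptions for this example:

```python
import sqlite3

def rebuild_table(conn, table, create_sql, keep_where):
    """Replace `table` with a fresh copy holding only rows matching `keep_where`.

    `conn` should be opened with isolation_level=None so the explicit
    BEGIN/COMMIT below control the transaction. `keep_where` is a trusted
    SQL fragment, not user input.
    """
    old = table + "_old"
    conn.execute("BEGIN")
    try:
        conn.execute('ALTER TABLE "%s" RENAME TO "%s"' % (table, old))
        conn.execute(create_sql)  # stands in for CREATE TABLE x LIKE y
        conn.execute('INSERT INTO "%s" SELECT * FROM "%s" WHERE %s'
                     % (table, old, keep_where))
        conn.execute('DROP TABLE "%s"' % old)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Whether DDL can be rolled back like this depends on the database; SQLite and PostgreSQL allow it, while MySQL commits implicitly on DDL statements.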


HM




Re: Test optimizations (2-5x as fast)

2011-05-18 Thread Erik Rose
> I suspect he was thinking of PostgreSQL's support for template
> databases.  It skips parsing overhead, so that creating a copy of a
> template is roughly disk-bound.

Ah yes. I've been away from my old friend Postgres for a few years. :-)




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jeremy Dunck
On Tue, May 17, 2011 at 6:46 PM, Erik Rose  wrote:
>> Is there a sensible way to "copy" databases in SQL?
>
> SQL 2003 introduced CREATE TABLE x LIKE y for cloning the schema of a table. 
> It's supported in MySQL at least. You could then do a bunch of INSERT INTO 
> ... SELECTs if you deferred foreign key checks first.
>
> Wait, why do you want to?

I suspect he was thinking of PostgreSQL's support for template
databases.  It skips parsing overhead, so that creating a copy of a
template is roughly disk-bound.
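
For reference, a small sketch of how such a clone statement could be built. `CREATE DATABASE ... TEMPLATE ...` is PostgreSQL syntax; the helper name and the deliberately strict name check are assumptions for this example:

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")

def clone_database_sql(new_name, template_name):
    """Build a PostgreSQL statement that clones `template_name` as `new_name`."""
    for name in (new_name, template_name):
        # Database names cannot be passed as query parameters, so validate.
        if not _IDENT.match(name):
            raise ValueError("unsafe database name: %r" % name)
    return 'CREATE DATABASE "%s" TEMPLATE "%s"' % (new_name, template_name)
```

PostgreSQL refuses to run CREATE DATABASE inside a transaction block, and the copy fails while other sessions are connected to the template, so a test runner would need autocommit and an idle template.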




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Erik Rose
> Is there a sensible way to "copy" databases in SQL?

SQL 2003 introduced CREATE TABLE x LIKE y for cloning the schema of a table. 
It's supported in MySQL at least. You could then do a bunch of INSERT INTO ... 
SELECTs if you deferred foreign key checks first.

Wait, why do you want to?




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread David Cramer
Is there a sensible way to "copy" databases in SQL? It's pretty
obvious with things like sqlite, but outside of that it seems tricky. I
really like that idea, and you should definitely be able to (at
the very least) run a unique hash on the required fixtures to
determine whether a database is available for this or not.

On May 17, 11:28 am, Erik Rose  wrote:
> > I would be very happy to test this against an Oracle database to see how
> > much the patch improves speed, since previously running tests against Oracle
> > has been a real pain, especially as all the db recreate stuff took a long,
> > long time.
>
> Great! I'll post again to this thread when the patch is ready. Or, if you'd 
> like to try it now, you can download https://github.com/jbalogh/test-utils and 
> make your test classes subclass FastFixtureTestCase rather than TestCase.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Erik Rose
On May 17, 2011, at 7:16 AM, Jonas H. wrote:

> 3) Hash the SQL generated for setup/fixtures. (step in right before the SQL 
> is sent to the database)
> Advantages: No false-positives, simple
> Disadvantages: Does not eliminate the need for SQL generation and fixture 
> parsing + model creation, so this might not be the "highest of highs" ;-)

FWIW, I tried this awhile back while trying to get rid of the necessity of the 
FORCE_DB option in test-utils 
(https://github.com/jbalogh/test-utils/blob/master/test_utils). It added 2 
seconds to our test startup time, so I decided against it. It wasn't clear from 
the profiles that string manips (SQL generation) was taking a lot of that time, 
so I decided to forgo the model metadata deep comparison code I had thought 
about writing.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Erik Rose
> I would be very happy to test this against an Oracle database to see how
> much the patch improves speed, since previously running tests against Oracle
> has been a real pain, especially as all the db recreate stuff took a long,
> long time.

Great! I'll post again to this thread when the patch is ready. Or, if you'd 
like to try it now, you can download https://github.com/jbalogh/test-utils and 
make your test classes subclass FastFixtureTestCase rather than TestCase.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jeremy Dunck
On Tue, May 17, 2011 at 12:59 PM, Jonas H.  wrote:
> On 05/17/2011 04:48 PM, Jeremy Dunck wrote:
>>>
>>> 1) Use file modification timestamps for all model and test related files.
>>> Advantages: simple, works.
>>> Disadvantages: Triggers cache invalidation for changes not related to
>>> models
>>> or tests
>>
>> I think this is a pretty big win, even though it's not theoretically
>> optimal.
>
> Only for "does-it-still-work" sort of tests. Not for test-driven
> development, because your models and tests change all the time.
>

Well, we're debating various ways to improve, and I'm saying, let's do
the simplest thing that will work to raise the chance that it'll
actually get done. :-)

I declare myself bike-shedding.  Given the 3 options, I'm:

+1 on #1
+0 on #2
-0 on #3




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Erik Rose
> I declare myself bike-shedding.  Given the 3 options, I'm:
> 
> +1 on #1
> +0 on #2
> -0 on #3

Heh, I was just going to quietly sit here and do that while everybody else kept 
mailing. :-)




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jonas H.

On 05/17/2011 04:48 PM, Jeremy Dunck wrote:
>> 1) Use file modification timestamps for all model and test related files.
>> Advantages: simple, works.
>> Disadvantages: Triggers cache invalidation for changes not related to models
>> or tests
>
> I think this is a pretty big win, even though it's not theoretically optimal.

Only for "does-it-still-work" sort of tests. Not for test-driven 
development, because your models and tests change all the time.





Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Ned Batchelder

On 5/17/2011 11:31 AM, Jeremy Dunck wrote:
> On Tue, May 17, 2011 at 10:24 AM, Ned Batchelder  wrote:
>> Maybe it wouldn't be so bad to punt on invalidation?  The cached databases
>> would only have to be rebuilt if the models changed or if the fixtures
>> changed, right?  We have a similar situation now with migrations: you have
>> to write one every time you change a model, and there's no automatic
>> mechanism that kicks in to tell you to write one, you just have to know:
>> "change a model, write a migration."  If that's working now, then what's
>> wrong with, "change a model or a fixture, re-run the test database cacher."
>
> The difference is, migrations can be merged.  Database cache is local
> state.  No?


Hmm, that's a good point.

--Ned.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jeremy Dunck
On Tue, May 17, 2011 at 10:24 AM, Ned Batchelder  wrote:
> Maybe it wouldn't be so bad to punt on invalidation?  The cached databases
> would only have to be rebuilt if the models changed or if the fixtures
> changed, right?  We have a similar situation now with migrations: you have
> to write one every time you change a model, and there's no automatic
> mechanism that kicks in to tell you to write one, you just have to know:
> "change a model, write a migration."  If that's working now, then what's
> wrong with, "change a model or a fixture, re-run the test database cacher."

The difference is, migrations can be merged.  Database cache is local
state.  No?




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Ned Batchelder

On 5/17/2011 10:48 AM, Jeremy Dunck wrote:
> On Tue, May 17, 2011 at 9:16 AM, Jonas H.  wrote:
>> Invalidation is what I'm unsure about too -- multiple ideas came to my mind,
>> all involving some sort of Great Hash(tm):
>
> Even within a single test command run, the same DB setup and same
> fixture loads are done many times (for a sizable suite).  Invalidating
> too often is better than invalidating too little.
>
>> 1) Use file modification timestamps for all model and test related files.
>> Advantages: simple, works.
>> Disadvantages: Triggers cache invalidation for changes not related to models
>> or tests
>
> I think this is a pretty big win, even though it's not theoretically optimal.

Maybe it wouldn't be so bad to punt on invalidation?  The cached 
databases would only have to be rebuilt if the models changed or if the 
fixtures changed, right?  We have a similar situation now with 
migrations: you have to write one every time you change a model, and 
there's no automatic mechanism that kicks in to tell you to write one, 
you just have to know: "change a model, write a migration."  If that's 
working now, then what's wrong with, "change a model or a fixture, 
re-run the test database cacher."


--Ned.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Ian Kelly
On Mon, May 16, 2011 at 10:12 PM, David Cramer  wrote:
> Postgres requires resetting the sequences I believe. I just assume
> Oracle/MSSQL are probably similar.

Actually in the Oracle backend, resetting the sequence for an empty
table is currently a no-op for transactional reasons.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jeremy Dunck
On Tue, May 17, 2011 at 9:16 AM, Jonas H.  wrote:

> Invalidation is what I'm unsure about too -- multiple ideas came to my mind,
> all involving some sort of Great Hash(tm):

Even within a single test command run, the same DB setup and same
fixture loads are done many times (for a sizable suite).  Invalidating
too often is better than invalidating too little.

> 1) Use file modification timestamps for all model and test related files.
> Advantages: simple, works.
> Disadvantages: Triggers cache invalidation for changes not related to models
> or tests

I think this is a pretty big win, even though it's not theoretically optimal.




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jonas H.

On 05/17/2011 01:55 PM, Ned Batchelder wrote:
> On 5/16/2011 11:18 PM, Erik Rose wrote:
>>> How about caching the test databases? The database state could be
>>> cached after model setup (which takes some time if you've got lots of
>>> them) + initial data fixture setup, and after the setup for each test
>>> case (fixtures + setUp() method).
>>>
>>> So, in the best case, no database setup is required at all to run
>>> tests -- which encourages test driven development :-)
>>
>> So that would be 11 separate DBs for our tests, and you'd just switch
>> between them? Interesting idea.

Erik: yep


> I'd been thinking recently about this as well: when you consider all the
> test runs, they're very repetitive. Every time the tests are run, they
> go through the same set of steps: a) create database, b) install
> fixtures, c) run tests. Steps a, b, and c take too long. Step c is what
> we're really interested in, and almost always, steps a and b have the
> same outcome as the last time we ran them. We all know what to do if an
> operation takes too long and usually is the same as last time: cache its
> outcome. The outcome in this case is the state of the database. Caching
> it could be as simple as making a copy of the database after the
> fixtures are installed, then using that copy to run tests.
>
> The complications are: 1) in any interesting test suite, there isn't a
> single outcome of a+b, because different tests will have different
> fixtures and perhaps even different models, so a number of copies will
> have to be captured. 2) As with any caching scheme, invalidation is
> important and tricky. In the normal course of development, how will
> these cached copies of the database be invalidated and recreated?
> Perhaps this isn't so bad, it's roughly analogous to writing migrations,
> which we know how to deal with.


Invalidation is what I'm unsure about too -- multiple ideas came to my 
mind, all involving some sort of Great Hash(tm):


1) Use file modification timestamps for all model and test related files.
Advantages: simple, works.
Disadvantages: Triggers cache invalidation for changes not related to 
models or tests


2) #1 but do hash the model definitions (at Python level)
Advantages: no cache invalidation on non-model changes.
Disadvantages: tricky, triggers cache invalidation for changes not 
related to tests


3) Hash the SQL generated for setup/fixtures. (step in right before the 
SQL is sent to the database)

Advantages: No false-positives, simple
Disadvantages: Does not eliminate the need for SQL generation and 
fixture parsing + model creation, so this might not be the "highest of 
highs" ;-)
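
To make options 1 and 3 concrete, here is a rough sketch of the two cache keys. The function names are hypothetical, and SHA-1 is used only as a fingerprint, not for security:

```python
import hashlib
import os

def mtime_key(paths):
    """Option 1: a key built from the modification times of the given files."""
    digest = hashlib.sha1()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        digest.update(str(os.stat(path).st_mtime_ns).encode("ascii"))
    return digest.hexdigest()

def sql_key(statements):
    """Option 3: a key built from the SQL that setup would send to the DB."""
    digest = hashlib.sha1()
    for statement in statements:
        digest.update(statement.encode("utf-8"))
        digest.update(b"\0")  # separator so statement boundaries matter
    return digest.hexdigest()
```

A cached database would be reused while its stored key matches the freshly computed one. As noted above, `sql_key` still requires generating the SQL on every run, while `mtime_key` changes on any edit to the watched files.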


Jonas




Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Ned Batchelder

On 5/16/2011 11:18 PM, Erik Rose wrote:
>> How about caching the test databases? The database state could be cached after 
>> model setup (which takes some time if you've got lots of them) + initial data 
>> fixture setup, and after the setup for each test case (fixtures + setUp() 
>> method).
>>
>> So, in the best case, no database setup is required at all to run tests -- 
>> which encourages test driven development :-)
>
> So that would be 11 separate DBs for our tests, and you'd just switch between 
> them? Interesting idea. Or are you proposing caching the results of queries for 
> each test class, essentially mocking out the DB?

I'd been thinking recently about this as well: when you consider all the 
test runs, they're very repetitive.  Every time the tests are run, they 
go through the same set of steps: a) create database, b) install 
fixtures, c) run tests.  Steps a, b, and c take too long.  Step c is 
what we're really interested in, and almost always, steps a and b have 
the same outcome as the last time we ran them.  We all know what to do 
if an operation takes too long and usually is the same as last time: 
cache its outcome.  The outcome in this case is the state of the 
database.  Caching it could be as simple as making a copy of the 
database after the fixtures are installed, then using that copy to run 
tests.


The complications are: 1) in any interesting test suite, there isn't a 
single outcome of a+b, because different tests will have different 
fixtures and perhaps even different models, so a number of copies will 
have to be captured.  2) As with any caching scheme, invalidation is 
important and tricky.  In the normal course of development, how will 
these cached copies of the database be invalidated and recreated?  
Perhaps this isn't so bad, it's roughly analogous to writing migrations, 
which we know how to deal with.


I don't have any code to do this, but I envision a set of test 
databases, with a modified test runner that knows how to cycle among 
them by manipulating settings.DATABASES to use the proper one for each 
test class.  I'd be glad to help build such a thing, and may be working 
toward it myself.


--Ned.


> Perhaps some numbers would illuminate: I clock the total setup and teardown 
> time for support.mozilla.com's 1064 tests at 2.59 seconds after my 
> optimizations. So I'm pretty happy with that. :-) CPU use for the test run is 
> 76% according to the `time` command, so there's a little more I/O to kill but 
> not much.
>
> Cheers,
> Erik






Re: Test optimizations (2-5x as fast)

2011-05-17 Thread Jani Tiainen
On Fri, 2011-05-13 at 16:57 -0700, Erik Rose wrote:
> tl;dr: I've written an alternative TestCase base class which makes 
> fixture-using tests much more I/O efficient on transactional DBs, and I'd 
> like to upstream it.
> 
> Greetings, all! This is my first django-dev post, so please be gentle. :-) I 
> hack on support.mozilla.com, a fairly large Django site with about 1000 
> tests. Those tests make heavy use of fixtures and, as a result, used to take 
> over 5 minutes to run. So, I spent a few days seeing if I could cut the 
> amount of DB I/O needed. Ultimately, I got the run down to just over 1 
> minute, and almost all of those gains are translatable to any Django site 
> running against a transactional DB. No changes to the apps themselves are 
> needed. I'd love to push some of this work upstream, if there's interest (or 
> even lack of opposition ;-)).
> 
> The speedups came from 3 main optimizations:
> 
> 1. Class-level fixture setup
> 
> Given a transactional DB, there's no reason to reload fixtures via dozens of 
> SQL statements before every test. I made use of setup_class() and 
> teardown_class() (yay, unittest2!) to change the flow for TestCase-using 
> tests to this:
> a. Load the fixtures at the top of the class, and commit.
> b. Run a test.
> c. Roll back, returning to pristine fixtures. Go back to step b.
> d. At class teardown, figure out which tables the fixtures loaded into, 
> and expressly clear out what was added.
> 
> Before this optimization: 302s to run the suite
> After: 97s.
> 
> Before: 37,583 queries
> After: 4,116
> 
> On top of that, an additional 4s was saved by reusing a single connection 
> rather than opening and closing them all the time, bringing the final number 
> down to 93s. (We can get away with this because we're committing any 
> on-cursor-initialization setup, whereas the old TestCase rolled it back.)
> 
> Here's the code: 
> https://github.com/erikrose/test-utils/blob/master/test_utils/__init__.py#L121.
>  I'd love to generalize it a bit (to fall back to the old behavior with 
> non-transactional backends, for example) and offer it as a patch to Django 
> proper, replacing TestCase. Thoughts?
> 
> (If you notice that copy-and-paste of loaddata sitting off to the side in 
> another module, don't fret; in the patch, that would turn into a refactoring 
> of loaddata to make the computation of the fixture-referenced tables 
> separately reusable.)
> 
> 
> 2. Fixture grouping
> 
> I next observed that many test classes reused the same sets of fixtures, 
> often via subclassing. After the previous optimization, our tests still 
> loaded fixtures 114 times, even though there were only 11 distinct sets of 
> them. So, I thought: why not write a custom testrunner that buckets the 
> classes by fixture set and advises the classes that, unless they're the first 
> or last in a bucket, they shouldn't bother tearing down or setting up the 
> fixtures, respectively? This took the form of a custom nose plugin (we use 
> nose for all our Django stuff), and it took another quarter off the test run:
> 
> Before: 97s
> After: 74s
> 
> Of course, test independence is still preserved. We're just factoring out 
> pointlessly repeated setup.
> 
> I don't really have plans to upstream this unless someone calls for it, but 
> I'll be making it available soon, likely as part of django-nose.
> 
> 
> 3. Startup optimizations
> 
> At this point, it was bothering me that, just to run a single test, I had to 
> wait through 15s of DB initialization (mostly auth_permissions and 
> django_content_type population)—stuff which was already perfectly valid from 
> the previous test run. So, building on some work we had already done in this 
> direction, I decided to skip the teardown of the test DB and, symmetrically, 
> the setup on future runs. If you make schema changes, just set an env var, 
> and it wipes and remakes the DB like usual. I could see pushing this into 
> django-nose as well, but it's got the hackiest implementation and can 
> potentially confuse users. I mention it for completeness.
> 
> Before: startup time 15s
> After: 3s (There's quite a wide variance due to I/O caching luck.)
> 
> Code: https://github.com/erikrose/test-utils/commit/b95a1b7
> 
> 
> If you read this far, you get a cookie! I welcome your feedback on merging 
> optimization #1 into core, as well as any accusations of insanity re: #2 and 
> #3. FWIW, everything works great without touching any of the tests on 3 of 
> our Django sites, totaling over 2000 tests.

I would be very happy to test this against an Oracle database to see how
much the patch improves speed. Running tests against Oracle has been a
real pain, especially since all the DB re-creation took a long, long
time.

-- 

Jani Tiainen



Re: Test optimizations (2-5x as fast)

2011-05-16 Thread Erik Rose
> Regarding the signals, basically we have a bunch of post_save type
> things, which tend to store aggregate data for certain conditions.
> These get populated (in some cases) in our tests, but don't correspond
> to a fixture or a model in the same app.

Ah, gotcha. So, a couple of solutions off the top of my head: we could just punt and 
truncate everything (except auth_permission and django_content_type). That 
would be a shame, since it takes a while. OTOH, if you use fixture grouping 
(optimization #2), that's a lot more tolerable. Or maybe we could monitor which 
tables are getting insertions somehow. I'll give it some thought.
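
For anyone trying to picture the fixture grouping, here is a rough sketch of 
the bucketing idea. It is not the actual nose plugin: the function names and 
the (cls, load, clear) tuple are made up for illustration, and it assumes the 
order of fixtures within a class doesn't matter. The only thing borrowed from 
Django is the `fixtures` class attribute.

```python
from collections import defaultdict


def bucket_by_fixtures(test_classes):
    """Group test classes by the set of fixtures they declare."""
    buckets = defaultdict(list)
    for cls in test_classes:
        # Assumption: fixture order is irrelevant, so a frozenset is the key.
        buckets[frozenset(getattr(cls, "fixtures", ()))].append(cls)
    return buckets


def plan(test_classes):
    """Yield (cls, load_fixtures, clear_fixtures) in bucketed run order.

    Only the first class in a bucket loads the fixtures, and only the
    last class in that bucket clears them out again.
    """
    for classes in bucket_by_fixtures(test_classes).values():
        for i, cls in enumerate(classes):
            yield cls, i == 0, i == len(classes) - 1
```

With that ordering, a truncate-everything teardown would run once per bucket 
rather than once per class.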

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Test optimizations (2-5x as fast)

2011-05-16 Thread David Cramer
Postgres requires resetting the sequences I believe. I just assume
Oracle/MSSQL are probably similar.

Regarding the signals, basically we have a bunch of post_save type
things, which tend to store aggregate data for certain conditions.
These get populated (in some cases) in our tests, but don't correspond
to a fixture or a model in the same app.
--
David Cramer
http://justcramer.com



On Mon, May 16, 2011 at 9:09 PM, Erik Rose  wrote:
> Woo, thanks for the constructive suggestions!
>
>> Also, one thing I'm quickly noticing (I'm a bit confused why its
>> setup_class and not setUpClass as well),
>
> I was writing to nose's hooks; didn't realize Django used unittest2 now!
>
>> but this wont work with
>> postgres without changing the DELETE code to work like the test
>> runner's TRUNCATE foo, bar; (due to foreign key constraints).
>
> Absolutely. I assume this is what you fix below
>
>> You can do something like this to handle the
>> flushing:
>>
>>                    sql_list = connection.ops.sql_flush(no_style(),
>> tables, connection.introspection.sequence_list())
>>                    for sql in sql_list:
>>                        cursor.execute(sql)
>
> Brilliant! Thanks! Say, can you think of any backends in which you actually 
> have to reset the sequences after truncating? That seems like an interesting 
> decoupling to me. MySQL, anyway, does the reset implicitly; perhaps we can 
> optimize its sql_flush routine.
>
>> Unfortunately, you're still reliant that nothing was created with
>> signals that uses constraints. For us this is very common, and I can't
>> imagine we're an edge case there
>
> Can you tell me of these signals? Which ones? I don't think we use them, but 
> I don't want to overlook them.
>
> Erik




Re: Test optimizations (2-5x as fast)

2011-05-16 Thread Erik Rose
Woo, thanks for the constructive suggestions!

> Also, one thing I'm quickly noticing (I'm a bit confused why it's
> setup_class and not setUpClass as well),

I was writing to nose's hooks; didn't realize Django used unittest2 now!

> but this won't work with
> postgres without changing the DELETE code to work like the test
> runner's TRUNCATE foo, bar; (due to foreign key constraints).

Absolutely. I assume this is what you fix below.

> You can do something like this to handle the
> flushing:
> 
>     sql_list = connection.ops.sql_flush(no_style(),
>         tables, connection.introspection.sequence_list())
>     for sql in sql_list:
>         cursor.execute(sql)

Brilliant! Thanks! Say, can you think of any backends in which you actually 
have to reset the sequences after truncating? That seems like an interesting 
decoupling to me. MySQL, anyway, does the reset implicitly; perhaps we can 
optimize its sql_flush routine.

> Unfortunately, you're still reliant that nothing was created with
> signals that uses constraints. For us this is very common, and I can't
> imagine we're an edge case there

Can you tell me about these signals? Which ones? I don't think we use them, but I 
don't want to overlook them.

Erik




Re: Test optimizations (2-5x as fast)

2011-05-16 Thread Erik Rose
> How about caching the test databases? The database state could be cached 
> after model setup (which takes some time if you've got lots of them) + 
> initial data fixture setup, and after the setup for each test case (fixtures 
> + setUp() method).
> 
> So, in the best case, no database setup is required at all to run tests -- 
> which encourages test driven development :-)

So that would be 11 separate DBs for our tests, and you'd just switch between 
them? Interesting idea. Or are you proposing caching the results of queries for 
each test class, essentially mocking out the DB?

Perhaps some numbers would illuminate: I clock the total setup and teardown 
time for support.mozilla.com's 1064 tests at 2.59 seconds after my 
optimizations. So I'm pretty happy with that. :-) CPU use for the test run is 
76% according to the `time` command, so there's a little more I/O to kill but 
not much.

Cheers,
Erik




Re: Test optimizations (2-5x as fast)

2011-05-16 Thread Jacob Kaplan-Moss
On Mon, May 16, 2011 at 7:36 PM, Erik Rose  wrote:
> Toward that, should I work up a Django patch, or would the core team rather I 
> release my work as a pluggable package?

Patch, please! Fast is good :)

Jacob




Re: Test optimizations (2-5x as fast)

2011-05-16 Thread Erik Rose
Ahoy, Alex! Thanks for the quick response.

> 1. Class-level fixture setup
> 
>> This is the one I'm most interested in.  I did a patch a number of months ago 
>> to do the fixture parsing, but not DB insertion on a per-class basis.  I 
>> didn't find that to be a big win.  However, I'm going to be working on a 
>> patch to do bulk inserts (that is a single execute/executemany call for all 
>> objects to be inserted), which could be a big win for fixture loading, so 
>> I'd kind of like to do that first, to see how big a win this is after that.  
>> This is obviously more specialized, and invasive (IMO), so if we can get 
>> most of the win without it that might be good enough.

Could you explain what you mean by "to do the fixture parsing"? Did you try to 
speed up or cache the JSON parsing or something?

It's the per-class setup that yielded the biggest win for 2 of our largest 
sites: support.mozilla.com and addons.mozilla.com. We have (unsurprisingly) 
several tests in each class, and this avoids redoing all the I/O for each test. 
(CPU is practically free in an I/O-using situation like this, so I went 
straight for the disk writes.)
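
To make the flow concrete, here's a minimal sketch of the per-class setup. It 
is a stand-in written against the stdlib sqlite3 module and plain unittest, 
not the test-utils code itself; the `site` table and the `fixtures` data are 
hypothetical. The step letters in the comments refer to the a-d list in my 
original post.

```python
import sqlite3
import unittest


class ClassFixtureTestCase(unittest.TestCase):
    """Load fixtures once per class; roll back after every test."""

    # Hypothetical fixture data: (table name, rows) pairs.
    fixtures = [("site", [(1, "example.com")])]

    @classmethod
    def setUpClass(cls):
        # isolation_level=None puts sqlite3 in autocommit mode, so the
        # explicit BEGIN/COMMIT/ROLLBACK statements below are the only
        # transaction boundaries.
        cls.conn = sqlite3.connect(":memory:", isolation_level=None)
        cls.conn.execute(
            "CREATE TABLE site (id INTEGER PRIMARY KEY, domain TEXT)")
        cls.conn.execute("BEGIN")
        for table, rows in cls.fixtures:
            cls.conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        cls.conn.execute("COMMIT")  # step a: load once, then commit

    def setUp(self):
        self.conn.execute("BEGIN")  # step b: each test gets a transaction

    def tearDown(self):
        self.conn.execute("ROLLBACK")  # step c: back to pristine fixtures

    @classmethod
    def tearDownClass(cls):
        # step d: clear out only the tables the fixtures loaded into
        for table, _ in cls.fixtures:
            cls.conn.execute(f"DELETE FROM {table}")
        cls.conn.close()
```

The fixture inserts happen once per class instead of once per test, which is 
where the I/O savings come from.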

As for bulk inserts, that would be great to have as well! I'd be surprised if 
they were a huge win, since, in an MVCC DB, the writes typically happen on 
commit, and there wouldn't be any fewer of those. On the other hand, it's 
another way to cut traffic, and engines like MyISAM should benefit more, since 
they commit immediately. I'd love to see numbers on it. Have you had a chance 
to bench it?

> Speeding up tests is definitely of interest to me, so thanks for the great 
> work!

You're welcome! I scratched a personal itch, but I hope others can benefit as 
well.

Toward that, should I work up a Django patch, or would the core team rather I 
release my work as a pluggable package? I realistically have the time to do 
only one.

Cheers,
Erik




Re: Test optimizations (2-5x as fast)

2011-05-14 Thread Jonas H.

On 05/14/2011 04:26 PM, Jonas H. wrote:

> I believe there's no generalized way of creating databases in Django
> now, so that would have to be added.


s/creating/copying/




Re: Test optimizations (2-5x as fast)

2011-05-14 Thread Jonas H.
How about caching the test databases? The database state could be cached 
after model setup (which takes some time if you've got lots of them) + 
initial data fixture setup, and after the setup for each test case 
(fixtures + setUp() method).


So, in the best case, no database setup is required at all to run tests 
-- which encourages test driven development :-)


I believe there's no generalized way of creating databases in Django 
now, so that would have to be added.
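
A minimal sketch of the copying idea, assuming SQLite only (which is exactly 
the "not generalized" problem): build the expensive state once, then clone it 
for each run. The helper names and the simplified `content_type` table are 
hypothetical; `Connection.backup` is the stdlib call, available in Python 3.7 
and later.

```python
import sqlite3


def build_template():
    """Do the slow part once: schema plus initial data."""
    template = sqlite3.connect(":memory:")
    template.execute(
        "CREATE TABLE content_type (id INTEGER PRIMARY KEY, name TEXT)")
    template.execute("INSERT INTO content_type VALUES (1, 'user')")
    template.commit()
    return template


def fresh_copy(template):
    """Clone the cached state instead of rebuilding it."""
    copy = sqlite3.connect(":memory:")
    template.backup(copy)  # copies every page of the template database
    return copy
```

Other backends would need their own copy mechanism, so this only shows the 
shape of the API, not a portable implementation.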


I'd love to hack on that :-)

Jonas




Re: Test optimizations (2-5x as fast)

2011-05-13 Thread David Cramer
More quick notes. You can do something like this to handle the
flushing:

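    from django.core.management.color import no_style
    from django.db import connection

    # `tables` is the list of table names the fixtures loaded into.
    cursor = connection.cursor()
    sql_list = connection.ops.sql_flush(
        no_style(), tables, connection.introspection.sequence_list())
    for sql in sql_list:
        cursor.execute(sql)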

Unfortunately, you're still relying on nothing having been created via
signals that uses constraints. For us this is very common, and I can't
imagine we're an edge case there.


Re: Test optimizations (2-5x as fast)

2011-05-13 Thread David Cramer
Also, one thing I'm quickly noticing (besides being a bit confused why it's
setup_class and not setUpClass): this won't work with Postgres without
changing the DELETE code to work like the test runner's
TRUNCATE foo, bar; (due to foreign key constraints).
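
The foreign key problem can be reproduced in miniature. This is only a 
sketch, using the stdlib sqlite3 module as a stand-in for Postgres, with 
hypothetical `author` and `book` tables: per-table DELETEs fail when a parent 
table is cleared before the table that references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE book ("
    "id INTEGER PRIMARY KEY, "
    "author_id INTEGER NOT NULL REFERENCES author(id))")
conn.execute("INSERT INTO author VALUES (1)")
conn.execute("INSERT INTO book VALUES (1, 1)")
conn.commit()


def clear_tables(conn, tables):
    """Issue one DELETE per table, in the order given."""
    for table in tables:
        conn.execute(f"DELETE FROM {table}")
    conn.commit()


try:
    clear_tables(conn, ["author", "book"])  # parent first
    parent_first_failed = False
except sqlite3.IntegrityError:
    conn.rollback()
    parent_first_failed = True

clear_tables(conn, ["book", "author"])  # child first succeeds
```

Naming all the tables in one TRUNCATE, as the test runner does, avoids having 
to work out that ordering.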


Re: Test optimizations (2-5x as fast)

2011-05-13 Thread David Cramer
You sir, are my personal hero for the day :)

We had also been looking at how we could speed up the fixture loading
(we were almost ready to go so far as to make one giant fixture that
just loaded at the start of the test runner). This is awesome progress.

On May 13, 4:57 pm, Erik Rose  wrote:
> tl;dr: I've written an alternative TestCase base class which makes 
> fixture-using tests much more I/O efficient on transactional DBs, and I'd 
> like to upstream it.
>
> Greetings, all! This is my first django-dev post, so please be gentle. :-) I 
> hack on support.mozilla.com, a fairly large Django site with about 1000 
> tests. Those tests make heavy use of fixtures and, as a result, used to take 
> over 5 minutes to run. So, I spent a few days seeing if I could cut the 
> amount of DB I/O needed. Ultimately, I got the run down to just over 1 
> minute, and almost all of those gains are translatable to any Django site 
> running against a transactional DB. No changes to the apps themselves are 
> needed. I'd love to push some of this work upstream, if there's interest (or 
> even lack of opposition ;-)).
>
> The speedups came from 3 main optimizations:
>
> 1. Class-level fixture setup
>
> Given a transactional DB, there's no reason to reload fixtures via dozens of 
> SQL statements before every test. I made use of setup_class() and 
> teardown_class() (yay, unittest2!) to change the flow for TestCase-using 
> tests to this:
>     a. Load the fixtures at the top of the class, and commit.
>     b. Run a test.
>     c. Roll back, returning to pristine fixtures. Go back to step b.
>     d. At class teardown, figure out which tables the fixtures loaded into, 
> and expressly clear out what was added.
>
> Before this optimization: 302s to run the suite
> After: 97s.
>
> Before: 37,583 queries
> After: 4,116
>
> On top of that, an additional 4s was saved by reusing a single connection 
> rather than opening and closing them all the time, bringing the final number 
> down to 93s. (We can get away with this because we're committing any 
> on-cursor-initialization setup, whereas the old TestCase rolled it back.)
>
> Here's the 
> code:https://github.com/erikrose/test-utils/blob/master/test_utils/__init_
>  I'd love to generalize it a bit (to fall back to the old behavior with 
> non-transactional backends, for example) and offer it as a patch to Django 
> proper, replacing TestCase. Thoughts?
>
> (If you notice that copy-and-paste of loaddata sitting off to the side in 
> another module, don't fret; in the patch, that would turn into a refactoring 
> of loaddata to make the computation of the fixture-referenced tables 
> separately reusable.)
>
> 2. Fixture grouping
>
> I next observed that many test classes reused the same sets of fixtures, 
> often via subclassing. After the previous optimization, our tests still 
> loaded fixtures 114 times, even though there were only 11 distinct sets of 
> them. So, I thought: why not write a custom testrunner that buckets the 
> classes by fixture set and advises the classes that, unless they're the first 
> or last in a bucket, they shouldn't bother tearing down or setting up the 
> fixtures, respectively? This took the form of a custom nose plugin (we use 
> nose for all our Django stuff), and it took another quarter off the test run:
>
> Before: 97s
> After: 74s
>
> Of course, test independence is still preserved. We're just factoring out 
> pointlessly repeated setup.
>
> I don't really have plans to upstream this unless someone calls for it, but 
> I'll be making it available soon, likely as part of django-nose.
>
> 3. Startup optimizations
>
> At this point, it was bothering me that, just to run a single test, I had to 
> wait through 15s of DB initialization (mostly auth_permissions and 
> django_content_type population)—stuff which was already perfectly valid from 
> the previous test run. So, building on some work we had already done in this 
> direction, I decided to skip the teardown of the test DB and, symmetrically, 
> the setup on future runs. If you make schema changes, just set an env var, 
> and it wipes and remakes the DB like usual. I could see pushing this into 
> django-nose as well, but it's got the hackiest implementation and can 
> potentially confuse users. I mention it for completeness.
>
> Before: startup time 15s
> After: 3s (There's quite a wide variance due to I/O caching luck.)
>
> Code: https://github.com/erikrose/test-utils/commit/b95a1b7
>
> If you read this far, you get a cookie! I welcome your feedback on merging 
> optimization #1 into core, as well as any accusations of insanity re: #2 and 
> #3. FWIW, everything works great without touching any of the tests on 3 of 
> our Django sites, totaling over 2000 tests.
>
> Best regards and wishes for a happy weekend,
> Erik Rose
> support.mozilla.com


Re: Test optimizations (2-5x as fast)

2011-05-13 Thread Alex Gaynor
On Fri, May 13, 2011 at 6:57 PM, Erik Rose  wrote:

> tl;dr: I've written an alternative TestCase base class which makes
> fixture-using tests much more I/O efficient on transactional DBs, and I'd
> like to upstream it.
>
> Greetings, all! This is my first django-dev post, so please be gentle. :-)
> I hack on support.mozilla.com, a fairly large Django site with about 1000
> tests. Those tests make heavy use of fixtures and, as a result, used to take
> over 5 minutes to run. So, I spent a few days seeing if I could cut the
> amount of DB I/O needed. Ultimately, I got the run down to just over 1
> minute, and almost all of those gains are translatable to any Django site
> running against a transactional DB. No changes to the apps themselves are
> needed. I'd love to push some of this work upstream, if there's interest (or
> even lack of opposition ;-)).
>
> The speedups came from 3 main optimizations:
>
> 1. Class-level fixture setup
>
> Given a transactional DB, there's no reason to reload fixtures via dozens of
> SQL statements before every test. I made use of setup_class() and
> teardown_class() (yay, unittest2!) to change the flow for TestCase-using
> tests to this:
>     a. Load the fixtures at the top of the class, and commit.
>     b. Run a test.
>     c. Roll back, returning to pristine fixtures. Go back to step b.
>     d. At class teardown, figure out which tables the fixtures loaded into,
> and expressly clear out what was added.
>
> Before this optimization: 302s to run the suite
> After: 97s.
>
> Before: 37,583 queries
> After: 4,116
>
> On top of that, an additional 4s was saved by reusing a single connection
> rather than opening and closing them all the time, bringing the final number
> down to 93s. (We can get away with this because we're committing any
> on-cursor-initialization setup, whereas the old TestCase rolled it back.)
>
> Here's the code:
> https://github.com/erikrose/test-utils/blob/master/test_utils/__init__.py#L121.
> I'd love to generalize it a bit (to fall back to the old behavior with
> non-transactional backends, for example) and offer it as a patch to Django
> proper, replacing TestCase. Thoughts?
>
> (If you notice that copy-and-paste of loaddata sitting off to the side in
> another module, don't fret; in the patch, that would turn into a refactoring
> of loaddata to make the computation of the fixture-referenced tables
> separately reusable.)
>
>
This is the one I'm most interested in.  I did a patch a number of months ago
to do the fixture parsing, but not the DB insertion, on a per-class basis.  I
didn't find that to be a big win.  However, I'm going to be working on a
patch to do bulk inserts (that is, a single execute/executemany call for all
objects to be inserted), which could be a big win for fixture loading, so
I'd kind of like to do that first, to see how big a win this is after that.
This is obviously more specialized and invasive (IMO), so if we can get
most of the win without it, that might be good enough.
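
To show what "a single execute/executemany call" means at the DB-API level, 
here is a small sketch using the stdlib sqlite3 module. It is not the planned 
Django patch; the `person` table and the helper names are hypothetical. Both 
loaders leave the same rows behind, and the second one makes one call for the 
whole batch.

```python
import sqlite3

INSERT = "INSERT INTO person (id, name) VALUES (?, ?)"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def load_one_by_one(conn, rows):
    """One execute() call per object."""
    for row in rows:
        conn.execute(INSERT, row)
    conn.commit()


def load_bulk(conn, rows):
    """A single executemany() call for all objects."""
    conn.executemany(INSERT, rows)
    conn.commit()
```

How much this saves depends on the backend and driver, so it would need to be 
benchmarked rather than assumed.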


>
> 2. Fixture grouping
>
> I next observed that many test classes reused the same sets of fixtures,
> often via subclassing. After the previous optimization, our tests still
> loaded fixtures 114 times, even though there were only 11 distinct sets of
> them. So, I thought: why not write a custom test runner that buckets the
> classes by fixture set and advises the classes that, unless they're the
> first or last in a bucket, they shouldn't bother setting up or tearing down
> the fixtures, respectively? This took the form of a custom nose plugin (we
> use nose for all our Django stuff), and it took another quarter off the test
> run:
>
> Before: 97s
> After: 74s
>
> Of course, test independence is still preserved. We're just factoring out
> pointlessly repeated setup.
>
> I don't really have plans to upstream this unless someone calls for it, but
> I'll be making it available soon, likely as part of django-nose.
>
>
No particular thoughts at the moment.


> 3. Startup optimizations
>
> At this point, it was bothering me that, just to run a single test, I had
> to wait through 15s of DB initialization (mostly auth_permission and
> django_content_type population), all of which was already perfectly valid
> from the previous test run. So, building on some work we had already done in
> this direction, I decided to skip the teardown of the test DB and,
> symmetrically, the setup on future runs. If you make schema changes, just set
> an env var, and it wipes and remakes the DB as usual. I could see pushing
> this into django-nose as well, but it's got the hackiest implementation and
> can potentially confuse users. I mention it for completeness.
>
> Before: startup time 15s
> After: 3s (There's quite a wide variance due to I/O caching luck.)
>
> Code: https://github.com/erikrose/test-utils/commit/b95a1b7
>
>
This is another case where I think we can get most of the win by doing the
bulk inserts.  Given this looks rather specialized (and if you have other

Test optimizations (2-5x as fast)

2011-05-13 Thread Erik Rose
tl;dr: I've written an alternative TestCase base class which makes 
fixture-using tests much more I/O efficient on transactional DBs, and I'd like 
to upstream it.

Greetings, all! This is my first django-dev post, so please be gentle. :-) I 
hack on support.mozilla.com, a fairly large Django site with about 1000 tests. 
Those tests make heavy use of fixtures and, as a result, used to take over 5 
minutes to run. So, I spent a few days seeing if I could cut the amount of DB 
I/O needed. Ultimately, I got the run down to just over 1 minute, and almost 
all of those gains are translatable to any Django site running against a 
transactional DB. No changes to the apps themselves are needed. I'd love to 
push some of this work upstream, if there's interest (or even lack of 
opposition ;-)).

The speedups came from 3 main optimizations:

1. Class-level fixture setup

Given a transactional DB, there's no reason to reload fixtures via dozens of SQL 
statements before every test. I made use of setUpClass() and tearDownClass() 
(yay, unittest2!) to change the flow for TestCase-using tests to this:
a. Load the fixtures at the top of the class, and commit.
b. Run a test.
c. Roll back, returning to pristine fixtures. Go back to step b.
d. At class teardown, figure out which tables the fixtures loaded into, and 
expressly clear out what was added.
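
The flow in steps a through d can be sketched roughly as follows. This is not the test-utils code linked in this message: it is a self-contained illustration that uses the standard sqlite3 and unittest modules in place of Django, and the FIXTURES dict, the site table, and the class name are assumptions made for the example.

```python
import sqlite3
import unittest

# Hypothetical stand-in for fixture files: table name -> rows to load.
FIXTURES = {"site": [(1, "example.com"), (2, "example.org")]}


class ClassLevelFixtureTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One connection shared by the whole class. isolation_level=None
        # lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
        cls.conn = sqlite3.connect(":memory:", isolation_level=None)
        cls.conn.execute(
            "CREATE TABLE site (id INTEGER PRIMARY KEY, domain TEXT UNIQUE)"
        )
        # Step a: load the fixtures once per class, and commit.
        cls.conn.execute("BEGIN")
        for table, rows in FIXTURES.items():
            cls.conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        cls.conn.execute("COMMIT")

    @classmethod
    def tearDownClass(cls):
        # Step d: clear only the tables the fixtures loaded into.
        for table in FIXTURES:
            cls.conn.execute(f"DELETE FROM {table}")
        cls.conn.close()

    def setUp(self):
        # Step b: each test runs inside its own transaction.
        self.conn.execute("BEGIN")

    def tearDown(self):
        # Step c: roll back, returning to the pristine fixtures.
        self.conn.execute("ROLLBACK")

    def _count_sites(self):
        return self.conn.execute("SELECT COUNT(*) FROM site").fetchone()[0]

    def test_1_insert_is_visible_inside_the_test(self):
        self.conn.execute("INSERT INTO site VALUES (3, 'example.net')")
        self.assertEqual(self._count_sites(), 3)

    def test_2_fixtures_are_pristine_again(self):
        # The row added by the previous test was rolled back.
        self.assertEqual(self._count_sites(), 2)
```

The fixture rows are inserted once per class, and each test only pays for a BEGIN and a ROLLBACK.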

Before this optimization: 302s to run the suite
After: 97s.

Before: 37,583 queries
After: 4,116

On top of that, an additional 4s was saved by reusing a single connection 
rather than opening and closing them all the time, bringing the final number 
down to 93s. (We can get away with this because we're committing any 
on-cursor-initialization setup, whereas the old TestCase rolled it back.)

Here's the code: 
https://github.com/erikrose/test-utils/blob/master/test_utils/__init__.py#L121. 
I'd love to generalize it a bit (to fall back to the old behavior with 
non-transactional backends, for example) and offer it as a patch to Django 
proper, replacing TestCase. Thoughts?

(If you notice that copy-and-paste of loaddata sitting off to the side in 
another module, don't fret; in the patch, that would turn into a refactoring of 
loaddata to make the computation of the fixture-referenced tables separately 
reusable.)


2. Fixture grouping

I next observed that many test classes reused the same sets of fixtures, often 
via subclassing. After the previous optimization, our tests still loaded 
fixtures 114 times, even though there were only 11 distinct sets of them. So, I 
thought: why not write a custom test runner that buckets the classes by fixture 
set and advises the classes that, unless they're the first or last in a bucket, 
they shouldn't bother setting up or tearing down the fixtures, respectively? 
This took the form of a custom nose plugin (we use nose for all our Django 
stuff), and it took another quarter off the test run:

Before: 97s
After: 74s

Of course, test independence is still preserved. We're just factoring out 
pointlessly repeated setup.
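
The bucketing idea can be sketched in plain Python as below. This is not the nose plugin itself; the function names and example classes are hypothetical. It also assumes that the order of names within a fixture list does not matter, which is why a frozenset is used as the bucket key.

```python
from collections import defaultdict


def bucket_by_fixtures(test_classes):
    """Group test classes by the set of fixture names they declare."""
    buckets = defaultdict(list)
    for cls in test_classes:
        # Assumption: fixture order is irrelevant, so a frozenset is the key.
        key = frozenset(getattr(cls, "fixtures", None) or ())
        buckets[key].append(cls)
    return buckets


def plan_fixture_work(test_classes):
    """Return (class, should_set_up, should_tear_down) tuples in run order."""
    plan = []
    for classes in bucket_by_fixtures(test_classes).values():
        last = len(classes) - 1
        for index, cls in enumerate(classes):
            # Only the first class in a bucket loads the fixtures, and
            # only the last one removes them.
            plan.append((cls, index == 0, index == last))
    return plan


# Hypothetical test classes, standing in for real TestCase subclasses.
class UserTests:
    fixtures = ["users.json"]


class ProfileTests(UserTests):  # inherits the same fixture set
    pass


class ForumTests:
    fixtures = ["users.json", "posts.json"]


class MoreUserTests:
    fixtures = ["users.json"]
```

Calling plan_fixture_work([UserTests, ForumTests, ProfileTests, MoreUserTests]) runs the three users.json classes back to back, so that fixture set is loaded once instead of three times.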

I don't really have plans to upstream this unless someone calls for it, but 
I'll be making it available soon, likely as part of django-nose.


3. Startup optimizations

At this point, it was bothering me that, just to run a single test, I had to 
wait through 15s of DB initialization (mostly auth_permission and 
django_content_type population), all of which was already perfectly valid from 
the previous test run. So, building on some work we had already done in this 
direction, I decided to skip the teardown of the test DB and, symmetrically, 
the setup on future runs. If you make schema changes, just set an env var, and 
it wipes and remakes the DB as usual. I could see pushing this into 
django-nose as well, but it's got the hackiest implementation and can 
potentially confuse users. I mention it for completeness.

Before: startup time 15s
After: 3s (There's quite a wide variance due to I/O caching luck.)

Code: https://github.com/erikrose/test-utils/commit/b95a1b7
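
The decision behind the env var can be sketched as a small function. This is only an illustration of the logic, not the code in the commit above; the variable name FORCE_DB and the function name are assumptions made for the example.

```python
import os


def should_rebuild_test_db(db_exists, environ=None):
    """Decide whether the test DB must be wiped and recreated."""
    environ = os.environ if environ is None else environ
    # FORCE_DB is a hypothetical variable name chosen for this sketch.
    forced = environ.get("FORCE_DB", "") not in ("", "0")
    # Rebuild when asked to, or when there is no DB left over to reuse.
    return forced or not db_exists
```

With a leftover test DB and no env var set, this returns False and setup is skipped; setting the variable after a schema change forces the usual wipe and rebuild.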


If you read this far, you get a cookie! I welcome your feedback on merging 
optimization #1 into core, as well as any accusations of insanity re: #2 and 
#3. FWIW, everything works great without touching any of the tests on 3 of our 
Django sites, totaling over 2000 tests.

Best regards and wishes for a happy weekend,
Erik Rose
support.mozilla.com
