Re: [GENERAL] Testing Technique when using a DB

2013-03-16 Thread Jasen Betts
On 2013-03-13, Joe Van Dyk j...@tanga.com wrote:

 On Wed, Mar 13, 2013 at 8:47 AM, Steve Crawford 
 scrawf...@pinpointresearch.com wrote:

 On 03/12/2013 09:05 PM, Perry Smith wrote:

 To all who replied:

 Thank you. ...


 I had not seriously considered pg_dump / pg_restore because I assumed it
 would be fairly slow but I will experiment with pg_restore and template
 techniques this weekend and see which ones prove viable.


 Another possibility a bit outside my area of expertise but what about a VM
 image configured to your needs that you just spin up as needed then discard
 when done (i.e. always spinning up the same starting image)?


 I'd guess the OP is running hundreds of tests, where the data needs to be
 reverted/reset after each test, and each individual test might run in, say,
 0.1 seconds. This is a really common technique when testing web apps. I
 don't think you'd want to start a VM for each of these tests, especially
 when the tests are small and specific.

A VM rewinds to a snapshot in a few seconds; if the database is large, this
will likely be faster than any other method*.

*except that a similar trick using ZFS snapshots may be faster.


-- 
⚂⚃ 100% natural



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Testing Technique when using a DB

2013-03-13 Thread Steven Schlansker

On Mar 12, 2013, at 8:09 PM, Joe Van Dyk j...@tanga.com wrote:

 On Mar 12, 2013, at 8:42 AM, Perry Smith pedz...@gmail.com wrote:
 
 
 The other thought is perhaps there is a snap shot type concept.  I don't 
 see it in the list of SQL commands.  A snap shot would do exactly what it 
 sounds like.  It would take a snap shot and save it somehow.  Then a 
 restore to snap shot would restore the DB back to that state. 
 
 This would be super super super awesome, but it doesn't exist as far as I 
 know. This would be a permanent snapshot that could be easily and quickly 
 restored. 
 
 I wonder if it would be possible to make an extension that made this easy to 
 do. 

For what it's worth, I have something very similar to that to write JUnit
tests in Java. While that might not be much help to the original poster's
Ruby environment, the code is available as open source for anyone who thinks
this is an interesting idea.

https://github.com/NessComputing/components-ness-pg

We got tired of maintaining a Postgres instance on every development box and 
trying to make sure they had reasonably close versions, so I hacked this 
together.

It's fairly specific to the infrastructure we use but if there is any interest 
I could spruce it up a bit, write some documentation, and make it more 
stand-alone.

Short description:

* Drop your SQL files in the classpath somewhere (src/test/resources for
Maven folk)
* Provides an embedded PostgreSQL instance that you can spin up and down on
demand -- builds via shell script, auto-detects platform
* The SQL files are loaded into a Postgres template database; cloning the
databases can then be done relatively cheaply and quickly
* A fresh database is prepared in a thread in the background so one is always
immediately available to test cases
* Provides a handy JUnit @Rule so you can mix it into test cases easily






Re: [GENERAL] Testing Technique when using a DB

2013-03-13 Thread Steve Crawford

On 03/12/2013 09:05 PM, Perry Smith wrote:

To all who replied:

Thank you. ...

I had not seriously considered pg_dump / pg_restore because I assumed 
it would be fairly slow but I will experiment with pg_restore and 
template techniques this weekend and see which ones prove viable.


Another possibility a bit outside my area of expertise but what about a 
VM image configured to your needs that you just spin up as needed then 
discard when done (i.e. always spinning up the same starting image)?


Cheers,
Steve




Re: [GENERAL] Testing Technique when using a DB

2013-03-13 Thread Joe Van Dyk
On Wed, Mar 13, 2013 at 8:47 AM, Steve Crawford 
scrawf...@pinpointresearch.com wrote:

 On 03/12/2013 09:05 PM, Perry Smith wrote:

 To all who replied:

 Thank you. ...


 I had not seriously considered pg_dump / pg_restore because I assumed it
 would be fairly slow but I will experiment with pg_restore and template
 techniques this weekend and see which ones prove viable.


 Another possibility a bit outside my area of expertise but what about a VM
 image configured to your needs that you just spin up as needed then discard
 when done (i.e. always spinning up the same starting image)?


I'd guess the OP is running hundreds of tests, where the data needs to be
reverted/reset after each test, and each individual test might run in, say,
0.1 seconds. This is a really common technique when testing web apps. I
don't think you'd want to start a VM for each of these tests, especially
when the tests are small and specific.


Re: [GENERAL] Testing Technique when using a DB

2013-03-12 Thread Steve Crawford

On 03/12/2013 08:41 AM, Perry Smith wrote:


One choice would be to create the database, use it, and then drop it for each 
test.  I would create the database from a template that already has data taken 
from the production database (and probably trimmed down to a small subset of 
it).  This requires some crafty dancing in the Rails set up since it likes to 
just attach to a database and run but it could be done.  From first blush, this 
sounds like it would be really slow but may be not.


It depends on your environment (i.e. do you have isolated dev, test and 
production or are you testing your code on production machines) and the 
nature of your tests (functionality and bugs only or load and performance).


The speed of CREATE DATABASE foo TEMPLATE bar; depends, of course, on the
size of your template, but I've found it to be fast enough for test
databases (a few hundred MB takes well under a minute on an old
desktop). Try it and see.
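To make the drop-and-clone cycle concrete, here is a minimal sketch; the
database names are made up, and the helper only builds the SQL strings
(actually running them through psql or a driver is left out so the sketch
stays self-contained):

```ruby
# Sketch of resetting a test database from a template before each run.
# Note CREATE DATABASE ... TEMPLATE requires that nothing is connected
# to the template database at the time of the clone.
def reset_from_template_sql(test_db, template_db)
  [
    "DROP DATABASE IF EXISTS #{test_db};",
    "CREATE DATABASE #{test_db} TEMPLATE #{template_db};"
  ]
end

puts reset_from_template_sql("app_test", "app_test_template")
```

In a real test suite these two statements would run once per test (or per
test run), connected to a maintenance database such as postgres rather than
to the database being dropped.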


  


The other thought is perhaps there is a snap shot type concept.  I don't see
it in the list of SQL commands.  A snap shot would do exactly what it sounds
like.  It would take a snap shot and save it somehow.  Then a restore to
snap shot would restore the DB back to that state.

I may be missing something here but pg_dump/pg_restore do exactly that. 
Or you could get more complicated and use point-in-time recovery, 
external tools like pg_barman or even, perhaps, a file-level snapshot of 
the database files (when the PostgreSQL is shut down, of course) to 
handle your base test starting point.


Of all the options mentioned my first inclination would be to create 
your test db from a known template prior to each test run.


Cheers,
Steve





Re: [GENERAL] Testing Technique when using a DB

2013-03-12 Thread Steve Atkins

On Mar 12, 2013, at 8:41 AM, Perry Smith pedz...@gmail.com wrote:
 
 
 One choice would be to create the database, use it, and then drop it for each 
 test.  I would create the database from a template that already has data 
 taken from the production database (and probably trimmed down to a small 
 subset of it).  This requires some crafty dancing in the Rails set up since 
 it likes to just attach to a database and run but it could be done.  From 
 first blush, this sounds like it would be really slow but may be not. 

I do this. It's not blindingly fast, but plenty fast enough for automated 
testing as long as your tests aren't too fine-grained and your test database 
isn't too big. It takes no more than two or three seconds on my (slow, 
IO-starved) QA VMs.

By parameterizing the database name you can parallelize tests - each test 
creates its own copy of the template database, runs whatever it needs to run, 
then drops the database. That lets you hide a lot of the setup/teardown latency.
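A rough sketch of that parameterization (the names and helper here are
hypothetical, just to show the shape of the setup/teardown pair each worker
would run):

```ruby
# Derive a unique database name per parallel test worker, so each worker
# clones its own copy of the template and drops it when finished.
def worker_db(base, worker_id)
  "#{base}_#{worker_id}"
end

def lifecycle_sql(base, template, worker_id)
  db = worker_db(base, worker_id)
  {
    setup:    "CREATE DATABASE #{db} TEMPLATE #{template};",
    teardown: "DROP DATABASE #{db};"
  }
end
```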

 
 The other choice would be to somehow copy the data to temporary tables before 
 the test run and then copy it back.  The advantage to this is it is not very 
 PostgreSQL specific.  Indeed, if the template database is already set up, 
 then only one copy would be needed at the start of the test. 

You'd also need to undo any other state that was changed. Sequences, for 
instance, if they can affect the meaning of your test or expected results in 
any way.

 
 The other thought I had is if there is some type of leaky transaction.  A 
 transaction where another process can see the data but the roll back would 
 still work and be effective.  Essentially I'm asking if all the protections a 
 database offers could be dropped... but I thought I'd see if that was 
 possible. 

That - or anything else involving rolling back transactions - would only work 
if you were testing an app that didn't use transactions.

 The other thought is perhaps there is a snap shot type concept.  I don't 
 see it in the list of SQL commands.  A snap shot would do exactly what it 
 sounds like.  It would take a snap shot and save it somehow.  Then a restore 
 to snap shot would restore the DB back to that state. 

That's pretty much what creating a database from a template does, other than 
the need to have everybody disconnect from the database before doing the 
drop+create.

Dump / restore will do that too - somewhat slower, but doesn't require 
disconnecting from the DB.

File-system level snapshots are another option, but I'm pretty sure you'd need 
to shut down and restart the database server, which'd cost far more than you'd 
save.

Cheers,
  Steve



Re: [GENERAL] Testing Technique when using a DB

2013-03-12 Thread Joe Van Dyk
On Mar 12, 2013, at 8:42 AM, Perry Smith pedz...@gmail.com wrote:

I tried posting this from Google Groups but I did not see it come through
after an hour so this may be a duplicate message for some.

The current testing technique for things like Ruby on Rails has three
choices, but none of the choices will work in my case.

The first choice is truncate which starts a transaction before each test.


I think you mean transactional tests here, not truncate. While the test
database is truncated once at the start of all the tests, each test runs
inside its own transaction which will never be committed.
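The mechanism being described can be sketched framework-agnostically like
this; `FakeConn` is a stand-in for a real connection, used only so the flow
is visible without a live server:

```ruby
# Transactional-test pattern: run each test body inside a transaction
# that is always rolled back, leaving the database untouched afterwards.
class FakeConn
  attr_reader :log

  def initialize
    @log = []
  end

  def exec(sql)
    @log << sql
  end
end

def run_in_rollback(conn)
  conn.exec("BEGIN")
  yield conn
ensure
  # ROLLBACK runs even if the test raises, so no test data is committed.
  conn.exec("ROLLBACK")
end
```

The catch the thread keeps circling is visible here: everything happens on
one connection, so a second process never sees the uncommitted data.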

 If the testing is within the same process, this works and a roll back
restores the DB.  But if the testing involves two processes, then the test
data entered by the test rig can not be seen by the system under test
(SUT).  With Rails, there are times when this is needed.  The test rig
drives a browser which calls into a Rails application.  There are
dangerous ways to still use this method but each has various downfalls
or risks.


IMO, you don't want to use transactional tests. When you do, ActiveRecord
will use savepoints to mimic transactions. This means now() will never
change during the tests, deferred constraints/triggers won't work, other
processes can't see the data, etc.  The system is behaving differently in
the test environment than in the real one, which is bad.

If you are treating the database as a really dumb store of data, then you
may want to use transactional tests. But be aware of the caveats.


The other two choices are delete and truncate which both end up with an
empty database just after each test.  This prevents any test data that is
already in the database from being used after the first test.  Note that a
test run will run through a sequence of tests (usually quite a few).

All of these are fairly fast with each one being faster under different
conditions (according to users).

Generally, this pushes the Rails community to have either fixtures or
factories.  Both are ok solutions but both also have problems.  In my
case, I have a dozen or so tables all with very tight constraints and
creating either fixtures or factories is very troublesome.  Also, I have a
real database with real data in production and it seems foolish not to take
advantage of the knowledge contained within that database.  By knowledge
I mean the particular values and weirdness within the data that a factory
or a fixture might not realize.

One choice would be to create the database, use it, and then drop it for
each test.  I would create the database from a template that already has
data taken from the production database (and probably trimmed down to a
small subset of it).  This requires some crafty dancing in the Rails set up
since it likes to just attach to a database and run but it could be done.
 At first blush, this sounds like it would be really slow, but maybe not.

The other choice would be to somehow copy the data to temporary tables
before the test run and then copy it back.  The advantage to this is it is
not very PostgreSQL specific.  Indeed, if the template database is already
set up, then only one copy would be needed at the start of the test.

The other thought I had is if there is some type of leaky transaction.  A
transaction where another process can see the data but the roll back would
still work and be effective.  Essentially I'm asking if all the protections
a database offers could be dropped... but I thought I'd see if that was
possible.

The other thought is perhaps there is a snap shot type concept.  I don't
see it in the list of SQL commands.  A snap shot would do exactly what it
sounds like.  It would take a snap shot and save it somehow.  Then a
restore to snap shot would restore the DB back to that state.


This would be super super super awesome, but it doesn't exist as far as I
know. This would be a permanent snapshot that could be easily and quickly
restored.

I wonder if it would be possible to make an extension that made this easy
to do.


I thought this group might suggest other ideas and either nuke the really
bad ideas or promote the plausible ideas I've mentioned above.


Make sure to look at database_cleaner if you haven't yet. Also this may be
interesting: https://github.com/bmabey/database_cleaner/issues/80

Personally, I use database_cleaner's delete method, plus I load a SQL file
at the beginning of each test.
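A minimal sketch of the re-seeding half of that approach (file name and
helper are hypothetical; the naive semicolon split is fine for simple seed
files but not for function bodies or dollar-quoted strings):

```ruby
# Split a seed SQL file into individual statements so they can be
# executed one by one before each test.
def seed_statements(sql_text)
  sql_text.split(";").map(&:strip).reject(&:empty?)
end
```

In practice each returned statement would be passed to the connection's
execute method after database_cleaner has emptied the tables.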



Sorry for the long post.  I appreciate your thoughts.

Perry




Re: [GENERAL] Testing Technique when using a DB

2013-03-12 Thread Perry Smith
To all who replied:

Thank you.  I made a typo: I meant transaction instead of truncate.

I had not seriously considered pg_dump / pg_restore because I assumed it would 
be fairly slow but I will experiment with pg_restore and template techniques 
this weekend and see which ones prove viable.

I know about and use database_cleaner and intend to offer it to them.  I 
posted a comment on the bug report provided.

Loading a separate SQL file seems like it gets you 99% of the way there.  How 
does that perform?  pg_restore should be faster.

The other advantage to the pg_restore or loading an SQL file is you can have 
various files for different tests.

