>
> ---------- Forwarded message ----------
> From: Tom Lane <t...@sss.pgh.pa.us>
> To: Robert Haas <robertmh...@gmail.com>
> Date: Thu, 27 Aug 2009 10:11:24 -0400
> Subject: Re: 8.5 release timetable, again
>
> What I'd like to see is some sort of test mechanism for WAL recovery.
> What I've done sometimes in the past (and recently had to fix the tests
> to re-enable) is to kill -9 a backend immediately after running the
> regression tests, let the system replay the WAL for the tests, and then
> take a pg_dump and compare that to the dump gotten after a conventional
> run.  However this is quite haphazard since (a) the regression tests
> aren't especially designed to exercise all of the WAL logic, and (b)
> pg_dump might not show the effects of some problems, particularly not
> corruption in non-system indexes.  It would be worth the trouble to
> create a more specific test methodology.


I hacked mdwrite so that it had a static int counter.  When the counter hit
400, and if the guc_of_death was set, it would write out a partial block (to
simulate a partial page write) and then PANIC.  I have some Perl code that
runs against the database doing a bunch of updates until the database dies.
Then, once it can reconnect, it makes sure the data reflects what Perl
thinks it should.  This is how I (belatedly) found and traced down the bug
in the visibility bit.  (What I was trying to do was determine whether my
toying around with XLogInsert was breaking anything.  Since the regression
suite wouldn't show me a problem if one existed, I came up with this.  Then
I found things were broken even before I started toying with it...)
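
For anyone who wants to see the shape of the fault injection without
patching the server, here is a rough stand-alone sketch in Python.  It is
not the actual mdwrite hack (that was C inside the backend): the names
FaultyBlockWriter, SimulatedPanic and fault_enabled are made up for
illustration, fault_enabled merely plays the role of guc_of_death, and an
exception stands in for the PANIC.  It assumes 8192-byte blocks and tears
the write by emitting only the first half of the block.

```python
import os
import tempfile

BLOCK_SIZE = 8192      # assumed page size (PostgreSQL's default)
CRASH_AT_WRITE = 400   # counter threshold mentioned in the description


class SimulatedPanic(Exception):
    """Hypothetical stand-in for PANIC; the real hack kills the server."""


class FaultyBlockWriter:
    """Writes fixed-size blocks and tears one write when the counter trips."""

    def __init__(self, path, fault_enabled=False, crash_at=CRASH_AT_WRITE):
        self.path = path
        self.fault_enabled = fault_enabled  # plays the role of guc_of_death
        self.crash_at = crash_at
        self.writes = 0                     # plays the role of the static counter
        open(path, "ab").close()            # make sure the file exists

    def write_block(self, blockno, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.writes += 1
        torn = self.fault_enabled and self.writes == self.crash_at
        # A torn write puts only the first half of the block on disk.
        payload = data[: BLOCK_SIZE // 2] if torn else data
        with open(self.path, "r+b") as f:
            f.seek(blockno * BLOCK_SIZE)
            f.write(payload)
        if torn:
            raise SimulatedPanic("partial write of block %d" % blockno)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "relation")
    writer = FaultyBlockWriter(path, fault_enabled=True, crash_at=2)
    writer.write_block(0, b"A" * BLOCK_SIZE)      # write 1: goes through whole
    try:
        writer.write_block(0, b"B" * BLOCK_SIZE)  # write 2: torn, then "PANIC"
    except SimulatedPanic as exc:
        print("crashed:", exc)
    with open(path, "rb") as f:
        page = f.read()
    # The block is now half new data and half old data.
    print(page[:1], page[-1:])
```

The half-new, half-old block left behind is the state that recovery has to
repair, and a checker like the Perl script then compares what the database
returns against what the client believes it committed.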

I don't know how lucky I was to hit upon a test that found an already
existing bug.  I have to assume I was somewhat lucky, simply because it took
a run of many hours or overnight (with a simulated crash every 2 minutes or
so) to reliably detect the problem.  But how do you turn something like this
into a regression test?  Scattering the code with intentional crash-inducing
code that is there to exercise the error recovery parts seems like it would
be quite a mess.



> In short: merely making the tests bigger doesn't impress me in the
> least.  Focused testing on areas we aren't covering at all could be
> worth the trouble.


Do you have suggestions on what other areas need it?

Jeff
