> ---------- Forwarded message ----------
> From: Tom Lane <t...@sss.pgh.pa.us>
> To: Robert Haas <robertmh...@gmail.com>
> Date: Thu, 27 Aug 2009 10:11:24 -0400
> Subject: Re: 8.5 release timetable, again
>
> What I'd like to see is some sort of test mechanism for WAL recovery.
> What I've done sometimes in the past (and recently had to fix the tests
> to re-enable) is to kill -9 a backend immediately after running the
> regression tests, let the system replay the WAL for the tests, and then
> take a pg_dump and compare that to the dump gotten after a conventional
> run. However this is quite haphazard since (a) the regression tests
> aren't especially designed to exercise all of the WAL logic, and (b)
> pg_dump might not show the effects of some problems, particularly not
> corruption in non-system indexes. It would be worth the trouble to
> create a more specific test methodology.

I hacked mdwrite so that it had a static int counter. When the counter
hit 400, and if guc_of_death was set, it would write out a partial block
(to simulate a partial page write) and then PANIC. I have some Perl code
that runs a stream of updates against the database until the database
dies. Once it can reconnect, it checks that the data matches what the
Perl code thinks it should be. This is how I (belatedly) found and
tracked down the bug in the visibility bit.

(What I was trying to do was determine whether my toying around with
XLogInsert was breaking anything. Since the regression suite wouldn't
show me a problem if one existed, I came up with this. Then I found
things were broken even before I started toying with it...)

I don't know how lucky I was to hit upon a test that found an already
existing bug. I have to assume I was somewhat lucky, simply because it
took a run of many hours, or overnight (with a simulated crash every 2
minutes or so), to reliably detect the problem.

But how do you turn something like this into a regression test?
Scattering intentional crash-inducing code throughout the source just to
exercise the error-recovery paths seems like it would be quite a mess.

> In short: merely making the tests bigger doesn't impress me in the
> least. Focused testing on areas we aren't covering at all could be
> worth the trouble.

Do you have suggestions on what other areas need it?

Jeff
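
The counter-based crash injection described in the reply can be modeled roughly as follows. This is a minimal sketch, not the actual patch: the real change was C code inside PostgreSQL's mdwrite, and everything here except the guc_of_death name (the in-memory block list, the Panic exception standing in for PANIC, and the names FaultInjectingStorage, CRASH_AFTER_WRITES, and run_until_panic) is a hypothetical stand-in chosen for illustration.

```python
# Hypothetical Python model of the mdwrite hack; the real change was C code
# inside PostgreSQL. Names other than guc_of_death are illustrative only.
BLCKSZ = 8192              # PostgreSQL's default block size in bytes
CRASH_AFTER_WRITES = 400   # the write on which the injected fault fires


class Panic(Exception):
    """Stands in for elog(PANIC): the simulated server stops here."""


class FaultInjectingStorage:
    """In-memory stand-in for a relation file whose writes can fail."""

    def __init__(self, nblocks, guc_of_death=False):
        self.blocks = [bytes(BLCKSZ) for _ in range(nblocks)]
        self.guc_of_death = guc_of_death  # fault-injection switch
        self.write_count = 0              # plays the static int counter

    def write(self, blkno, page):
        assert len(page) == BLCKSZ
        self.write_count += 1
        if self.guc_of_death and self.write_count == CRASH_AFTER_WRITES:
            # Torn page: only the first half of the new image "reaches
            # disk"; the second half keeps the block's old contents.
            half = BLCKSZ // 2
            self.blocks[blkno] = page[:half] + self.blocks[blkno][half:]
            raise Panic(f"injected crash on write {self.write_count}")
        self.blocks[blkno] = page


def run_until_panic(storage, nblocks, max_writes=10_000):
    """Write distinct page images round-robin until the injected crash.

    Returns the index of the write that crashed, or None if none did.
    """
    for i in range(max_writes):
        page = bytes([i % 256]) * BLCKSZ  # each write is distinguishable
        try:
            storage.write(i % nblocks, page)
        except Panic:
            return i
    return None


store = FaultInjectingStorage(nblocks=4, guc_of_death=True)
print(run_until_panic(store, nblocks=4))  # prints 399 (the 400th write)
```

With the flag set, the run stops on the 400th write and leaves one block holding half of the new image and half of the old one, which is the torn-page state that full-page images in the WAL are meant to repair during replay. The client-side check that compares recovered data against what the test driver expects is left out of this sketch.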