On 27 November 2015 at 21:28, Greg Stark <st...@mit.edu> wrote:

> On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> > I plan to do more power failure testing soon, with more complex test
> > scenarios. I suspect there might be other similar issues (e.g. when we
> > rename a file before a checkpoint and don't fsync the directory - then the
> > rename won't be replayed and will be lost).
>
> I'm curious how you're doing this testing. The easiest way I can think
> of would be to run a database on an LVM volume and take a large number
> of LVM snapshots very rapidly and then see if the database can start
> up from each snapshot. Bonus points for keeping track of the committed
> transactions before each snapshot and ensuring they're still there I
> guess.

I've had a few tries at implementing a qemu-based crash tester that
hard-kills the qemu instance at a random point, then starts it back up.

I always got stuck on the validation part: actually ensuring that the DB
state is what we expect. I think I could probably get that right now; it's
been a while.

The VM can be started back up and killed again over and over quite quickly.
It's not as good as a physical plug-pull, but it's a lot more practical.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
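
For illustration only, here is a rough sketch of the kill-and-restart loop described above. It is not the tester mentioned in this mail. The names crash_cycle, missing_commits and run_crash_test are made up for the example, cmd stands in for whatever qemu command line boots the VM, and validate is a hypothetical caller-supplied hook that would bring the VM back up, connect to the database and report anything that looks wrong (for instance, commits that were acknowledged before the kill but are gone afterwards).

```python
import random
import subprocess
import time


def crash_cycle(cmd, min_delay, max_delay, rng=random):
    """Start cmd, let it run for a random interval, then hard-kill it.

    cmd is an argument list; in a real tester it would be the qemu
    command line for the VM (assumption, not taken from the mail).
    """
    proc = subprocess.Popen(cmd)
    delay = rng.uniform(min_delay, max_delay)
    time.sleep(delay)
    proc.kill()  # SIGKILL on POSIX: the process gets no chance to shut down cleanly
    proc.wait()  # reap the child so repeated cycles do not leave zombies
    return delay


def missing_commits(acknowledged, recovered):
    """Return acknowledged transaction ids that are absent after recovery."""
    return sorted(set(acknowledged) - set(recovered))


def run_crash_test(cmd, cycles, validate, min_delay=0.0, max_delay=5.0):
    """Repeat kill/validate cycles and stop at the first reported problem.

    validate is a hypothetical hook returning a list of problems
    (empty list means the state looks as expected).
    Returns None if every cycle passed, else (cycle_index, problems).
    """
    for i in range(cycles):
        crash_cycle(cmd, min_delay, max_delay)
        problems = validate()
        if problems:
            return i, problems
    return None
```

The hard part remains the validate hook: the harness has to record which commits were acknowledged to the client before each kill, and missing_commits only compares that record with what is found after recovery.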