Re: [HACKERS] silent data loss with ext4 / all current versions

Craig Ringer Sun, 29 Nov 2015 05:39:04 -0800

On 27 November 2015 at 21:28, Greg Stark wrote:

> On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra wrote:
> > I plan to do more power failure testing soon, with more complex test
> > scenarios. I suspect there might be other similar issues (e.g. when we
> > rename a file before a checkpoint and don't fsync the directory - then the
> > rename won't be replayed and will be lost).
>
> I'm curious how you're doing this testing. The easiest way I can think
> of would be to run a database on an LVM volume and take a large number
> of LVM snapshots very rapidly and then see if the database can start
> up from each snapshot. Bonus points for keeping track of the committed
> transactions before each snapshot and ensuring they're still there, I
> guess.
>
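A minimal sketch of the rename pattern Tomas describes, assuming Linux/POSIX semantics: fsync the file's contents, rename() it, then fsync the parent directory (or directories) so that the directory entry itself is durable. Without that last step a rename can be lost on power failure even though the file data was flushed. The names rename_and_sync(), parent_dir() and fsync_path() are hypothetical and are not PostgreSQL code. Opening files read-only for fsync() is an assumption that holds on Linux; some platforms require write access to fsync a regular file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open an existing file or directory read-only and fsync() it.
 * Assumes Linux-like semantics, where fsync() on a read-only fd works. */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Copy the directory part of 'path' into 'buf' ("." if there is none). */
static int parent_dir(const char *path, char *buf, size_t buflen)
{
    const char *slash = strrchr(path, '/');
    size_t len;

    if (slash == NULL) {
        if (buflen < 2)
            return -1;
        strcpy(buf, ".");
        return 0;
    }
    len = (slash == path) ? 1 : (size_t)(slash - path); /* keep "/" for root */
    if (len + 1 > buflen)
        return -1;
    memcpy(buf, path, len);
    buf[len] = '\0';
    return 0;
}

/* Hypothetical helper: rename oldpath to newpath so that, once this returns
 * 0, both the file contents and the directory changes are on disk. */
int rename_and_sync(const char *oldpath, const char *newpath)
{
    char dir[4096];

    if (fsync_path(oldpath) != 0)           /* 1. flush file contents */
        return -1;
    if (rename(oldpath, newpath) != 0)      /* 2. atomic rename */
        return -1;
    if (parent_dir(newpath, dir, sizeof dir) != 0 || fsync_path(dir) != 0)
        return -1;                          /* 3. persist the new entry */
    if (parent_dir(oldpath, dir, sizeof dir) != 0 || fsync_path(dir) != 0)
        return -1;                          /* 4. persist the old entry's removal */
    return 0;
}
```

When both paths share a directory, step 4 just fsyncs the same directory a second time, which is harmless.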


I've had a few tries at implementing a qemu-based crash tester that hard-kills
the qemu instance at a random point and then starts it back up.

I always got stuck on the validation part: actually ensuring that the DB
state is what we expect. I think I could probably get that right now; it's
been a while since I last tried.
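One way to frame the validation step, following Greg's suggestion: the test client records the ID of every transaction whose COMMIT was acknowledged before the crash, and after recovery all of those IDs must be present. IDs that show up without ever having been acknowledged are fine, since the crash may have hit after the commit became durable but before the client heard back. This sketch assumes the IDs have already been collected into arrays (how they are logged and read back is left open), and count_lost_commits() is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort/bsearch comparator for long values. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Count acknowledged transaction IDs that are missing after recovery.
 * A non-zero result means commits reported as successful were lost.
 * Extra IDs in 'recovered' (in flight at crash time) are not an error.
 * Note: sorts 'recovered' in place. */
size_t count_lost_commits(const long *acked, size_t n_acked,
                          long *recovered, size_t n_recovered)
{
    size_t lost = 0;

    qsort(recovered, n_recovered, sizeof *recovered, cmp_long);
    for (size_t i = 0; i < n_acked; i++) {
        if (bsearch(&acked[i], recovered, n_recovered,
                    sizeof *recovered, cmp_long) == NULL)
            lost++;
    }
    return lost;
}
```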

The VM can be started back up and killed again over and over quite quickly.

It's not as good as physical plug-pull, but it's a lot more practical.
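The kill-and-restart loop could be driven by something like the following sketch, assuming a POSIX host. argv is a placeholder for whatever command boots the VM (no particular qemu options are implied), and run_and_hard_kill() is a hypothetical name. SIGKILL gives the process no chance to clean up, but anything it had already handed to the host kernel survives because the host itself never loses power, which is one reason this is weaker than pulling the plug on real hardware.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start argv[0], let it run for a random delay of 0..max_ms milliseconds
 * (max_ms should be well below UINT_MAX; seed with srand() as desired),
 * then SIGKILL it and reap it.
 * Returns 1 if the child died from SIGKILL, 0 if it exited on its own
 * first, -1 on error. */
int run_and_hard_kill(char *const argv[], unsigned max_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    unsigned delay_ms = (unsigned)rand() % (max_ms + 1);
    struct timespec ts;
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);

    kill(pid, SIGKILL);             /* no chance to flush or shut down */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

A driver would call this in a loop, restart the VM, and run a validation check after each recovery.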


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
