So, as nicely summarized at
http://www.h-online.com/open/news/item/Possible-data-loss-in-Ext4-740467.html,
ext4 is kind of broken. It won't honor fsync and, as a /feature/, will
wait up to two minutes to write out data, leading to lots of files
emptied to the great bitbucket in the sky if the machine goes down in
that period. Why is this relevant to OpenBSD? Well, sometimes I've
been writing a file in vi or mg when my machine went down, and when it
came back I found the file empty. I'm still trying to figure out
whether that was because the data wasn't fsync'd, because of softdep,
or something else.
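
If I understand the ext4 story right, the only way an application can
be sure its data is on the platter is to ask explicitly. Here's a
minimal sketch of what I assume a paranoid save would look like (the
function name is mine, not from vi or mg, and I'm assuming the file
lives in the current directory):

    /* Sketch of a careful save: fsync the file's data and inode,
     * then fsync the directory so the name itself is durable. */
    #include <fcntl.h>
    #include <unistd.h>

    int
    save_file(const char *path, const char *buf, size_t len)
    {
        size_t off = 0;
        int fd, dfd;

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
            return -1;
        while (off < len) {
            ssize_t n = write(fd, buf + off, len - off);
            if (n == -1) {
                close(fd);
                return -1;
            }
            off += (size_t)n;
        }
        if (fsync(fd) == -1) {      /* push data + inode to disk */
            close(fd);
            return -1;
        }
        if (close(fd) == -1)
            return -1;

        dfd = open(".", O_RDONLY);  /* then the directory entry */
        if (dfd == -1)
            return -1;
        if (fsync(dfd) == -1) {
            close(dfd);
            return -1;
        }
        return close(dfd);
    }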

I know this is kind of a newbish question, but I have no idea how I'd
go about researching it. And I'd like to sort this out because it's a
big gap in my knowledge. I thought there was a paper on softdep, but
http://openbsd.org/papers doesn't have it.

NetBSD's summary <http://www.netbsd.org/docs/misc/#softdep-impact> says:
"The FFS takes care to correctly order all metadata operations, as
well as to ensure that all metadata operations precede operations on
the data to which they refer, so that the file system may be
guaranteed to be recoverable after a crash. The last N seconds of file
data may not be recoverable, where N is the syncer interval, but the
file system metadata will be. N is usually 30."

So my interpretation is that losing my file is expected, ancient POSIX
behavior, unless I run sync after every write. Is this right? Out of
curiosity, what would happen if I ran sync and pulled the power at the
same moment? That is, what cases can actually leave the filesystem
inconsistent?
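
By "run sync after every write" I mean something like this (just a
sketch, error handling omitted):

    /* sync(2) schedules all dirty buffers for writing, but as I
     * understand it, it may return before the blocks are actually on
     * disk -- hence my question about pulling the power mid-sync. */
    #include <unistd.h>

    void
    save_and_flush(int fd, const char *buf, size_t len)
    {
        write(fd, buf, len);    /* error handling omitted for brevity */
        sync();                 /* flush the whole buffer cache */
    }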


But I still don't get how softdep fits into all this. That page goes on:

"With softdeps running, you've got almost the same guarantee. With
softdeps, you have the guarantee that you will get a consistent
snapshot of the file system as it was at some particular point in time
before the crash. So you don't know, as you did without softdeps,
that, for example, if you did an atomic operation such as a rename of
a lock file, the lock file will actually be there; but you do know
that the directory it was in won't be trashed and you do know that
ordering dependencies between that atomic operation and future atomic
operations will have been preserved, so if you are depending on atomic
operations to control, say, some database-like process (e.g. writing
mail spool files in batches, gathering data from a transaction system,
etc.) you can safely start back up where you appear to have left off."

But while I kind of grasp the details, I can't seem to figure out what
they mean in context.
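
My best guess at the "rename of a lock file" pattern they're
describing is something like this (my own sketch, not from any OpenBSD
code):

    /* Write the new contents to a temp file, fsync it, then rename(2)
     * it over the real name. The rename is atomic, so after a crash
     * I'd expect to find either the old file or the new one, never a
     * half-written mix -- and softdep apparently guarantees the rename
     * isn't reordered with later operations. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    commit_file(const char *path, const char *tmppath,
        const char *buf, size_t len)
    {
        size_t off = 0;
        int fd;

        fd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd == -1)
            return -1;
        while (off < len) {
            ssize_t n = write(fd, buf + off, len - off);
            if (n == -1) {
                close(fd);
                unlink(tmppath);
                return -1;
            }
            off += (size_t)n;
        }
        if (fsync(fd) == -1) {      /* data on disk first... */
            close(fd);
            unlink(tmppath);
            return -1;
        }
        if (close(fd) == -1) {
            unlink(tmppath);
            return -1;
        }
        return rename(tmppath, path);   /* ...then the atomic commit */
    }

Is that the kind of ordering guarantee the page is talking about?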

Enlightenment appreciated! I don't want to be that guy in 20 years who
rewrites the filesystem to be more efficient by not actually writing
to the quantum-light-platter.

(and btw, why isn't http://openbsd.org/papers linked from the front page?)

-Nick
