Excerpts from Olaf van der Spek's message of 2011-01-26 13:30:08 -0500:
> On Sat, Jan 8, 2011 at 3:40 PM, Olaf van der Spek <olafvds...@gmail.com> wrote:
> > On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.ma...@oracle.com> wrote:
> >> The exact amount of tracking is going to vary.  The reason why is that
> >> actually doing the truncate is an O(size of the file) operation and so
> >> you can't just flip a switch when the write or the close comes in.  You
> >> have to run through all the metadata of the file and do something
> >> temporary with each part that is only completed when the file IO is
> >> actually done.
> >
> > That's true. Maybe the proper way, via O_ATOMIC, is better.
> >
> >> Honestly, there are many different ways to solve this in the application.
> >> Requiring high speed atomic replacement of individual file contents is a
> >> recipe for frustration.
> >
> > Did you see Massimo's message? That'd be the ideal way from an app
> > point of view.
> > Not solving this properly in the FS moves the problem to userspace
> > where it's even harder to solve and is not as performant.
> >
> > Replacing file data is a common operation that IMO the FS should
> > support in a safe way.
> 
> Chris?
> 

My answer hasn't really changed ;)  Replacing file data is a common
operation, but it is still surprisingly complex.  Again, the truncate is
O(size of the file) and it is actually impossible to do this atomically
in most filesystems.

You don't notice this because xfs/ext3/ext4/btrfs (and many others) have
code that makes sure a truncate is restarted if you crash.  So, it
appears to be atomic even though we're really just restarting the
operation.  In order to have a truncate + replacement of data operation,
we'd have to do a disk format change that includes both the truncate and
the new data.

It would look a lot like echo data > file.new ; truncate file ; mv
file.new file, but recorded in the FS metadata.
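
For illustration, the userspace version of that sequence can be sketched
as below. This is a minimal sketch, not anything btrfs provides: the
function name replace_file_contents is hypothetical, it assumes a POSIX
system where rename within one directory is atomic, and it skips the
explicit truncate step because the rename discards the old file as a
whole. It also does not preserve the original file's ownership or
permissions, which is part of why doing this in the application is
awkward.

```python
import os
import tempfile


def replace_file_contents(path, data):
    """Replace the contents of `path` with `data` (bytes) using
    write-to-temp, fsync, then rename. Hypothetical helper for illustration."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Make sure the new data is on disk before it becomes visible.
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash
    # (assumes POSIX; opening a directory this way fails on Windows).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling replace_file_contents("file", b"data") leaves either the old or
the new contents in place after a crash, at the cost of writing a whole
new file and two fsync calls.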

I don't have this in the btrfs roadmap.  It would be nice but most
people use databases for things that require atomic operations.  I
think what ext4 and btrfs do today falls into the category of best
effort and least surprise, and I think it is as good as we can get
without huge performance penalties for normal use.

Now, if you want to talk about atomic replacement of file data without
changing the file size, that's much easier.  At least it's easier for
those of us with cows in our pockets.

-chris
