Excerpts from Olaf van der Spek's message of 2011-01-26 13:30:08 -0500:
> On Sat, Jan 8, 2011 at 3:40 PM, Olaf van der Spek <olafvds...@gmail.com> wrote:
> > On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.ma...@oracle.com> wrote:
> >> The exact amount of tracking is going to vary. The reason why is that
> >> actually doing the truncate is an O(size of the file) operation and so
> >> you can't just flip a switch when the write or the close comes in. You
> >> have to run through all the metadata of the file and do something
> >> temporary with each part that is only completed when the file IO is
> >> actually done.
> >
> > That's true. Maybe the proper way, via O_ATOMIC, is better.
> >
> >> Honestly, there many different ways to solve this in the application.
> >> Requiring high speed atomic replacement of individual file contents is a
> >> recipe for frustration.
> >
> > Did you see message of Massimo? That'd be the ideal way from an app
> > point of view.
> > Not solving this properly in the FS moves the problem to userspace
> > where it's even harder to solve and is not as performant.
> >
> > Replacing file data is a common operation that IMO the FS should
> > support in a safe way.
>
> Chris?
>
My answer hasn't really changed ;) Replacing file data is a common
operation, but it is still surprisingly complex. Again, the truncate is
O(size of the file) and it is actually impossible to do this atomically
in most filesystems.

You don't notice this because xfs/ext3/ext4/btrfs (and many others) have
code that makes sure a truncate is restarted if you crash. So, it
appears to be atomic even though we're really just restarting the
operation.

In order to have a truncate + replacement of data operation, we'd have
to do a disk format change that includes both the truncate and the new
data. It would look a lot like

    echo data > file.new ; truncate file ; mv file.new file

but recorded in the FS metadata.

I don't have this in the btrfs roadmap. It would be nice, but most
people use databases for things that require atomic operations. I think
what ext4 and btrfs do today falls into the category of best effort and
least surprise, and I think it is as good as we can get without huge
performance penalties for normal use.

Now, if you want to talk about atomic replacement of file data without
changing the file size, that's much easier. At least it's easier for
those of us with cows in our pockets.

-chris
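
For reference, the userspace version of the "write file.new, then mv it over file" sequence mentioned in the message can be sketched roughly as below. This is only an illustrative sketch and is not taken from the message: the function name atomic_replace and the ".tmp-" prefix are made up for the example, it assumes a POSIX system where rename within one filesystem is atomic, and it does not preserve the original file's ownership or permissions.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes) via temp file + rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Make sure the new data is on disk before it becomes visible
            # under the final name.
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, leave the original file untouched and drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage would be something like atomic_replace("settings.conf", b"key=value\n"). Note that this needs space for a full second copy of the file and two fsync calls per replacement, which is part of why the thread asks for filesystem support in the first place.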