On 05/18/2010 09:13 AM, Chris Mason wrote:
On Tue, May 18, 2010 at 02:03:49PM +0200, Jakob Unterwurzacher wrote:
On 18/05/10 02:59, Chris Mason wrote:
Ok, I upgraded to 2.6.34 final and switched to defconfig.
I only did the rename test (i.e. no overwrite); the window is now
1.1s, both with vanilla and with the patch.
Thanks. So much for the easy fix; I'll take a look.
Ohhhhh, I read your initial email wrong, I'm sorry. The test we're
failing, the rentest, doesn't overwrite one file with another. It is
just creating a file and then renaming it.
Yes, the overwrite test goes perfectly fine.
Btrfs is explicitly choosing not to sync the file in this case because
the rename isn't replacing good old data with new unwritten data. The
rename is taking new unwritten data and giving it a different name.
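To make the distinction above concrete, here is a minimal sketch of the two rename cases being discussed. It is not the actual test program from this thread; the helper names (`write_file`, `rename_case`) and the labels "new-name" and "overwrite" are made up for illustration.

```python
import os


def write_file(path, data):
    """Write data to path without any fsync, as a typical application would."""
    with open(path, "wb") as f:
        f.write(data)


def rename_case(src, dst):
    """Rename src to dst and report which of the two cases it was.

    "overwrite": dst already existed, so old data is replaced by new,
                 possibly still unwritten, data.
    "new-name":  dst did not exist, so new data only receives a new name.
    """
    kind = "overwrite" if os.path.exists(dst) else "new-name"
    # os.replace maps to rename(2) on POSIX and replaces dst atomically.
    os.replace(src, dst)
    return kind
```

Calling `rename_case("foo.tmp", "foo")` when `foo` does not exist yet corresponds to the rename-only test, and calling it again with a fresh `foo.tmp` corresponds to the overwrite test.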
Are there applications that rely on this?
-chris
Well, dpkg (the Debian/Ubuntu package manager) did. Then ext4 became the
default fs in Ubuntu and massive breakage was reported [1]. Now dpkg is
fsync()ing everything and is about 2x slower than it was with ext3 [2].
Btrfs is so close to getting it "right" that I wondered whether the new
file name hitting the disk could be delayed that one second for the data
to make it to disk first.
The thing is that different apps have a different version of 'right'. Rename
is atomically replacing one file with another, and I completely agree
that when we have an established file on disk, we shouldn't replace it
with something that is potentially garbage.
But for the zeros case we have a file that isn't on disk and we're just
giving it a new name. I can see a different class of applications
getting upset about renames slowing the system down dramatically because
they suddenly imply a lot of IO.
I'm more than open to discussion on this one, but I don't see how:

    rm -f foo2
    dd if=/dev/zero of=foo bs=1M count=1000
    mv foo foo2

should be expected to write 1GB of data.
-chris
Just to weigh in here, I think that you have the right behaviour
already. If an application wants to force this to sync the data to disk,
it should use fsync() after the rename.
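As a rough illustration of the suggestion above, the sketch below writes a file, renames it, and then calls fsync() on the still-open file descriptor. The function name `rename_then_fsync` is hypothetical, and the extra fsync() on the parent directory is an assumption added here (it asks for the directory entry itself to be made durable); POSIX semantics are assumed throughout.

```python
import os


def rename_then_fsync(src, dst, data):
    """Write data to src, rename it to dst, then force it to disk.

    Illustrative only: follows the "fsync() after the rename" order
    suggested in the message above.
    """
    with open(src, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer down to the kernel
        os.rename(src, dst)    # the open descriptor still refers to the same file
        os.fsync(f.fileno())   # force the file's data to disk

    # Assumption: also sync the containing directory so the new name is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this ordering there is still a short window between rename() and fsync() in which a crash could leave the new name pointing at incomplete data. Applications that cannot tolerate that usually call fsync() before the rename instead.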
Having applications depend on semantics that only ext3 provided is not an
excuse for making a rename take multiple seconds...
Thanks!
Ric