[PATCH] btrfs: fix locking issues in find_parent_nodes()
- We might unlock head->mutex while it was not locked
- We might leave the function without unlocking delayed_refs->lock

Signed-off-by: Li Zefan
---
 fs/btrfs/backref.c | 8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 98f6bf10..0436c12 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -583,7 +583,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_key info_key = { 0 };
 	struct btrfs_delayed_ref_root *delayed_refs = NULL;
-	struct btrfs_delayed_ref_head *head = NULL;
+	struct btrfs_delayed_ref_head *head;
 	int info_level = 0;
 	int ret;
 	struct list_head prefs_delayed;
@@ -607,6 +607,8 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	 * at a specified point in time
 	 */
again:
+	head = NULL;
+
 	ret = btrfs_search_slot(trans, fs_info->extent_root, &key, path, 0, 0);
 	if (ret < 0)
 		goto out;
@@ -635,8 +637,10 @@ again:
 			goto again;
 		}
 		ret = __add_delayed_refs(head, seq, &info_key, &prefs_delayed);
-		if (ret)
+		if (ret) {
+			spin_unlock(&delayed_refs->lock);
 			goto out;
+		}
 	}
 	spin_unlock(&delayed_refs->lock);
--
1.7.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: getdents - ext4 vs btrfs performance
You might try sorting the entries returned by readdir by inode number
before you stat them. This is a long-standing weakness in ext3/ext4,
and it has to do with how we added hashed tree indexes to directories
in (a) a backwards compatible way, that (b) was POSIX compliant with
respect to adding and removing directory entries concurrently with
reading all of the directory entries using readdir.

You might try compiling spd_readdir from the e2fsprogs source tree (in
the contrib directory):

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209

… and then using that as a LD_PRELOAD, and see how that changes things.

The short version is that we can't easily do this in the kernel since
it's a problem that primarily shows up with very big directories, and
using non-swappable kernel memory to store all of the directory entries
and then sort them so they can be returned in inode number order just
isn't practical. It is something which can easily be done in userspace,
though, and a number of programs (including mutt for its Maildir
support) do it, and it helps greatly for workloads where you are
calling readdir() followed by something that needs to access the inode
(e.g., stat, unlink, etc.)

-- Ted

On Feb 29, 2012, at 8:52 AM, Jacek Luczak wrote:

> Hi All,
>
> /* Sorry for sending incomplete email, hit wrong button :) I guess I
> can't use Gmail */
>
> Long story short: We've found that operations on a directory structure
> holding many dirs take ages on ext4.
>
> The Question: Why is there that huge difference between ext4 and
> btrfs? See below test results for real values.
>
> Background: I had to back up a Jenkins directory holding workspaces
> for a few projects which were checked out from svn (implies lots of
> extra .svn dirs). The copy takes a lot of time (at least more than
> I've expected) and the process was mostly in D (disk sleep).
> I've dug more and done some extra tests to see if this is not a
> regression on the block/fs side. To isolate the issue I've also
> performed the same tests on btrfs.
>
> Test environment configuration:
> 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT
> enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
> 2) Kernels: All tests were done on following kernels:
> - 2.6.39.4-3 -- the build ID (3) is used here for internal tracking of
> config changes mostly. In -3 we've introduced the "fix readahead
> pipeline break caused by block plug" patch. Otherwise it's pure
> 2.6.39.4.
> - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
> released recently).
> 3) A subject of tests, directory holding:
> - 54GB of data (measured on ext4)
> - 1978149 files
> - 844008 directories
> 4) Mount options:
> - ext4 -- errors=remount-ro,noatime,data=writeback
> - btrfs -- noatime,nodatacow and for later investigation on
> compression effect: noatime,nodatacow,compress=lzo
>
> In all tests I've been measuring time of execution. Following tests
> were performed:
> - find . -type d
> - find . -type f
> - cp -a
> - rm -rf
>
> Ext4 results:
> | Type     | 2.6.39.4-3 | 3.2.7
> | Dir cnt  | 17m 40sec  | 11m 20sec
> | File cnt | 17m 36sec  | 11m 22sec
> | Copy     | 1h 28m     | 1h 27m
> | Remove   | 3m 43sec   | 3m 38sec
>
> Btrfs results (without lzo compression):
> | Type     | 2.6.39.4-3 | 3.2.7
> | Dir cnt  | 2m 22sec   | 2m 21sec
> | File cnt | 2m 26sec   | 2m 23sec
> | Copy     | 36m 22sec  | 39m 35sec
> | Remove   | 7m 51sec   | 10m 43sec
>
> From the above one can see that copy takes close to 1h less on btrfs.
> I've done strace counting times of calls, results are as follows
> (from 3.2.7):
> 1) Ext4 (only top elements):
> % time  seconds    usecs/call  calls     errors  syscall
> ------  ---------  ----------  --------  ------  ----------
>  57.01  13.257850           1  15082163          read
>  23.40   5.440353           3   1687702          getdents
>   6.15   1.430559           0   3672418          lstat
>   3.80   0.883767           0  13106961          write
>   2.32   0.539959           0   4794099          open
>   1.69   0.393589           0    843695          mkdir
>   1.28   0.296700           0   5637802          setxattr
>   0.80   0.186539           0   7325195          stat
>
> 2) Btrfs:
> % time  seconds    usecs/call  calls     errors  syscall
> ------  ---------  ----------  --------  ------  ----------
>  53.38   9.486210           1  15179751          read
>  11.38   2.021662           1   1688328          getdents
>  10.64   1.890234           0   4800317          open
>   6.83   1.213723           0  13201590          write
>   4.85   0.862731           0   5644314          setxattr
>   3.50   0.621194           1    844008          mkdir
>   2.75   0.489059           0   3675992       1  lstat
>   1.71   0.303544           0   5644314          llistxattr
>   1.50   0.265943           0   1978149          u
Re: LABEL only 1 device
Karel Zak posted on Tue, 28 Feb 2012 23:35:57 +0100 as excerpted:

> On Sun, Feb 26, 2012 at 06:07:31PM +0000, Duncan wrote:
>> Unfortunately, since gpt is reasonably new in terms of filesystem and
>> partitioning tools, there isn't really anything (mount, etc) that
>> makes /use/ of that label yet,
>
> udev exports GPT labels and uuids by symlinks, see
>
>   ls /dev/disk/by-partlabel/
>   ls /dev/disk/by-partuuid/

So it does. =:^)

I knew about the /dev/disk/by-*/ dirs in general and had no doubt
browsed past them before without actually noting the significance, but
hadn't actually noticed the by-part* until you pointed it out
specifically. Either that or exporting these is relatively new to udev,
tho it's probably been there and I simply didn't see it. Either way,
thanks! =:^)

> you can use these links in your fstab.

Yes. Now that I know they are there, using them in fstab makes sense,
since I remember seeing the note in the mount manpage that it uses the
udev symlinks internally already, so whatever udev does in this regard
should "just work" with mount, and thus in fstab. Useful indeed!

It seems modern Linux (or more properly, a modern udev and mount, along
with the kernel of course) has rather more use for partition-labels
than I was aware and thus than I was giving it credit for! =:^) Thanks!

> And if I remember correctly the kernel supports PARTUUID for the
> root= command line option.

That wouldn't surprise me at all.

That leaves grub2 (and other bootloaders). I already know grub2 prefers
UUIDs to /dev/* device names. But I don't know if it handles labels,
either the gpt-partlabel or the fs-label version. I'll have to try
that too.

Fortunately for me my device ordering is quite stable (and I hand-edit
grub.cfg, no mkgrub-config here), so that "just works". But UUIDs are
designed for computer use, not human use, while labels work well for
both, so if grub2 handles labels and I can use either fs or partition/
device labels there too, I'll be a happy camper indeed!
=:^)

But just knowing mount/fstab supports partlabels is going to be a boon
for me! My current setup (pending multi-way raid1 mirroring, and
perhaps a bit more stability, in btrfs) has multiple partitions and
partitioned md/raids, with working and backup copies of nearly all of
them.

When I update the backup, I often mkfs and start with a clean
filesystem, then copy all the data over from the working copy. The
mkfs step of course changes the filesystem UUID, and my labeling scheme
includes the date the filesystem and backup image was made, so it
changes too. So while I've been using (filesystem) labels in fstab for
some time, I've had to update them when I update my backups.

Now I should be able to use the partlabels in fstab instead, and those
only change if I repartition, a much less frequent occurrence, meaning
I can update my backups without having to update the fstab for mounting
them, at the same time. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
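For the archives, a partlabel-based fstab entry just uses the udev symlink path directly — the mount point and label below are made up for illustration:

```
# /etc/fstab -- mount by GPT partition label via the udev symlink;
# survives mkfs (new UUID, new fs label) as long as the partition stays
/dev/disk/by-partlabel/backup  /mnt/backup  btrfs  noatime  0  0
```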
Re: [BUG] Kernel Bug at fs/btrfs/volumes.c:3638
I just noticed that there's a bug report from an openSUSE user tripping
over the same BUG() during log replay (and his problem was solved by
btrfs-zero-log), probably after some crash. The kernel version was 3.1,
i.e. without the corruption fixes, so while it happened during normal
use (and not via a crafted fs image), I'm not sure if this is still the
case with recent kernels.

Turning the BUG in __btrfs_map_block into a return needs checking the
value in not-so-few callers and from various callpaths; it's not
straightforward to do e.g. a quick return during mount, as in your
case. Good that Jeff Mahoney's error handling series reduces the number
of callers to update.

david

[ cut here ]
WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/tree-log.c:1729 walk_down_log_tree+0x15a/0x3e0 [btrfs]()
Pid: 8978, comm: mount Not tainted 3.1.0-1.2-desktop #1
Call Trace:
 [] dump_trace+0xaa/0x2b0
 [] dump_stack+0x69/0x6f
 [] warn_slowpath_common+0x7b/0xc0
 [] walk_down_log_tree+0x15a/0x3e0 [btrfs]
 [] walk_log_tree+0xc7/0x1f0 [btrfs]
 [] btrfs_recover_log_trees+0x1ec/0x2d0 [btrfs]
 [] open_ctree+0x13c3/0x1740 [btrfs]
 [] btrfs_fill_super.isra.36+0x73/0x150 [btrfs]
 [] btrfs_mount+0x359/0x3e0 [btrfs]
 [] mount_fs+0x45/0x1d0
 [] vfs_kern_mount+0x66/0xd0
 [] do_kern_mount+0x53/0x120
 [] do_mount+0x1a5/0x260
 [] sys_mount+0x9a/0xf0
 [] system_call_fastpath+0x16/0x1b
 [<7fc524137daa>] 0x7fc524137da9
---[ end trace 2bf4520d35da960f ]---
unable to find logical 5493736079360 len 4096

[ cut here ]

1728         if (btrfs_header_level(cur) != *level)
1729                 WARN_ON(1);

kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/volumes.c:2891!
invalid opcode: [#1] PREEMPT SMP
CPU 1
Pid: 8978, comm: mount Tainted: GW 3.1.0-1.2-desktop #1
RIP: 0010:[] [] __btrfs_map_block+0x7c8/0x890 [btrfs]
RSP: 0018:8801b7507798 EFLAGS: 00010296
RAX: 0043 RBX: 04ff1c30 RCX: 2a82
RDX: 723a RSI: 0046 RDI: 0202
RBP: 8801b7507860 R08: 000a R09:
R10: R11: 0001 R12: 8801dcd10cc0
R13: 0001 R14: R15: 0001
FS: 7fc524c587e0() GS:88021fd0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7faea5cb8000 CR3: 0001b74f4000 CR4: 06e0
DR0: DR1: DR2: DR3:
DR6: 0ff0 DR7: 0400
Process mount (pid: 8978, threadinfo 8801b7506000, task 8801b0d9c740)
Call Trace:
 [] btrfs_map_bio+0x57/0x210 [btrfs]
 [] submit_one_bio+0x64/0xa0 [btrfs]
 [] read_extent_buffer_pages+0x367/0x4a0 [btrfs]
 [] btree_read_extent_buffer_pages.isra.63+0x80/0xc0 [btrfs]
 [] btrfs_read_buffer+0x2a/0x40 [btrfs]
 [] replay_one_buffer+0x46/0x360 [btrfs]
 [] walk_down_log_tree+0x20d/0x3e0 [btrfs]
 [] walk_log_tree+0xc7/0x1f0 [btrfs]
 [] btrfs_recover_log_trees+0x1ec/0x2d0 [btrfs]
 [] open_ctree+0x13c3/0x1740 [btrfs]
 [] btrfs_fill_super.isra.36+0x73/0x150 [btrfs]
 [] btrfs_mount+0x359/0x3e0 [btrfs]
 [] mount_fs+0x45/0x1d0
 [] vfs_kern_mount+0x66/0xd0
 [] do_kern_mount+0x53/0x120
 [] do_mount+0x1a5/0x260
 [] sys_mount+0x9a/0xf0
 [] system_call_fastpath+0x16/0x1b
 [<7fc524137daa>] 0x7fc524137da9
Re: Btrfs Storage Array Corrupted
I was running a fairly old version of the kernel:

Linux server 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:44:39 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

On Wed, Feb 29, 2012 at 5:44 PM, Chris Mason wrote:
> On Wed, Feb 29, 2012 at 05:11:24PM -0600, Travis Shivers wrote:
>> Thank you all for helping. My btrfs array consists of 4 disks: 2
>> (2 TB) disks and 2 (500 GB) disks. Since I have disks of different
>> sizes, I have the array being mirrored so that there are two copies
>> of a file on two separate disks. The data and metadata are mirrored.
>>
>> I originally made the array by using this command:
>>
>> # mkfs.btrfs -m raid1 -d raid1 /dev/sd[abcd]
>> (The drives were originally those letters)
>>
>> All of the disks sit in an external 4 bay ESATA enclosure going into
>> a PCI-E RAID card set up as JBOD, so I can use btrfs' software
>> mirroring. This is the enclosure that I have:
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16816132029
>>
>> The corruption was unexpected. I am not entirely sure what caused it,
>> but a few days before the corruption, there were several power
>> outages. I do not think that the problem is with the actual hard
>> drive hardware since they are fairly new (6 months old) and they pass
>> all SMART tests. After a reboot, the btrfs array refused to mount and
>> started giving off errors. I do weekly scrubs, balances, and
>> defragmentation.
>
> Ok, all of this should have worked. Which kernel were you running when
> you had the power outages?
>
> I'm testing out the patch to skip the extent allocation tree at mount.
> That will be the easiest way to get to the data (readonly, but it'll
> work).
>
> -chris
Re: Btrfs Storage Array Corrupted
On Wed, Feb 29, 2012 at 05:11:24PM -0600, Travis Shivers wrote:
> Thank you all for helping. My btrfs array consists of 4 disks: 2
> (2 TB) disks and 2 (500 GB) disks. Since I have disks of different
> sizes, I have the array being mirrored so that there are two copies
> of a file on two separate disks. The data and metadata are mirrored.
>
> I originally made the array by using this command:
>
> # mkfs.btrfs -m raid1 -d raid1 /dev/sd[abcd]
> (The drives were originally those letters)
>
> All of the disks sit in an external 4 bay ESATA enclosure going into
> a PCI-E RAID card set up as JBOD, so I can use btrfs' software
> mirroring. This is the enclosure that I have:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816132029
>
> The corruption was unexpected. I am not entirely sure what caused it,
> but a few days before the corruption, there were several power
> outages. I do not think that the problem is with the actual hard
> drive hardware since they are fairly new (6 months old) and they pass
> all SMART tests. After a reboot, the btrfs array refused to mount and
> started giving off errors. I do weekly scrubs, balances, and
> defragmentation.

Ok, all of this should have worked. Which kernel were you running when
you had the power outages?

I'm testing out the patch to skip the extent allocation tree at mount.
That will be the easiest way to get to the data (readonly, but it'll
work).

-chris
Re: Btrfs Storage Array Corrupted
Thank you all for helping. My btrfs array consists of 4 disks: 2 (2 TB)
disks and 2 (500 GB) disks. Since I have disks of different sizes, I
have the array being mirrored so that there are two copies of a file on
two separate disks. The data and metadata are mirrored.

I originally made the array by using this command:

# mkfs.btrfs -m raid1 -d raid1 /dev/sd[abcd]
(The drives were originally those letters)

All of the disks sit in an external 4 bay ESATA enclosure going into a
PCI-E RAID card set up as JBOD, so I can use btrfs' software mirroring.
This is the enclosure that I have:
http://www.newegg.com/Product/Product.aspx?Item=N82E16816132029

The corruption was unexpected. I am not entirely sure what caused it,
but a few days before the corruption, there were several power outages.
I do not think that the problem is with the actual hard drive hardware
since they are fairly new (6 months old) and they pass all SMART tests.
After a reboot, the btrfs array refused to mount and started giving off
errors. I do weekly scrubs, balances, and defragmentation.
Here is what btrfs filesystem show says:

# btrfs filesystem show
Label: none  uuid: 2c11a326-5630-484e-9f1d-9dab777a1028
	Total devices 4 FS bytes used 1.08TB
	devid    1 size 1.82TB used 1.08TB path /dev/sdf
	devid    2 size 1.82TB used 1.08TB path /dev/sdg
	devid    3 size 465.76GB used 8.00MB path /dev/sdh
	devid    4 size 465.76GB used 8.00MB path /dev/sdi

Btrfs Btrfs v0.19

This is my normal mount line for the array in /etc/fstab:

UUID=2c11a326-5630-484e-9f1d-9dab777a1028 /mnt/main btrfs noatime,nodiratime,compress=lzo,space_cache,inode_cache 0 1

On Wed, Feb 29, 2012 at 4:14 PM, Chris Mason wrote:
> On Wed, Feb 29, 2012 at 03:57:19PM -0600, Travis Shivers wrote:
>> Here is the output from the commands:
>>
>> # ./btrfs-debug-tree -R /dev/sdh
>> failed to read /dev/sr0: No medium found
>> failed to read /dev/sde: No medium found
>> failed to read /dev/sdd: No medium found
>> failed to read /dev/sdc: No medium found
>> failed to read /dev/sdb: No medium found
>> failed to read /dev/sda: No medium found
>> parent transid verify failed on 5568194695168 wanted 43477 found 43151
>
> So far all the blocks that have come up look like they are in the
> extent allocation tree. This helps because it is the easiest to
> recover.
>
> I can also make a patch for you against 3.3-rc that skips reading it
> entirely, which should make it possible to copy things off.
>
> But before I do that, could you describe the raid array? Was it
> mirrored or raid10? What exactly happened when it stopped working?
>
> -chris
Re: Btrfs Storage Array Corrupted
On Wed, Feb 29, 2012 at 03:57:19PM -0600, Travis Shivers wrote:
> Here is the output from the commands:
>
> # ./btrfs-debug-tree -R /dev/sdh
> failed to read /dev/sr0: No medium found
> failed to read /dev/sde: No medium found
> failed to read /dev/sdd: No medium found
> failed to read /dev/sdc: No medium found
> failed to read /dev/sdb: No medium found
> failed to read /dev/sda: No medium found
> parent transid verify failed on 5568194695168 wanted 43477 found 43151

So far all the blocks that have come up look like they are in the
extent allocation tree. This helps because it is the easiest to
recover.

I can also make a patch for you against 3.3-rc that skips reading it
entirely, which should make it possible to copy things off.

But before I do that, could you describe the raid array? Was it
mirrored or raid10? What exactly happened when it stopped working?

-chris
Re: Btrfs Storage Array Corrupted
Here is the output from the commands:

# ./btrfs-debug-tree -R /dev/sdh
failed to read /dev/sr0: No medium found
failed to read /dev/sde: No medium found
failed to read /dev/sdd: No medium found
failed to read /dev/sdc: No medium found
failed to read /dev/sdb: No medium found
failed to read /dev/sda: No medium found
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
parent transid verify failed on 5568194695168 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
parent transid verify failed on 5568194748416 wanted 43477 found 43151
Ignoring transid failure
root tree: 5568194412544 level 1
chunk tree: 20979712 level 1
extent tree key (EXTENT_TREE ROOT_ITEM 0) 5568194416640 level 3
device tree key (DEV_TREE ROOT_ITEM 0) 4895076519936 level 1
fs tree key (FS_TREE ROOT_ITEM 0) 4895092506624 level 2
checksum tree key (CSUM_TREE ROOT_ITEM 0) 5568194695168 level 0
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
parent transid verify failed on 5568194801664 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
parent transid verify failed on 5568194674688 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
parent transid verify failed on 5568194678784 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
parent transid verify failed on 5568194809856 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
parent transid verify failed on 5568194875392 wanted 43477 found 42983
Ignoring transid failure
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
parent transid verify failed on 5568195104768 wanted 43477 found 43151
Ignoring transid failure
parent transid verify failed on 5568195043328 wanted 43477 found 43151
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
parent transid verify failed on 5568195162112 wanted 43477 found 43175
Ignoring transid failure
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
parent transid verify failed on 5568195166208 wanted 43477 found 43175
Ignoring transid failure
btrfs root backup slot 0
	tree root gen 9799893461141291008 block 0
	extent root gen 67174399 block 976369115086847
	chunk root gen 18446605274118684671 block 9799972705260863487
	device root gen 977658994114559 block 18446638534628474880
	csum root gen 94490787839 block 18446638559949619199
	fs root gen 262144 block 1048576
	974850661629952 used 0 total 977659432419327 devices
btrfs root backup slot 1
	tree root gen 16777216 block 38655295488
	extent root gen 1179648 block 6989415099341275135
	chunk root gen 18446605285113004031 block 977659432353792
	device root gen 9223372036861329408 block 0
	csum root gen 65535 block 977659424489472
	fs root gen 4295032832 block 25769803776
	282399669551104 used 282400664715264 total 9799892621752008704 devices
btrfs root backup slot 2
	tree root gen 65535 block 18446744073709551615
	extent root gen 977659447099391 block 977659447033856
	chunk root gen 0 block 0
	device root gen 9799
[PATCH] Btrfs: stop silently switching single chunks to raid0 on balance
This has been causing a lot of confusion for quite a while now and a
lot of users were surprised by this (some of them were even stuck in an
ENOSPC situation which they couldn't easily get out of). The addition
of restriper gives users a clear choice between raid0 and drive concat
setup so there's absolutely no excuse for us to keep doing this.

Signed-off-by: Ilya Dryomov
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 37e0a80..e0969eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7029,7 +7029,6 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 		if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
 			     BTRFS_BLOCK_GROUP_RAID10))
 			return stripped | BTRFS_BLOCK_GROUP_DUP;
-		return flags;
 	} else {
 		/* they already had raid on here, just return */
 		if (flags & stripped)
@@ -7042,9 +7041,9 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 		if (flags & BTRFS_BLOCK_GROUP_DUP)
 			return stripped | BTRFS_BLOCK_GROUP_RAID1;
 
-		/* turn single device chunks into raid0 */
-		return stripped | BTRFS_BLOCK_GROUP_RAID0;
+		/* this is drive concat, leave it alone */
 	}
+	return flags;
 }
--
1.7.6.3
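With this change, getting raid0 out of single chunks becomes an explicit request through the restriper's convert filter rather than a silent side effect of balance. The invocation looks roughly like the following — this is the 3.3-era restriper syntax and may differ between btrfs-progs versions, so check btrfs(8) on your system:

```
# explicitly convert data chunks to raid0 during a balance
btrfs balance start -dconvert=raid0 /mnt
```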
Copy and remove performance impacted by COW reflink rewrite (2)
Hi All,

(retrying to post again - somehow the message got blocked)

I am running 3.2.7-1.fc16.x86_64 (latest FC 16 kernel).

I posted regarding this problem earlier, but after some research I
found improvement with the new kernel version, so I decided to repost
in a new thread to clean things up...

The intention of this post is to hopefully be useful and point out some
performance numbers that devs can use for testing, and hopefully get
some tips to improve the performance of cp and rm.

The test:
1. Create a 10GB file A.
2. Make 4 writes of a few bytes each.
3. Make a reflink of A as B.
4. Remove A.
5. Iterate steps 2-4 and use the new B as A for each iteration.

Ideally each iteration would take trivial time, run equally for each
test, and use no or little CPU. I observed increased CPU usage with
each cp and rm after every iteration.

One of the users pointed out that SSD erase-blocks may have something
to do with the observation, but I'm having difficulties taking this as
an answer because the numbers are repeatable with each re-run and the
SSD has plenty of space. I wish I had 16 GB of RAM (probably will next
week) to make a ramdisk and run the test on it.

The test can be found here: pastebin _dot_ com/1gD0aZic

To run, make a directory on a btrfs filesystem with at least about 15G
of free space, CD INTO IT (files will be created in pwd) and run
"bash random". The output will be in random.csv.

Graphs: The X axis is the test iteration 1-20. The Y axis on the left
is time to cp/rm/sync and the RIGHT (blue) axis shows the write times.
There's a slight improvement for the initial writes, but that's
probably due to the original file having only one extent.

SSD: img7.imageshack _dot_ us/img7/3928/ssdac.png
(rw,relatime,ssd,nospace_cache)
HDD: http://img819.imageshack _dot_ us/img819/5595/hddw.png
(rw,relatime,nospace_cache)

The HDD graph is posted as a reference. The oscillation is expected due
to media fragmentation, but the internal fragmentation trend follows
the SSD findings for a few initial rewrites.
The slow write times are expected due to random writes causing a head
delay (500 secs / 4000 ops ~= 10ms for each write).

The SSD graph shows the actual problem better. The curves for rm, cp
and sync seem to follow the log(n) inherent design (B-tree search
complexity), but take too long in my opinion. I would like to get some
advice on how to improve these times and/or some analysis from a
developer.

I tried balance after the test and deleted the last remaining file
21.tst, but it took 1.5 MINUTES to run! My conclusion is that it is
broken. I also tried nodatacow and nodatasum and got no obvious
improvements. Defrag would return immediately and I could run it
during the test too without impacting the test numbers.

Thanks,
Nik Markovic
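The iteration described above can be sketched in shell — scaled down here (a few MB instead of 10GB, 5 rounds instead of 20), and using --reflink=auto, which falls back to a plain copy on filesystems without reflink support, so the timing behaviour is only meaningful on btrfs:

```shell
#!/bin/sh
# Scaled-down sketch of the reported cp/rm reflink test loop.
set -e
dir=$(mktemp -d) && cd "$dir"
dd if=/dev/zero of=A bs=1M count=4 2>/dev/null        # original: 10GB
for i in 1 2 3 4 5; do                                # original: 20 rounds
    for off in 3 17 42 99; do                         # a few small writes
        dd if=/dev/urandom of=A bs=4k count=1 seek=$off \
           conv=notrunc 2>/dev/null
    done
    cp --reflink=auto A B                             # COW clone on btrfs
    rm A                                              # drop old generation
    mv B A                                            # new B becomes A
    sync
done
ls -l A
```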
Re: Set nodatacow per file?
> I get the following errors when running fileflags on large (>2GB)
> database files:
>
> open(): No such file or directory
>
> open(): Value too large for defined data type

http://www.gnu.org/software/coreutils/faq/#Value-too-large-for-defined-data-type

"""The message "Value too large for defined data type" is a system
error message reported when an operation on a large file is attempted
using a non-large file data type. Large files are defined as anything
larger than a signed 32-bit integer, or stated differently, larger than
2GB.

Many system calls that deal with files return values in a "long int"
data type. On 32-bit hardware a long int is 32-bits and therefore this
imposes a 2GB limit on the size of files. When this was invented that
was HUGE and it was hard to conceive of needing anything that large.
Time has passed and files can be much larger today. On native 64-bit
systems the file size limit is usually 2GB * 2GB. Which we will again
think is huge."""
RE: Set nodatacow per file?
> > Actually it is possible. Check out David's response to my question
> > from some time ago:
> > http://permalink.gmane.org/gmane.comp.file-systems.btrfs/14227
>
> this was a quick aid, please see attached file for an updated tool to
> set the file flags, now added 'z' for the NOCOMPRESS flag, and it
> supports chattr syntax plus all of the standard file flags.
>
> Setting and unsetting nocow is done like 'fileflags +C file' or -C
> for unsetting. Without any + or - options it prints the current
> state.

I get the following errors when running fileflags on large (>2GB)
database files:

open(): No such file or directory

open(): Value too large for defined data type
Re: getdents - ext4 vs btrfs performance
2012/2/29 Chris Mason :
> On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote:
>
> [ btrfs faster than ext for find and cp -a ]
>
>> 2012/2/29 Jacek Luczak :
>>
>> I will try to answer the question from the broken email I've sent.
>>
>> @Lukas, it was always a fresh FS on top of an LVM logical volume.
>> I've been cleaning the cache/remounting to sync all data before
>> (re)doing tests.
>
> The next step is to get cp -a out of the picture, in this case you're
> benchmarking both the read speed and the write speed (what are you
> copying to btw?).

It's a simple cp -a Jenkins{,.bak}, so a dir-to-dir copy on the same
volume.

> Using tar cf /dev/zero is one way to get a consistent picture
> of the read speed.

IMO the problem is not - only - in read speed. The directory order
hits here. There's a difference in the sequential tests that place
btrfs as the winner, but still this should not have that huge an
influence on getdents. I know a bit about the difference between ext4
and btrfs directory handling and I would not expect that huge a
difference.

On the production system where the issue has been observed, doing some
real work in the background, the copy takes up to 4h. For me btrfs
looks perfect here; what could be worth checking is the change of
syscall timings between 39.4 and 3.2.7. Before, getdents was not that
high on the list, while now it jumps to second position but without
huge impact on the timings.

> You can confirm the theory that it is directory order causing problems
> by using acp to read the data.
>
> http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2

Will check this still today and report back.

-jacek
Re: getdents - ext4 vs btrfs performance
On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote: [ btrfs faster than ext for find and cp -a ] > 2012/2/29 Jacek Luczak : > > I will try to answer the question from the broken email I've sent. > > @Lukas, it was always a fresh FS on top of LVM logical volume. I've > been cleaning cache/remounting to sync all data before (re)doing > tests. The next step is to get cp -a out of the picture, in this case you're benchmarking both the read speed and the write speed (what are you copying to btw?). Using tar cf /dev/zero is one way to get a consistent picture of the read speed. You can confirm the theory that it is directory order causing problems by using acp to read the data. http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2 -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
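The `tar cf /dev/zero` trick reads every file but throws the data away, so it measures pure read speed with no write I/O. The same idea can be sketched directly (a hedged illustration, not a replacement for tar or acp): walk the tree in readdir order and read each file to nowhere, timing the whole pass.

```python
import os
import time

def read_tree(root: str, bufsize: int = 1 << 20) -> tuple[int, float]:
    """Read every regular file under root and discard the data,
    mimicking `tar cf /dev/zero root`: all read I/O, no write I/O.
    Returns (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(bufsize):
                        total += len(chunk)
            except OSError:
                pass  # skip unreadable/vanished entries
    return total, time.monotonic() - start
```

`bytes_read / elapsed_seconds` then gives a read-throughput number that is comparable across filesystems, independent of the destination's write speed.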
Re: getdents - ext4 vs btrfs performance
On Wed, Feb 29, 2012 at 08:51:58AM -0500, Chris Mason wrote: > On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote: > > Ext4 results: > > | Type | 2.6.39.4-3 | 3.2.7 > > | Dir cnt | 17m 40sec | 11m 20sec > > | File cnt | 17m 36sec | 11m 22sec > > | Copy| 1h 28m| 1h 27m > > | Remove| 3m 43sec > > Are the btrfs numbers missing? ;) [ answered in a different reply, btrfs is faster in everything except delete ] The btrfs readdir uses an index that is much more likely to be sequential on disk than ext. This makes the readdir more sequential and it makes the actual file IO more sequential because we're reading things in the order they were created instead of (random) htree index order. > > In order for btrfs to be faster for cp -a, the files probably didn't > change much since creation. Btrfs maintains extra directory indexes > that help in sequential backup scans, but this usually means slower > delete performance. > > But, how exactly did you benchmark it? If you compare a fresh > mkfs.btrfs where you just copied all the data over with an ext4 FS that > has been on the disk for a long time, it isn't quite fair to ext4. > But, the consistent benchmarking part is really important. We shouldn't put an aged ext4 up against a fresh mkfs.btrfs. Did you do ext4 comparisons on a fresh copy? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
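The userspace workaround mentioned earlier in the thread (Ted's spd_readdir LD_PRELOAD) boils down to: collect the directory entries first, sort them by inode number, and only then stat them, so the inode table is read roughly sequentially instead of in htree hash order. A minimal sketch of that idea (an illustration only, not the actual preload library, which interposes on readdir for unmodified binaries):

```python
import os

def stat_in_inode_order(directory: str):
    """Yield (name, stat_result) for each entry, stat'ing in
    inode-number order rather than raw readdir (hash) order -- the
    same idea as the spd_readdir LD_PRELOAD helper, minus the
    libc interposition."""
    with os.scandir(directory) as it:
        # DirEntry.inode() comes from readdir's d_ino, so sorting
        # here costs no extra syscalls.
        entries = sorted(it, key=lambda e: e.inode())
    for entry in entries:
        yield entry.name, os.lstat(entry.path)
```

On ext4 with very large directories this ordering turns the stat/unlink pass from random inode-table seeks into a mostly sequential scan; on btrfs the readdir index is already close to creation order, which is why the gap shows up mainly on ext4.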
Re: getdents - ext4 vs btrfs performance
2012/2/29 Jacek Luczak : > 2012/2/29 Jacek Luczak : >> Hi Chris, >> >> the last one was borked :) Please check this one. >> >> -jacek >> >> 2012/2/29 Jacek Luczak : >>> Hi All, >>> >>> /*Sorry for sending incomplete email, hit wrong button :) I guess I >>> can't use Gmail */ >>> >>> Long story short: We've found that operations on a directory structure >>> holding many dirs takes ages on ext4. >>> >>> The Question: Why there's that huge difference in ext4 and btrfs? See >>> below test results for real values. >>> >>> Background: I had to backup a Jenkins directory holding workspace for >>> few projects which were co from svn (implies lot of extra .svn dirs). >>> The copy takes lot of time (at least more than I've expected) and >>> process was mostly in D (disk sleep). I've dig more and done some >>> extra test to see if this is not a regression on block/fs site. To >>> isolate the issue I've also performed same tests on btrfs. >>> >>> Test environment configuration: >>> 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT >>> enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. >>> 2) Kernels: All tests were done on following kernels: >>> - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of >>> config changes mostly. In -3 we've introduced ,,fix readahead pipeline >>> break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. >>> - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been >>> release recently). >>> 3) A subject of tests, directory holding: >>> - 54GB of data (measured on ext4) >>> - 1978149 files >>> - 844008 directories >>> 4) Mount options: >>> - ext4 -- errors=remount-ro,noatime, >>> data=writeback >>> - btrfs -- noatime,nodatacow and for later investigation on >>> copression effect: noatime,nodatacow,compress=lzo >>> >>> In all tests I've been measuring time of execution. Following tests >>> were performed: >>> - find . -type d >>> - find . 
-type f >>> - cp -a >>> - rm -rf >>> >>> Ext4 results: >>> | Type | 2.6.39.4-3 | 3.2.7 >>> | Dir cnt | 17m 40sec | 11m 20sec >>> | File cnt | 17m 36sec | 11m 22sec >>> | Copy | 1h 28m | 1h 27m >>> | Remove| 3m 43sec | 3m 38sec >>> >>> Btrfs results (without lzo comression): >>> | Type | 2.6.39.4-3 | 3.2.7 >>> | Dir cnt | 2m 22sec | 2m 21sec >>> | File cnt | 2m 26sec | 2m 23sec >>> | Copy | 36m 22sec | 39m 35sec >>> | Remove| 7m 51sec | 10m 43sec >>> >>> From above one can see that copy takes close to 1h less on btrfs. I've >>> done strace counting times of calls, results are as follows (from >>> 3.2.7): >>> 1) Ext4 (only to elements): >>> % time seconds usecs/call calls errors syscall >>> -- --- --- - - >>> 57.01 13.257850 1 15082163 read >>> 23.40 5.440353 3 1687702 getdents >>> 6.15 1.430559 0 3672418 lstat >>> 3.80 0.883767 0 13106961 write >>> 2.32 0.539959 0 4794099 open >>> 1.69 0.393589 0 843695 mkdir >>> 1.28 0.296700 0 5637802 setxattr >>> 0.80 0.186539 0 7325195 stat >>> >>> 2) Btrfs: >>> % time seconds usecs/call calls errors syscall >>> -- --- --- - - >>> 53.38 9.486210 1 15179751 read >>> 11.38 2.021662 1 1688328 getdents >>> 10.64 1.890234 0 4800317 open >>> 6.83 1.213723 0 13201590 write >>> 4.85 0.862731 0 5644314 setxattr >>> 3.50 0.621194 1 844008 mkdir >>> 2.75 0.489059 0 3675992 1 lstat >>> 1.71 0.303544 0 5644314 llistxattr >>> 1.50 0.265943 0 1978149 utimes >>> 1.02 0.180585 0 5644314 844008 getxattr >>> >>> On btrfs getdents takes much less time which prove the bottleneck in >>> copy time on ext4 is this syscall. 
In 2.6.39.4 it shows even less time >>> for getdents: >>> % time seconds usecs/call calls errors syscall >>> -- --- --- - - >>> 50.77 10.978816 1 15033132 read >>> 14.46 3.125996 1 4733589 open >>> 7.15 1.546311 0 5566988 setxattr >>> 5.89 1.273845 0 3626505 lstat >>> 5.81 1.255858 1 1667050 getdents >>> 5.66 1.224403 0 13083022 write >>> 3.40 0.735114 1 833371 mkdir >>> 1.96 0.424881 0 5566988 llistxattr >>> >>> >>> Why so huge difference in the getdents timings? >>> >>> -Jacek > > I will try to answer the question from the broken email I've sent. > > @Lukas, it was always a fresh FS on top of LVM logical volume. I've been cleaning cache/remounting to sync all data before (re)doing tests. -Jacek
Re: getdents - ext4 vs btrfs performance
2012/2/29 Jacek Luczak : > Hi Chris, > > the last one was borked :) Please check this one. > > -jacek > > 2012/2/29 Jacek Luczak : >> Hi All, >> >> /*Sorry for sending incomplete email, hit wrong button :) I guess I >> can't use Gmail */ >> >> Long story short: We've found that operations on a directory structure >> holding many dirs takes ages on ext4. >> >> The Question: Why there's that huge difference in ext4 and btrfs? See >> below test results for real values. >> >> Background: I had to backup a Jenkins directory holding workspace for >> few projects which were co from svn (implies lot of extra .svn dirs). >> The copy takes lot of time (at least more than I've expected) and >> process was mostly in D (disk sleep). I've dig more and done some >> extra test to see if this is not a regression on block/fs site. To >> isolate the issue I've also performed same tests on btrfs. >> >> Test environment configuration: >> 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT >> enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. >> 2) Kernels: All tests were done on following kernels: >> - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of >> config changes mostly. In -3 we've introduced ,,fix readahead pipeline >> break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. >> - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been >> release recently). >> 3) A subject of tests, directory holding: >> - 54GB of data (measured on ext4) >> - 1978149 files >> - 844008 directories >> 4) Mount options: >> - ext4 -- errors=remount-ro,noatime, >> data=writeback >> - btrfs -- noatime,nodatacow and for later investigation on >> copression effect: noatime,nodatacow,compress=lzo >> >> In all tests I've been measuring time of execution. Following tests >> were performed: >> - find . -type d >> - find . 
-type f >> - cp -a >> - rm -rf >> >> Ext4 results: >> | Type | 2.6.39.4-3 | 3.2.7 >> | Dir cnt | 17m 40sec | 11m 20sec >> | File cnt | 17m 36sec | 11m 22sec >> | Copy | 1h 28m | 1h 27m >> | Remove| 3m 43sec | 3m 38sec >> >> Btrfs results (without lzo comression): >> | Type | 2.6.39.4-3 | 3.2.7 >> | Dir cnt | 2m 22sec | 2m 21sec >> | File cnt | 2m 26sec | 2m 23sec >> | Copy | 36m 22sec | 39m 35sec >> | Remove| 7m 51sec | 10m 43sec >> >> From above one can see that copy takes close to 1h less on btrfs. I've >> done strace counting times of calls, results are as follows (from >> 3.2.7): >> 1) Ext4 (only to elements): >> % time seconds usecs/call calls errors syscall >> -- --- --- - - >> 57.01 13.257850 1 15082163 read >> 23.40 5.440353 3 1687702 getdents >> 6.15 1.430559 0 3672418 lstat >> 3.80 0.883767 0 13106961 write >> 2.32 0.539959 0 4794099 open >> 1.69 0.393589 0 843695 mkdir >> 1.28 0.296700 0 5637802 setxattr >> 0.80 0.186539 0 7325195 stat >> >> 2) Btrfs: >> % time seconds usecs/call calls errors syscall >> -- --- --- - - >> 53.38 9.486210 1 15179751 read >> 11.38 2.021662 1 1688328 getdents >> 10.64 1.890234 0 4800317 open >> 6.83 1.213723 0 13201590 write >> 4.85 0.862731 0 5644314 setxattr >> 3.50 0.621194 1 844008 mkdir >> 2.75 0.489059 0 3675992 1 lstat >> 1.71 0.303544 0 5644314 llistxattr >> 1.50 0.265943 0 1978149 utimes >> 1.02 0.180585 0 5644314 844008 getxattr >> >> On btrfs getdents takes much less time which prove the bottleneck in >> copy time on ext4 is this syscall. In 2.6.39.4 it shows even less time >> for getdents: >> % time seconds usecs/call calls errors syscall >> -- --- --- - - >> 50.77 10.978816 1 15033132 read >> 14.46 3.125996 1 4733589 open >> 7.15 1.546311 0 5566988 setxattr >> 5.89 1.273845 0 3626505 lstat >> 5.81 1.255858 1 1667050 getdents >> 5.66 1.224403 0 13083022 write >> 3.40 0.735114 1 833371 mkdir >> 1.96 0.424881 0 5566988 llistxattr >> >> >> Why so huge difference in the getdents timings? 
>> >> -Jacek I will try to answer the question from the broken email I've sent. @Lukas, it was always a fresh FS on top of LVM logical volume. I've been cleaning cache/remounting to sync all data before (re)doing tests. -Jacek BTW: Sorry for the email mess.
Re: getdents - ext4 vs btrfs performance
On Wed, 29 Feb 2012, Chris Mason wrote: > On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote: > > Hi All, > > > > Long story short: We've found that operations on a directory structure > > holding many dirs takes ages on ext4. > > > > The Question: Why there's that huge difference in ext4 and btrfs? See > > below test results for real values. > > > > Background: I had to backup a Jenkins directory holding workspace for > > few projects which were co from svn (implies lot of extra .svn dirs). > > The copy takes lot of time (at least more than I've expected) and > > process was mostly in D (disk sleep). I've dig more and done some > > extra test to see if this is not a regression on block/fs site. To > > isolate the issue I've also performed same tests on btrfs. > > > > Test environment configuration: > > 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT > > enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. > > 2) Kernels: All tests were done on following kernels: > > - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of > > config changes mostly. In -3 we've introduced ,,fix readahead pipeline > > break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. > > - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been > > release recently). > > 3) A subject of tests, directory holding: > > - 54GB of data (measured on ext4) > > - 1978149 files > > - 844008 directories > > 4) Mount options: > > - ext4 -- errors=remount-ro,noatime,data=writeback > > - btrfs -- noatime,nodatacow and for later investigation on > > copression effect: noatime,nodatacow,compress=lzo > > For btrfs, nodatacow and compression don't really mix. The compression > will just override it. (Just FYI, not really related to these results). > > > > > In all tests I've been measuring time of execution. Following tests > > were performed: > > - find . -type d > > - find . 
-type f > > - cp -a > > - rm -rf > > > > Ext4 results: > > | Type | 2.6.39.4-3 | 3.2.7 > > | Dir cnt | 17m 40sec | 11m 20sec > > | File cnt | 17m 36sec | 11m 22sec > > | Copy| 1h 28m| 1h 27m > > | Remove| 3m 43sec > > Are the btrfs numbers missing? ;) > > In order for btrfs to be faster for cp -a, the files probably didn't > change much since creation. Btrfs maintains extra directory indexes > that help in sequential backup scans, but this usually means slower > delete performance. Exactly and IIRC ext4 have directory entries stored in hash order which does not really help the sequential access. > > But, how exactly did you benchmark it? If you compare a fresh > mkfs.btrfs where you just copied all the data over with an ext4 FS that > has been on the disk for a long time, it isn't quite fair to ext4. I have the same question, note that if the files on ext4 has been worked with it may very well be that directory hash trees are not in very good shape. You can attempt to optimize that by e2fsck (just run fsck.ext4 -f ) but that may take quite some time and memory, but it is worth trying. Thanks! -Lukas > > -chris > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs Storage Array Corrupted
On Tue, Feb 28, 2012 at 09:36:35PM -0600, Travis Shivers wrote: > I upgraded my kernel so my version is now: > Linux server 3.3.0-030300rc5-generic #201202251535 SMP Sat Feb 25 > 20:36:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux > > The problem has not been solved and I still get the previous errors. Ok, Step one is to grab the development version of btrfs-progs, which currently sits in the dangerdonteveruse branch: git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git dangerdonteveruse Run btrfs-debug-tree -R /dev/sdh and then run btrfs-debug-tree -b 5568194695168 /dev/sdh and then run btrfsck /dev/sdh Send the results of all three here, it should tell us which tree that block belongs to, and from there we'll figure out the best way to fix it. -chris > > # mount /dev/sdh /mnt/main > mount: wrong fs type, bad option, bad superblock on /dev/sdh, >missing codepage or helper program, or other error >In some cases useful info is found in syslog - try >dmesg | tail or so > > # dmesg > [ 232.985248] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid > 4 transid 43477 /dev/sdi > [ 232.985434] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid > 3 transid 43477 /dev/sdh > [ 233.027881] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid > 2 transid 43477 /dev/sdg > [ 233.065675] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid > 1 transid 43476 /dev/sdf > [ 284.384320] device fsid 2c11a326-5630-484e-9f1d-9dab777a1028 devid > 3 transid 43477 /dev/sdh > [ 284.427076] btrfs: disk space caching is enabled > [ 284.442565] verify_parent_transid: 2 callbacks suppressed > [ 284.442572] parent transid verify failed on 5568194695168 wanted > 43477 found 43151 > [ 284.442834] parent transid verify failed on 5568194695168 wanted > 43477 found 43151 > [ 284.443151] parent transid verify failed on 5568194695168 wanted > 43477 found 43151 > [ 284.443159] parent transid verify failed on 5568194695168 wanted > 43477 found 43151 > [ 284.445740] btrfs: 
open_ctree failed > > > On Tue, Feb 28, 2012 at 9:16 PM, cwillu wrote: > > On Tue, Feb 28, 2012 at 9:00 PM, Travis Shivers wrote: > >> Where should I grab the source from? The main repo that you have > >> listed on your main wiki page > >> (https://btrfs.wiki.kernel.org/articles/b/t/r/Btrfs_source_repositories.html) > >> is down: > >> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git > > > > The btrfs wiki is at http://btrfs.ipv5.de . The kernel.org one is a > > static snapshot of the contents made nearly a year ago, prior to the > > kernel.org break-in, and should be ignored. > > > > git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git is > > the development tree, although the above patch is in mainline as of > > 3.3rc5, which probably makes that the easiest way to try it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
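The three diagnostic commands above can be scripted so each run's output lands in its own file, ready to paste into a reply. A hedged sketch with a generic runner (the command strings are the ones from the mail; running them requires root and the actual device):

```python
import shlex
import subprocess
from pathlib import Path

def capture(commands: list[str], outdir: str = ".") -> list[Path]:
    """Run each diagnostic command and save its stdout+stderr to a
    numbered file, so the results can be attached to a reply verbatim."""
    paths = []
    for i, cmd in enumerate(commands):
        result = subprocess.run(shlex.split(cmd),
                                capture_output=True, text=True)
        out = Path(outdir) / f"cmd{i}.txt"
        out.write_text(f"$ {cmd}\n{result.stdout}{result.stderr}")
        paths.append(out)
    return paths

# The three commands from the mail (run as root, device per the report):
# capture(["btrfs-debug-tree -R /dev/sdh",
#          "btrfs-debug-tree -b 5568194695168 /dev/sdh",
#          "btrfsck /dev/sdh"])
```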
Re: getdents - ext4 vs btrfs performance
Hi Chris, the last one was borked :) Please check this one. -jacek 2012/2/29 Jacek Luczak : > Hi All, > > /*Sorry for sending incomplete email, hit wrong button :) I guess I > can't use Gmail */ > > Long story short: We've found that operations on a directory structure > holding many dirs takes ages on ext4. > > The Question: Why there's that huge difference in ext4 and btrfs? See > below test results for real values. > > Background: I had to backup a Jenkins directory holding workspace for > few projects which were co from svn (implies lot of extra .svn dirs). > The copy takes lot of time (at least more than I've expected) and > process was mostly in D (disk sleep). I've dig more and done some > extra test to see if this is not a regression on block/fs site. To > isolate the issue I've also performed same tests on btrfs. > > Test environment configuration: > 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT > enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. > 2) Kernels: All tests were done on following kernels: > - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of > config changes mostly. In -3 we've introduced ,,fix readahead pipeline > break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. > - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been > release recently). > 3) A subject of tests, directory holding: > - 54GB of data (measured on ext4) > - 1978149 files > - 844008 directories > 4) Mount options: > - ext4 -- errors=remount-ro,noatime, > data=writeback > - btrfs -- noatime,nodatacow and for later investigation on > copression effect: noatime,nodatacow,compress=lzo > > In all tests I've been measuring time of execution. Following tests > were performed: > - find . -type d > - find . 
-type f > - cp -a > - rm -rf > > Ext4 results: > | Type | 2.6.39.4-3 | 3.2.7 > | Dir cnt | 17m 40sec | 11m 20sec > | File cnt | 17m 36sec | 11m 22sec > | Copy | 1h 28m | 1h 27m > | Remove| 3m 43sec | 3m 38sec > > Btrfs results (without lzo comression): > | Type | 2.6.39.4-3 | 3.2.7 > | Dir cnt | 2m 22sec | 2m 21sec > | File cnt | 2m 26sec | 2m 23sec > | Copy | 36m 22sec | 39m 35sec > | Remove| 7m 51sec | 10m 43sec > > From above one can see that copy takes close to 1h less on btrfs. I've > done strace counting times of calls, results are as follows (from > 3.2.7): > 1) Ext4 (only to elements): > % time seconds usecs/call calls errors syscall > -- --- --- - - > 57.01 13.257850 1 15082163 read > 23.40 5.440353 3 1687702 getdents > 6.15 1.430559 0 3672418 lstat > 3.80 0.883767 0 13106961 write > 2.32 0.539959 0 4794099 open > 1.69 0.393589 0 843695 mkdir > 1.28 0.296700 0 5637802 setxattr > 0.80 0.186539 0 7325195 stat > > 2) Btrfs: > % time seconds usecs/call calls errors syscall > -- --- --- - - > 53.38 9.486210 1 15179751 read > 11.38 2.021662 1 1688328 getdents > 10.64 1.890234 0 4800317 open > 6.83 1.213723 0 13201590 write > 4.85 0.862731 0 5644314 setxattr > 3.50 0.621194 1 844008 mkdir > 2.75 0.489059 0 3675992 1 lstat > 1.71 0.303544 0 5644314 llistxattr > 1.50 0.265943 0 1978149 utimes > 1.02 0.180585 0 5644314 844008 getxattr > > On btrfs getdents takes much less time which prove the bottleneck in > copy time on ext4 is this syscall. In 2.6.39.4 it shows even less time > for getdents: > % time seconds usecs/call calls errors syscall > -- --- --- - - > 50.77 10.978816 1 15033132 read > 14.46 3.125996 1 4733589 open > 7.15 1.546311 0 5566988 setxattr > 5.89 1.273845 0 3626505 lstat > 5.81 1.255858 1 1667050 getdents > 5.66 1.224403 0 13083022 write > 3.40 0.735114 1 833371 mkdir > 1.96 0.424881 0 5566988 llistxattr > > > Why so huge difference in the getdents timings? 
> > -Jacek
getdents - ext4 vs btrfs performance
Hi All,

/*Sorry for sending incomplete email, hit wrong button :) I guess I
can't use Gmail */

Long story short: We've found that operations on a directory structure
holding many dirs take ages on ext4.

The Question: Why is there such a huge difference between ext4 and
btrfs? See the test results below for real values.

Background: I had to back up a Jenkins directory holding the workspace
for a few projects which were checked out from svn (implies a lot of
extra .svn dirs). The copy takes a lot of time (at least more than I
expected) and the process was mostly in D (disk sleep). I've dug deeper
and done some extra tests to see if this is a regression on the block/fs
side. To isolate the issue I've also performed the same tests on btrfs.

Test environment configuration:
1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6-core Intel X5670
with HT enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
2) Kernels: All tests were done on the following kernels:
 - 2.6.39.4-3 -- the build ID (3) is used here mostly for internal
tracking of config changes. In -3 we've introduced the ,,fix readahead
pipeline break caused by block plug'' patch. Otherwise it's pure 2.6.39.4.
 - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
released recently).
3) The subject of the tests, a directory holding:
 - 54GB of data (measured on ext4)
 - 1978149 files
 - 844008 directories
4) Mount options:
 - ext4 -- errors=remount-ro,noatime,data=writeback
 - btrfs -- noatime,nodatacow and, for later investigation of the
compression effect: noatime,nodatacow,compress=lzo

In all tests I've been measuring time of execution. The following tests
were performed:
 - find . -type d
 - find . -type f
 - cp -a
 - rm -rf

Ext4 results:
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 17m 40sec  | 11m 20sec
| File cnt | 17m 36sec  | 11m 22sec
| Copy     | 1h 28m     | 1h 27m
| Remove   | 3m 43sec   | 3m 38sec

Btrfs results (without lzo compression):
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 2m 22sec   | 2m 21sec
| File cnt | 2m 26sec   | 2m 23sec
| Copy     | 36m 22sec  | 39m 35sec
| Remove   | 7m 51sec   | 10m 43sec

From the above one can see that the copy takes close to 1h less on
btrfs. I've done strace counting times of calls; results are as follows
(from 3.2.7):

1) Ext4 (top elements only):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.01   13.257850           1  15082163           read
 23.40    5.440353           3   1687702           getdents
  6.15    1.430559           0   3672418           lstat
  3.80    0.883767           0  13106961           write
  2.32    0.539959           0   4794099           open
  1.69    0.393589           0    843695           mkdir
  1.28    0.296700           0   5637802           setxattr
  0.80    0.186539           0   7325195           stat

2) Btrfs:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.38    9.486210           1  15179751           read
 11.38    2.021662           1   1688328           getdents
 10.64    1.890234           0   4800317           open
  6.83    1.213723           0  13201590           write
  4.85    0.862731           0   5644314           setxattr
  3.50    0.621194           1    844008           mkdir
  2.75    0.489059           0   3675992         1 lstat
  1.71    0.303544           0   5644314           llistxattr
  1.50    0.265943           0   1978149           utimes
  1.02    0.180585           0   5644314    844008 getxattr

On btrfs getdents takes much less time, which proves the bottleneck in
copy time on ext4 is this syscall. In 2.6.39.4 it shows even less time
for getdents:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.77   10.978816           1  15033132           read
 14.46    3.125996           1   4733589           open
  7.15    1.546311           0   5566988           setxattr
  5.89    1.273845           0   3626505           lstat
  5.81    1.255858           1   1667050           getdents
  5.66    1.224403           0  13083022           write
  3.40    0.735114           1    833371           mkdir
  1.96    0.424881           0   5566988           llistxattr

Why so huge a difference in the getdents timings?

-Jacek
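The four operations above can be wrapped in a small timing harness; a sketch of the same sequence on a throwaway tree (the real run used the 54 GB Jenkins workspace and plain find/cp/rm, so absolute numbers will differ):

```python
import os
import shutil
import time

def timed(label, fn):
    """Run fn(), print its wall-clock time, return its result."""
    t = time.monotonic()
    result = fn()
    print(f"{label}: {time.monotonic() - t:.2f}s")
    return result

def benchmark(src: str, dst: str):
    """Replicate the mail's test sequence: count dirs, count files,
    copy the tree, then remove the copy. Note `find . -type d` also
    counts the root directory; this counts subdirectories only."""
    dirs = timed("Dir cnt", lambda: sum(len(d) for _, d, _ in os.walk(src)))
    files = timed("File cnt", lambda: sum(len(f) for _, _, f in os.walk(src)))
    timed("Copy", lambda: shutil.copytree(src, dst, symlinks=True))
    timed("Remove", lambda: shutil.rmtree(dst))
    return dirs, files
```

Remember to drop caches and remount between runs (as done in the original tests), otherwise the second pass measures the page cache rather than the filesystem.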
Re: getdents - ext4 vs btrfs performance
On Wed, Feb 29, 2012 at 02:31:03PM +0100, Jacek Luczak wrote: > Hi All, > > Long story short: We've found that operations on a directory structure > holding many dirs takes ages on ext4. > > The Question: Why there's that huge difference in ext4 and btrfs? See > below test results for real values. > > Background: I had to backup a Jenkins directory holding workspace for > few projects which were co from svn (implies lot of extra .svn dirs). > The copy takes lot of time (at least more than I've expected) and > process was mostly in D (disk sleep). I've dig more and done some > extra test to see if this is not a regression on block/fs site. To > isolate the issue I've also performed same tests on btrfs. > > Test environment configuration: > 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT > enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. > 2) Kernels: All tests were done on following kernels: > - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of > config changes mostly. In -3 we've introduced ,,fix readahead pipeline > break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. > - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been > release recently). > 3) A subject of tests, directory holding: > - 54GB of data (measured on ext4) > - 1978149 files > - 844008 directories > 4) Mount options: > - ext4 -- errors=remount-ro,noatime,data=writeback > - btrfs -- noatime,nodatacow and for later investigation on > copression effect: noatime,nodatacow,compress=lzo For btrfs, nodatacow and compression don't really mix. The compression will just override it. (Just FYI, not really related to these results). > > In all tests I've been measuring time of execution. Following tests > were performed: > - find . -type d > - find . 
-type f > - cp -a > - rm -rf > > Ext4 results: > | Type | 2.6.39.4-3 | 3.2.7 > | Dir cnt | 17m 40sec | 11m 20sec > | File cnt | 17m 36sec | 11m 22sec > | Copy| 1h 28m| 1h 27m > | Remove| 3m 43sec Are the btrfs numbers missing? ;) In order for btrfs to be faster for cp -a, the files probably didn't change much since creation. Btrfs maintains extra directory indexes that help in sequential backup scans, but this usually means slower delete performance. But, how exactly did you benchmark it? If you compare a fresh mkfs.btrfs where you just copied all the data over with an ext4 FS that has been on the disk for a long time, it isn't quite fair to ext4. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
getdents - ext4 vs btrfs performance (To: chris.ma...@oracle.com, Al Viro, Ted Ts'o)
Hi All,

/*Sorry for sending incomplete email, hit wrong button :) */

Long story short: We've found that operations on a directory structure
holding many dirs take ages on ext4.

The Question: Why is there such a huge difference between ext4 and
btrfs? See the test results below for real values.

Background: I had to back up a Jenkins directory holding the workspace
for a few projects which were checked out from svn (implies a lot of
extra .svn dirs). The copy takes a lot of time (at least more than I
expected) and the process was mostly in D (disk sleep). I've dug deeper
and done some extra tests to see if this is a regression on the block/fs
side. To isolate the issue I've also performed the same tests on btrfs.

Test environment configuration:
1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6-core Intel X5670
with HT enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
2) Kernels: All tests were done on the following kernels:
 - 2.6.39.4-3 -- the build ID (3) is used here mostly for internal
tracking of config changes. In -3 we've introduced the ,,fix readahead
pipeline break caused by block plug'' patch. Otherwise it's pure 2.6.39.4.
 - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
released recently).
3) The subject of the tests, a directory holding:
 - 54GB of data (measured on ext4)
 - 1978149 files
 - 844008 directories
4) Mount options:
 - ext4 -- errors=remount-ro,noatime,data=writeback
 - btrfs -- noatime,nodatacow and, for later investigation of the
compression effect: noatime,nodatacow,compress=lzo

In all tests I've been measuring time of execution. The following tests
were performed:
 - find . -type d
 - find . -type f
 - cp -a
 - rm -rf

Ext4 results:
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 17m 40sec  | 11m 20sec
| File cnt | 17m 36sec  | 11m 22sec
| Copy     | 1h 28m     | 1h 27m
| Remove   | 3m 43sec   | 3m 38sec

Btrfs results (without lzo compression):
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 2m 22sec   | 2m 21sec
| File cnt | 2m 26sec   | 2m 23sec
| Copy     | 36m 22sec  | 39m 35sec
| Remove   | 7m 51sec   | 10m 43sec

From the above one can see that the copy takes close to 1h less on
btrfs. I've done strace counting times of calls; results are as follows
(from 3.2.7):

1) Ext4 (top elements only):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.01   13.257850           1  15082163           read
 23.40    5.440353           3   1687702           getdents
  6.15    1.430559           0   3672418           lstat
  3.80    0.883767           0  13106961           write
  2.32    0.539959           0   4794099           open
  1.69    0.393589           0    843695           mkdir
  1.28    0.296700           0   5637802           setxattr
  0.80    0.186539           0   7325195           stat

2) Btrfs:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.38    9.486210           1  15179751           read
 11.38    2.021662           1   1688328           getdents
 10.64    1.890234           0   4800317           open
  6.83    1.213723           0  13201590           write
  4.85    0.862731           0   5644314           setxattr
  3.50    0.621194           1    844008           mkdir
  2.75    0.489059           0   3675992         1 lstat
  1.71    0.303544           0   5644314           llistxattr
  1.50    0.265943           0   1978149           utimes
  1.02    0.180585           0   5644314    844008 getxattr

On btrfs getdents takes much less time, which proves the bottleneck in
copy time on ext4 is this syscall. In 2.6.39.4 it shows even less time
for getdents:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.77   10.978816           1  15033132           read
 14.46    3.125996           1   4733589           open
  7.15    1.546311           0   5566988           setxattr
  5.89    1.273845           0   3626505           lstat
  5.81    1.255858           1   1667050           getdents
  5.66    1.224403           0  13083022           write
  3.40    0.735114           1    833371           mkdir
  1.96    0.424881           0   5566988           llistxattr

Why so huge a difference in the getdents timings?

-Jacek
getdents - ext4 vs btrfs performance
Hi All, Long story short: We've found that operations on a directory structure holding many dirs takes ages on ext4. The Question: Why there's that huge difference in ext4 and btrfs? See below test results for real values. Background: I had to backup a Jenkins directory holding workspace for few projects which were co from svn (implies lot of extra .svn dirs). The copy takes lot of time (at least more than I've expected) and process was mostly in D (disk sleep). I've dig more and done some extra test to see if this is not a regression on block/fs site. To isolate the issue I've also performed same tests on btrfs. Test environment configuration: 1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs. 2) Kernels: All tests were done on following kernels: - 2.6.39.4-3 -- the build ID (3) is used here for internal tacking of config changes mostly. In -3 we've introduced ,,fix readahead pipeline break caused by block plug'' patch. Otherwise it's pure 2.6.39.4. - 3.2.7 -- latest kernel at the time of testing (3.2.8 has been release recently). 3) A subject of tests, directory holding: - 54GB of data (measured on ext4) - 1978149 files - 844008 directories 4) Mount options: - ext4 -- errors=remount-ro,noatime,data=writeback - btrfs -- noatime,nodatacow and for later investigation on copression effect: noatime,nodatacow,compress=lzo In all tests I've been measuring time of execution. Following tests were performed: - find . -type d - find . -type f - cp -a - rm -rf Ext4 results: | Type | 2.6.39.4-3 | 3.2.7 | Dir cnt | 17m 40sec | 11m 20sec | File cnt | 17m 36sec | 11m 22sec | Copy| 1h 28m| 1h 27m | Remove| 3m 43sec -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html