Re: df missing filesystem when run on subvolume
Resending, hopefully with correct formatting. As the title suggests, running the df command on a subvolume doesn't return a filesystem. I'm not sure where the problem lies or if anyone else has noticed this. Some programs fail to detect free space as a result. Example for clarification:

kyle@home:~$ sudo mount -o subvol=@data /mnt/btrfs/
kyle@home:~$ mkdir /mnt/btrfs/directory
kyle@home:~$ btrfs subvolume create /mnt/btrfs/subvolume
Create subvolume '/mnt/btrfs/subvolume'
kyle@home:~$ sudo btrfs subvolume list /mnt/btrfs/
ID 258 gen 2757271 top level 5 path @data
ID 5684 gen 2718215 top level 258 path subvolume
kyle@home:~$ df /mnt/btrfs/
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sdc2     1412456448 1170400072  240688008  83% /mnt/btrfs
kyle@home:~$ df /mnt/btrfs/directory
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sdc2     1412456448 1170400072  240688008  83% /mnt/btrfs
kyle@home:~$ df /mnt/btrfs/subvolume
Filesystem     1K-blocks       Used  Available Use% Mounted on
-             1412456448 1170400072  240688008  83% /mnt/btrfs/subvolume

Thanks,
Kyle
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
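For what it's worth, programs don't have to rely on df's Filesystem column at all: statvfs(2) reports usage for any path, including a subvolume that df shows as "-", because it never has to map the path back to a device node. A minimal sketch of my own (not from the original report):

```python
import os

def free_space(path):
    """Return (total, available) bytes for the filesystem backing `path`.

    Works even where df prints '-' in the Filesystem column (e.g. a
    btrfs subvolume whose device number doesn't match /proc/mounts),
    since statvfs queries the path directly.
    """
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize, st.f_bavail * st.f_frsize

total, avail = free_space("/")
print(f"{avail} of {total} bytes available")
```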
Re: your mail
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn
> Sent: Thursday, September 01, 2016 6:18 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Re: your mail
>
> On 2016-09-01 03:44, M G Berberich wrote:
> > On Wednesday, 31 August, Fennec Fox wrote:
> >> Linux Titanium 4.7.2-1-MANJARO #1 SMP PREEMPT Sun Aug 21 15:04:37 UTC 2016 x86_64 GNU/Linux
> >> btrfs-progs v4.7
> >>
> >> Data, single: total=30.01GiB, used=18.95GiB
> >> System, single: total=4.00MiB, used=16.00KiB
> >> Metadata, single: total=1.01GiB, used=422.17MiB
> >> GlobalReserve, single: total=144.00MiB, used=0.00B
> >>
> >> {02:50} Wed Aug 31
> >> [fennectech@Titanium ~]$ sudo fstrim -v /
> >> [sudo] password for fennectech:
> >> Sorry, try again.
> >> [sudo] password for fennectech:
> >> /: 99.8 GiB (107167244288 bytes) trimmed
> >>
> >> {03:08} Wed Aug 31
> >> [fennectech@Titanium ~]$ sudo fstrim -v /
> >> [sudo] password for fennectech:
> >> /: 99.9 GiB (107262181376 bytes) trimmed
> >>
> >> I ran these commands minutes after each other and each time it is
> >> trimming the entire free space.
> >>
> >> Anyone else seen this? The filesystem is the root FS and is compressed.
> >
> > You should be very happy that it is trimming at all. The typical situation
> > on a used btrfs is
> >
> > # fstrim -v /
> > /: 0 B (0 bytes) trimmed
> >
> > even if there is 33G unused space on the fs:
> >
> > # df -h /
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/sda2        96G   61G   33G  66% /
>
> I think you're using an old kernel; this has been working since at least 4.5, but was broken in some older releases.

M G is running 4.7.2. The problem is that all space has been allocated by block groups, and fstrim will only work on unallocated space.
On my system all space has been allocated on my root filesystem, so 0 B are trimmed:

kyle@home:~$ uname -a
Linux home 4.7.2-040702-generic #201608201334 SMP Sat Aug 20 17:37:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
kyle@home:~$ sudo btrfs fi show /
Label: 'root'  uuid: 6af4ebde-81ef-428a-a45f-0e8480ad969a
        Total devices 2 FS bytes used 13.44GiB
        devid   14 size 20.00GiB used 20.00GiB path /dev/sde2
        devid   15 size 20.00GiB used 20.00GiB path /dev/sdb2
kyle@home:~$ btrfs fi df /
Data, RAID1: total=18.97GiB, used=12.98GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=473.83MiB
GlobalReserve, single: total=160.00MiB, used=0.00B
kyle@home:~$ sudo fstrim -v /
[sudo] password for kyle:
/: 0 B (0 bytes) trimmed

But I do have space trimmed on my home filesystem:

kyle@home:~$ sudo btrfs fi show /home/
Label: 'home'  uuid: b75fb450-4a28-434a-a483-e784940d463a
        Total devices 2 FS bytes used 18.63GiB
        devid   11 size 64.00GiB used 29.03GiB path /dev/sde3
        devid   12 size 64.00GiB used 29.03GiB path /dev/sdb3
kyle@home:~$ btrfs fi df /home/
Data, RAID1: total=27.00GiB, used=18.46GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=168.62MiB
GlobalReserve, single: total=64.00MiB, used=0.00B
kyle@home:~$ sudo fstrim -v /home
/home: 70 GiB (75092721664 bytes) trimmed
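As an aside, the arithmetic behind the two results above is simple: fstrim on btrfs (in this era) trims only space not yet allocated to block groups, i.e. device size minus the per-device "used" figure from btrfs fi show. A small illustrative sketch of mine plugging in the numbers from the transcripts:

```python
GiB = 1024 ** 3

def unallocated(device_size_gib, device_used_gib):
    """Space fstrim can reach: device capacity minus chunk-allocated space."""
    return (device_size_gib - device_used_gib) * GiB

# Root fs: both 20 GiB devices fully allocated to block groups -> nothing to trim.
root = unallocated(20.00, 20.00) + unallocated(20.00, 20.00)

# Home fs: two 64 GiB devices with 29.03 GiB allocated each -> ~70 GiB trimmable.
home = unallocated(64.00, 29.03) + unallocated(64.00, 29.03)

print(root / GiB, home / GiB)
```

This matches the observed "/: 0 B trimmed" and "/home: 70 GiB trimmed".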
possible enhancement: failing device converted to a seed device
I'll preface this with the fact that I'm just a user and am only posing a question for a possible enhancement to btrfs. I'm quite sure it isn't currently allowed, but would it be possible to set a failing device as a seed instead of kicking it out of a multi-device filesystem? This would make the failing device RO while keeping the filesystem as a whole RW, thereby allowing the user additional protection when recovering/balancing. Is this a feasible/realistic request?

Thanks,
Kyle
ssd mode on rotational media
What issues would arise if ssd mode is activated because a block layer sets the rotational flag to zero? This happens for me running btrfs on bcache. Would it be beneficial to pass the nossd mount option?

Thanks,
Kyle
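For illustration, the flag in question is what the kernel exposes as queue/rotational in sysfs. Below is my own small sketch (with a hypothetical helper name, not anything from btrfs or bcache) of how one might decide whether to force nossd; it reads a sysfs-style file path passed in, so it stays self-contained:

```python
from pathlib import Path

def btrfs_ssd_option(rotational_file):
    """Given a sysfs-style `queue/rotational` file containing '0' or '1',
    suggest a mount option.

    A bcache device backed by spinning disks typically reports
    rotational=0, which would switch btrfs into ssd mode; in that case
    an explicit nossd keeps the rotational allocation behavior.
    """
    rotational = Path(rotational_file).read_text().strip() == "1"
    return None if rotational else "nossd"
```

Usage would be something like checking /sys/block/bcache0/queue/rotational before building the mount command line.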
RE: [PATCH] Btrfs-progs: rebuild the crc tree with --init-csum-tree
Others might be thinking this too, so I'd better ask: Does this just read the first copy in the case of dup, raid1, etc. and plow on? I'm not sure how you would handle a mismatch due to a hardware error. Perhaps read all the copies and create another subvolume containing the mismatched copies?

Thanks,
Kyle

From: jba...@fb.com
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] Btrfs-progs: rebuild the crc tree with --init-csum-tree
Date: Wed, 1 Oct 2014 10:34:51 -0400

We have --init-csum-tree, which just empties the csum tree. I'm not sure why we would ever need this, but we definitely need to be able to rebuild the csum tree in some cases. This patch adds the ability to completely rebuild the crc tree by reading all of the data and adding csum entries for them. This patch doesn't pay attention to NODATASUM inodes, it'll happily add csums for everything. Thanks,

Signed-off-by: Josef Bacik <jba...@fb.com>
---
 cmds-check.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 03b0fbd..3141aa4 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6625,6 +6625,98 @@ out:
 	return ret;
 }
 
+static int populate_csum(struct btrfs_trans_handle *trans,
+			 struct btrfs_root *csum_root, char *buf, u64 start,
+			 u64 len)
+{
+	u64 offset = 0;
+	u64 sectorsize;
+	int ret = 0;
+
+	while (offset < len) {
+		sectorsize = csum_root->sectorsize;
+		ret = read_extent_data(csum_root, buf, start + offset,
+				       &sectorsize, 0);
+		if (ret)
+			break;
+		ret = btrfs_csum_file_block(trans, csum_root, start + len,
+					    start + offset, buf, sectorsize);
+		if (ret)
+			break;
+		offset += sectorsize;
+	}
+	return ret;
+}
+
+static int fill_csum_tree(struct btrfs_trans_handle *trans,
+			  struct btrfs_root *csum_root)
+{
+	struct btrfs_root *extent_root = csum_root->fs_info->extent_root;
+	struct btrfs_path *path;
+	struct btrfs_extent_item *ei;
+	struct extent_buffer *leaf;
+	char *buf;
+	struct btrfs_key key;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.type = BTRFS_EXTENT_ITEM_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
+	if (ret < 0) {
+		btrfs_free_path(path);
+		return ret;
+	}
+
+	buf = malloc(csum_root->sectorsize);
+	if (!buf) {
+		btrfs_free_path(path);
+		return -ENOMEM;
+	}
+
+	while (1) {
+		if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+			ret = btrfs_next_leaf(extent_root, path);
+			if (ret < 0)
+				break;
+			if (ret) {
+				ret = 0;
+				break;
+			}
+		}
+		leaf = path->nodes[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+		if (key.type != BTRFS_EXTENT_ITEM_KEY) {
+			path->slots[0]++;
+			continue;
+		}
+
+		ei = btrfs_item_ptr(leaf, path->slots[0],
+				    struct btrfs_extent_item);
+		if (!(btrfs_extent_flags(leaf, ei) &
+		      BTRFS_EXTENT_FLAG_DATA)) {
+			path->slots[0]++;
+			continue;
+		}
+
+		ret = populate_csum(trans, csum_root, buf, key.objectid,
+				    key.offset);
+		if (ret)
+			break;
+		path->slots[0]++;
+	}
+
+	btrfs_free_path(path);
+	free(buf);
+	return ret;
+}
+
 static struct option long_options[] = {
 	{ "super", 1, NULL, 's' },
 	{ "repair", 0, NULL, 0 },
@@ -6794,6 +6886,12 @@ int cmd_check(int argc, char **argv)
 			ret = -EIO;
 			goto close_out;
 		}
+
+		ret = fill_csum_tree(trans, info->csum_root);
+		if (ret) {
+			fprintf(stderr, "crc refilling failed\n");
+			return -EIO;
+		}
 	}
 	/*
	 * Ok now we commit and run the normal fsck, which will add
-- 
1.8.3.1
RE: btrfs balance enospc
From: li...@colorremedies.com
Date: Tue, 16 Sep 2014 11:26:16 -0600

On Sep 16, 2014, at 10:51 AM, Mark Murawski markm-li...@intellasoft.net wrote:

Playing around with this filesystem I hot-removed a device from the array and put in a replacement.

Label: 'Root'  uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
        Total devices 2 FS bytes used 7.43GiB
        devid    1 size 9.31GiB used 8.90GiB path /dev/sdc6
        devid    3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa

removed /dev/sdc

Label: 'Root'  uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
        Total devices 2 FS bytes used 7.43GiB
        devid    3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
        *** Some devices missing

cartman {~} root# btrfs device add /dev/sdi6 /
cartman {~} root# btrfs fi show
Label: 'Root'  uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
        Total devices 3 FS bytes used 7.43GiB
        devid    3 size 9.31GiB used 8.90GiB path /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
        devid    4 size 10.00GiB used 0.00 path /dev/sdi6
        *** Some devices missing

cartman {~} root# btrfs filesystem balance start /

Better to use btrfs replace. But sequence-wise you should do btrfs device delete missing, which should then effectively do a balance to the newly added device. So while the sequence isn't really correct, that's probably not why you're getting this failure.

Does/should a balance imply removal of missing devices (as long as the minimum number of devices is still available)?
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411, rd 0, flush 38, corrupt 137167, gen 25

Please post results of smartctl -x /dev/sdc

Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2412, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2413, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2414, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2415, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2416, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2417, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2418, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2419, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2420, rd 0, flush 38, corrupt 137167, gen 25
Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6
Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O error on /dev/sdc6

I'd expect, with Btrfs having problems writing to a device, that there'd be libata messages related to this also. Do you have earlier kernel messages indicating the drive or controller are reporting errors?
RE: [PATCH] btrfs-progs: mkfs: remove experimental tag
From: dste...@suse.cz
To: linux-btrfs@vger.kernel.org
CC: dste...@suse.cz
Subject: [PATCH] btrfs-progs: mkfs: remove experimental tag
Date: Thu, 31 Jul 2014 14:21:34 +0200

Make it consistent with kernel status and documentation.

Signed-off-by: David Sterba <dste...@suse.cz>
---
 mkfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 16e92221a547..538b6e6837b2 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1439,8 +1439,8 @@ int main(int ac, char **av)
 	}
 
 	/* if we are here that means all devs are good to btrfsify */
-	printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
-	printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
+	printf("%s\n", BTRFS_BUILD_VERSION);
+	printf("See http://btrfs.wiki.kernel.org for more\n\n");

The sentence/thought isn't complete. I was left thinking "more what?" Perhaps add "information" or "documentation".

Thanks.

 	dev_cnt--;
 
@@ -1597,7 +1597,6 @@ raid_groups:
 		label, first_file, nodesize, leafsize, sectorsize,
 		pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy)));
-	printf("%s\n", BTRFS_BUILD_VERSION);
 
 	btrfs_commit_transaction(trans, root);
 	if (source_dir_set) {
-- 
1.9.0
RE: [PATCH 2/2] btrfs-progs: Unify the messy error message formats
Date: Tue, 29 Jul 2014 11:18:17 +0900
From: takeuchi_sat...@jp.fujitsu.com
To: kylega...@hotmail.com; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

Hi Kyle,

(2014/07/28 22:24), Kyle Gates wrote:

small wording error inline below

Date: Fri, 25 Jul 2014 15:17:05 +0900
From: takeuchi_sat...@jp.fujitsu.com
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

From: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>

- There are many formats used to show a snapshot name in error
  messages: '%s', '%s, %s, ('%s'), and ('%s). Since it's messy,
  unify these to the '%s' format.
- Fix a typo: s/uncorrect/incorrect/

Signed-off-by: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>
---
 cmds-subvolume.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index b7bfb3e..ce38503 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 	dstdir = dirname(dupdir);
 
 	if (!test_issubvolname(newname)) {
-		fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
+		fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 			newname);
 		goto out;
 	}
 
 	len = strlen(newname);
 	if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
-		fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
+		fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
 			newname);
 		goto out;
 	}
@@ -314,7 +314,7 @@ again:
 	free(cpath);
 
 	if (!test_issubvolname(vname)) {
-		fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
+		fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 			vname);
 		ret = 1;
 		goto out;
@@ -322,7 +322,7 @@ again:
 	len = strlen(vname);
 	if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
-		fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
+		fprintf(stderr, "ERROR: too long snapshot name '%s'\n",
+		fprintf(stderr, "ERROR: snapshot name too long '%s'\n",

Thank you for your comment. Fixed. How about this?

Yes, that looks good. Thanks.

===
From 73f9847c603fbe863f072d029b1a4948a1032d6e Mon Sep 17 00:00:00 2001
From: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>
Date: Fri, 25 Jul 2014 12:46:27 +0900
Subject: [PATCH] btrfs-progs: unify the format of error messages.

---
 cmds-subvolume.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index b7bfb3e..5a99c94 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 	dstdir = dirname(dupdir);
 
 	if (!test_issubvolname(newname)) {
-		fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
+		fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 			newname);
 		goto out;
 	}
 
 	len = strlen(newname);
 	if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
-		fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
+		fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
 			newname);
 		goto out;
 	}
@@ -314,7 +314,7 @@ again:
 	free(cpath);
 
 	if (!test_issubvolname(vname)) {
-		fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
+		fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 			vname);
 		ret = 1;
 		goto out;
@@ -322,7 +322,7 @@ again:
 	len = strlen(vname);
 	if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
-		fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
+		fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
 			vname);
 		ret = 1;
 		goto out;
@@ -722,14 +722,14 @@ static int cmd_snapshot(int argc, char **argv)
 	}
 
 	if (!test_issubvolname(newname)) {
-		fprintf(stderr, "ERROR: incorrect snapshot name ('%s')\n",
+		fprintf(stderr, "ERROR: incorrect snapshot name '%s'\n",
 			newname);
 		goto out;
 	}
 
 	len = strlen(newname);
 	if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
-		fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
+		fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
 			newname);
 		goto out;
 	}
@@ -778,7 +778,7 @@ static int cmd_snapshot(int argc, char **argv)
 	res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
 	if (res < 0) {
-		fprintf( stderr, "ERROR: cannot snapshot %s - %s\n",
+		fprintf( stderr, "ERROR: cannot snapshot '%s' - %s\n",
 			subvol_descr, strerror(errno));
 		goto out;
 	}
@@ -991,7 +991,7 @@ static int cmd_subvol_show(int argc, char **argv)
 	ret = find_mount_root(fullpath, &mnt);
 	if (ret < 0) {
-		fprintf(stderr, "ERROR: find_mount_root failed on %s: %s\n",
+		fprintf(stderr, "ERROR: find_mount_root failed on '%s': %s\n",
 			fullpath, strerror(-ret));
 		goto out;
 	}
-- 
1.9.3
RE: File server structure suggestion
Then there's raid10, which takes more drives and is faster, but is still limited to two mirrors. But while I haven't actually used raid10 myself, I do /not/ believe it's limited to pair-at-a-time additions. I believe it'll take, for instance, five devices just fine, staggering chunk allocation as necessary to fill all at about the same rate.

I am running just that: 3 separate raid10 btrfs filesystems (root, home, media/backups) on 5 drives, and they are unequal sizes too! My newer drives are bigger and have higher transfer rates, which means they get more chunks, but overall performance doesn't suffer.
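For anyone curious how unequal drives still fill at about the same rate: btrfs places each new chunk on the devices with the most unallocated space. This toy simulation (my own sketch, not btrfs code; the device sizes are made up) shows raid10 chunks landing more often on the bigger drives:

```python
def allocate_chunks(device_sizes_gib, chunk_gib=1.0, stripes=4):
    """Greedy sketch of btrfs raid10 chunk placement: each chunk spans
    `stripes` devices (2 mirrors x 2 stripes minimum), always picking
    the devices with the most unallocated space."""
    free = list(device_sizes_gib)
    placed = [0] * len(free)
    while True:
        # choose the `stripes` devices with the most free space
        order = sorted(range(len(free)), key=lambda i: free[i],
                       reverse=True)[:stripes]
        if any(free[i] < chunk_gib for i in order):
            break  # can't build a full-width chunk any more
        for i in order:
            free[i] -= chunk_gib
            placed[i] += 1
    return placed

# five unequal drives: the 2 TB drive ends up holding the most chunks
print(allocate_chunks([500, 500, 1000, 1000, 2000]))
```

The staggering falls out of the greedy rule by itself: the smaller drives simply drop out of the top-N set more often.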
RE: How does btrfs handle bad blocks in raid1?
On Thu, 9 Jan 2014 11:40:20 -0700, Chris Murphy wrote:

On Jan 9, 2014, at 3:42 AM, Hugo Mills wrote:

On Thu, Jan 09, 2014 at 11:26:26AM +0100, Clemens Eisserer wrote:

Hi,

I am running write-intensive (well, sort of: one write every 10s) workloads on cheap flash media which proved to be horribly unreliable. A 32GB microSDHC card reported bad blocks after 4 days, while a usb pen drive returns bogus data without any warning at all.

So I wonder, how would btrfs behave in raid1 on two such devices? Would it simply mark bad blocks as bad and continue to be operational, or will it bail out when some block can not be read/written anymore on one of the two devices?

[Hugo:] If a block is read and fails its checksum, then the other copy (in RAID-1) is checked and used if it's good. The bad copy is rewritten to use the good data.

If the block is bad such that writing to it won't fix it, then there are probably two cases: the device returns an IO error, in which case I suspect (but can't be sure) that the FS will go read-only. Or the device silently fails the write and claims success, in which case you're back to the situation above of the block failing its checksum.

[Chris:] In a normally operating drive, when the drive firmware locates a physical sector with persistent write failures, it's dereferenced. So the LBA points to a reserve physical sector, and the original can't be accessed by LBA. If all of the reserve sectors get used up, the next persistent write failure will result in a write error reported to libata; this will appear in dmesg and should be treated as the drive no longer being in normal operation. It's a drive useful for storage developers, but not for production usage.

[Hugo:] There's no marking of bad blocks right now, and I don't know of anyone working on the feature, so the FS will probably keep going back to the bad blocks as it makes CoW copies for modification.

[Chris:] This is maybe relevant: https://www.kernel.org/doc/htmldocs/libata/ataExceptions.html

"READ and WRITE commands report CHS or LBA of the first failed sector but ATA/ATAPI standard specifies that the amount of transferred data on error completion is indeterminate, so we cannot assume that sectors preceding the failed sector have been transferred and thus cannot complete those sectors successfully as SCSI does."

If I understand that correctly, Btrfs really ought to either punt the device, or make the whole volume read-only. For production use, going read-only very well could mean data loss, even while preserving the state of the file system. Eventually I'd rather see the offending device ejected from the volume, and for the volume to remain rw,degraded.

I would like to see btrfs hold onto the device in a read-only state like is done during a device replace operation. New writes would maintain the raid level but go out to the remaining devices, and only go full filesystem read-only if the minimum number of writable devices is not met. Once a new device is added in, the replace operation could commence and drop the bad device when complete.
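Hugo's description of the RAID-1 read path (verify the checksum, fall back to the other mirror, rewrite the bad copy from good data) can be sketched as follows. This is my own illustration of the idea using CRC32 as a stand-in for btrfs's checksum, not actual btrfs code:

```python
import zlib

def read_with_repair(copies, expected_csum):
    """Simulated RAID-1 read path: verify each mirror against the
    stored checksum, return a good copy, and rewrite any bad mirror
    in place from the good data (the 'self-heal' on read)."""
    good = None
    bad = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) == expected_csum:
            good = data
        else:
            bad.append(i)
    if good is None:
        raise IOError("all copies fail checksum")
    for i in bad:
        copies[i] = good  # rewrite the corrupted mirror
    return good

mirrors = [b"hello", b"hellX"]  # second copy silently corrupted
data = read_with_repair(mirrors, zlib.crc32(b"hello"))
```

After the call, both mirrors hold the good data again, which is exactly why a silently-failing device keeps getting "repaired" over and over rather than being marked bad.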
RE: Does btrfs raid1 actually provide any resilience?
On 11/14/2013 11:35 AM, Lutz Vieweg wrote:

On 11/14/2013 06:18 PM, George Mitchell wrote:

The read only mount issue is by design. It is intended to make sure you know exactly what is going on before you proceed.

[Lutz:] Hmmm... but will a server be able to continue its operation (including writes) on an already mounted btrfs when a storage device in a btrfs-raid1 fails? (If not, that would contradict the idea of achieving higher reliability.)

[George:] The read only function is designed to make certain you know that you are simplex before you proceed further.

[Lutz:] Ok, but once I know (e.g. by verifying that indeed one storage device is broken), is there any option to proceed (without redundancy) until I can replace the broken device?

Bonus points if the raid mode is maintained during degraded operation via either dup (2 disk array) or allocating additional chunks (3+ disk array).

[George:] I certainly wouldn't trust it just yet as it is not fully production ready.

[Lutz:] Sure, the server we intend to try btrfs on is one that we can restore when required, and there is a redundant server (without btrfs) that can stand in. I was just hoping for some good experiences to justify a larger field-trial.

[George:] That said, I have been using it for over six months now, coming off of 3ware RAID, and I have no regrets.

[Lutz:] I guess every Linux software RAID option is an improvement when you come from those awful hardware RAID controllers, which caused us additional downtime more often than they prevented downtime.

Regards, Lutz Vieweg

On 11/14/2013 03:02 AM, Lutz Vieweg wrote:

Hi,

on a server that so far uses an MD RAID1 with XFS on it we wanted to try btrfs instead. But even the most basic check for btrfs actually providing resilience against one of the physical storage devices failing yields a "does not work" result - so I wonder whether I misunderstood that btrfs is meant to not require block-device level RAID functionality underneath.
Here is the test procedure:

Testing was done using vanilla linux-3.12 (x86_64) plus btrfs-progs at commit c652e4efb8e2dd76ef1627d8cd649c6af5905902.

Preparing two 100 MB image files:

# dd if=/dev/zero of=/tmp/img1 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.201003 s, 522 MB/s
# dd if=/dev/zero of=/tmp/img2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.185486 s, 565 MB/s

Preparing two loop devices on those images to act as the underlying block devices for btrfs:

# losetup /dev/loop1 /tmp/img1
# losetup /dev/loop2 /tmp/img2

Preparing the btrfs filesystem on the loop devices:

# mkfs.btrfs --data raid1 --metadata raid1 --label test /dev/loop1 /dev/loop2
SMALL VOLUME: forcing mixed metadata/data groups

WARNING! - Btrfs v0.20-rc1-591-gc652e4e IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

Performing full device TRIM (100.00MiB) ...
Turning ON incompat feature 'mixed-bg': mixed data and metadata block groups
Created a data/metadata chunk of size 8388608
Performing full device TRIM (100.00MiB) ...
adding device /dev/loop2 id 2
fs created label test on /dev/loop1
        nodesize 4096 leafsize 4096 sectorsize 4096 size 200.00MiB
Btrfs v0.20-rc1-591-gc652e4e

Mounting the btrfs filesystem:

# mount -t btrfs /dev/loop1 /mnt/tmp

Copying just 70MB of zeroes into a test file:

# dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=70
70+0 records in
70+0 records out
73400320 bytes (73 MB) copied, 0.0657669 s, 1.1 GB/s

Checking that the testfile can be read:

# md5sum /mnt/tmp/testfile
b89fdccdd61d57b371f9611eec7d3cef  /mnt/tmp/testfile

Unmounting before further testing:

# umount /mnt/tmp

Now we assume that one of the two storage devices is broken, so we remove one of the two loop devices:

# losetup -d /dev/loop1

Trying to mount the btrfs filesystem from the one storage device that is left:

# mount -t btrfs -o device=/dev/loop2,degraded /dev/loop2 /mnt/tmp
mount: wrong fs type, bad option, bad superblock on /dev/loop2,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so

... does not work. In /var/log/messages we find:

kernel: btrfs: failed to read chunk root on loop2
kernel: btrfs: open_ctree failed

(The same happens when adding ",ro" to the mount options.)

Ok, so if the first of two disks was broken, so is our filesystem. Isn't that what RAID1 should prevent?

We tried a different scenario: now the first disk remains but the second is broken:

# losetup -d /dev/loop2
# losetup /dev/loop1 /tmp/img1
# mount -t btrfs -o degraded /dev/loop1 /mnt/tmp
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so

In /var/log/messages:

kernel: Btrfs: too
Re: [PATCH] Btrfs: fix broken nocow after balance
On Wednesday, June 05, 2013 Miao Xie wrote:

Balance will create reloc_root for each fs root, and it's going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files.

Since the extents are not shared by the relocation tree after the balance, we can recover the old last_snapshot safely if no one snapshoted the source tree. We fix the above problem by this way.

This patch also fixed my problem. I tend to like this patch better as the fix lands on disk, allowing nocow to function with an older kernel after being balanced.

Thanks,
Kyle

Tested-by: Kyle Gates kylega...@hotmail.com
Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/relocation.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 395b820..934ffe6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1305,6 +1305,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 	struct extent_buffer *eb;
 	struct btrfs_root_item *root_item;
 	struct btrfs_key root_key;
+	u64 last_snap = 0;
 	int ret;
 
 	root_item = kmalloc(sizeof(*root_item), GFP_NOFS);
@@ -1320,6 +1321,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 					      BTRFS_TREE_RELOC_OBJECTID);
 		BUG_ON(ret);
 
+		last_snap = btrfs_root_last_snapshot(&root->root_item);
 		btrfs_set_root_last_snapshot(&root->root_item,
 					     trans->transid - 1);
 	} else {
@@ -1345,6 +1347,12 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 		memset(&root_item->drop_progress, 0,
 		       sizeof(struct btrfs_disk_key));
 		root_item->drop_level = 0;
+		/*
+		 * abuse rtransid, it is safe because it is impossible to
+		 * receive data into a relocation tree.
+		 */
+		btrfs_set_root_rtransid(root_item, last_snap);
+		btrfs_set_root_otransid(root_item, trans->transid);
 	}
 
 	btrfs_tree_unlock(eb);
@@ -2273,8 +2281,12 @@ void free_reloc_roots(struct list_head *list)
 static noinline_for_stack
 int merge_reloc_roots(struct reloc_control *rc)
 {
+	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root;
 	struct btrfs_root *reloc_root;
+	u64 last_snap;
+	u64 otransid;
+	u64 objectid;
 	LIST_HEAD(reloc_roots);
 	int found = 0;
 	int ret = 0;
@@ -2308,12 +2320,44 @@ again:
 		} else {
 			list_del_init(&reloc_root->root_list);
 		}
+
+		/*
+		 * we keep the old last snapshot transid in rtranid when we
+		 * created the relocation tree.
+		 */
+		last_snap = btrfs_root_rtransid(&reloc_root->root_item);
+		otransid = btrfs_root_otransid(&reloc_root->root_item);
+		objectid = reloc_root->root_key.offset;
+
 		ret = btrfs_drop_snapshot(reloc_root, rc->block_rsv, 0, 1);
 		if (ret < 0) {
 			if (list_empty(&reloc_root->root_list))
 				list_add_tail(&reloc_root->root_list,
 					      &reloc_roots);
 			goto out;
+		} else if (!ret) {
+			/*
+			 * recover the last snapshot tranid to avoid
+			 * the space balance break NOCOW.
+			 */
+			root = read_fs_root(rc->extent_root->fs_info,
+					    objectid);
+			if (IS_ERR(root))
+				continue;
+
+			if (btrfs_root_refs(&root->root_item) == 0)
+				continue;
+
+			trans = btrfs_join_transaction(root);
+			BUG_ON(IS_ERR(trans));
+
+			/* Check if the fs/file tree was snapshoted or not. */
+			if (btrfs_root_last_snapshot(&root->root_item) ==
+			    otransid - 1)
+				btrfs_set_root_last_snapshot(&root->root_item,
+							     last_snap);
+
+			btrfs_end_transaction(trans, root);
 		}
 	}
-- 
1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
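The stash-and-restore idea in Miao's patch can be sketched as a toy model (plain Python, not kernel code; the dict fields are illustrative stand-ins for the root item): balance saves the fs root's old last_snapshot in the reloc root's otherwise unused rtransid field, remembers the creation transid in otransid, and restores the old value after the balance finishes only if nobody snapshotted the source tree in between.

```python
def create_reloc_root(fs_root, transid):
    """Toy model: balance stashes the old last_snapshot in rtransid,
    records the creation transid in otransid, then bumps last_snapshot
    (the side effect that breaks nocow)."""
    reloc = {"rtransid": fs_root["last_snapshot"],  # stash old value
             "otransid": transid}                   # remember creation transid
    fs_root["last_snapshot"] = transid - 1          # breaks nocow checks
    return reloc

def merge_reloc_root(fs_root, reloc):
    """Restore last_snapshot only if it is untouched since the reloc root
    was created, i.e. no snapshot was taken during the balance."""
    if fs_root["last_snapshot"] == reloc["otransid"] - 1:
        fs_root["last_snapshot"] = reloc["rtransid"]

root = {"last_snapshot": 100}
r = create_reloc_root(root, transid=500)
assert root["last_snapshot"] == 499   # nocow now sees pre-balance extents as shared
merge_reloc_root(root, r)
assert root["last_snapshot"] == 100   # old value recovered, nocow works again

# If a snapshot bumped last_snapshot during the balance, do not restore:
root2 = {"last_snapshot": 100}
r2 = create_reloc_root(root2, transid=500)
root2["last_snapshot"] = 510          # a snapshot happened mid-balance
merge_reloc_root(root2, r2)
assert root2["last_snapshot"] == 510  # restore correctly skipped
```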
Re: [PATCH] Btrfs: fix broken nocow after balance
On Monday, June 03, 2013, Liu Bo wrote:

Balance will create reloc_root for each fs root, and it's going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files.

So it turns out that checking last_snapshot does not always ensure that a node/leaf/file_extent is shared. That's why a shared node/leaf needs to search the extent tree for the number of references even after having checked last_snapshot, and updating the fs/file tree works top-down, so the children will always know how many references the parents put on them at the moment of checking shared status.

However, our nocow path does something different: it'll first check if the file extent is shared, then update the fs/file tree by updating the inode. This means the extent record related to the file extent may not yet show its actual number of references when checking shared status:

      fs_root  snap
          \    /
           leaf          <== refs=2
            |
       file_extent       <== refs=1 (but actually refs is 2)

After updating the fs tree (or the snapshot, if the snapshot is not RO), it'll be:

      fs_root      snap
         |          |
      cow leaf     leaf
          \        /
         file_extent     <== refs=2 (we do have two parents)

So it'll be confused by last_snapshot from balance into thinking that the file extent is now shared.

There are actually a couple of ways to address it, but updating the fs/file tree first might be the easiest and cleanest one. With this, updating the fs/file tree will at least make a delayed ref if the file extent is really shared by several parents, so we can make nocow happy again without having to check the confusing last_snapshot.

Works here. Extents are stable after a balance.
Thanks,
Kyle

Tested-by: Kyle Gates kylega...@hotmail.com
Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/extent-tree.c | 4 ----
 fs/btrfs/inode.c       | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df472ab..d24c26c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2856,10 +2856,6 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 	    btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 		goto out;
 
-	if (btrfs_extent_generation(leaf, ei) <=
-	    btrfs_root_last_snapshot(&root->root_item))
-		goto out;
-
 	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
 	if (btrfs_extent_inline_ref_type(leaf, iref) !=
 	    BTRFS_EXTENT_DATA_REF_KEY)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 23c596c..0dc5c7d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1253,7 +1253,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 	cur_offset = start;
 	while (1) {
 		ret = btrfs_lookup_file_extent(trans, root, path, ino,
-					       cur_offset, 0);
+					       cur_offset, 1);
 		if (ret < 0) {
 			btrfs_abort_transaction(trans, root, ret);
 			goto error;
-- 
1.7.7
Re: nocow 'C' flag ignored after balance
On Wed, May 29, 2013 Miao Xie wrote:
On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:
On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:

From: Liu Bo bo.li@oracle.com
Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
[...]

Sorry for the long wait in replying. This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu Raring kernel). I can probably try again on a newer version if you think it will help. This was my first kernel compile, so I patched by hand and waited (10 hours on my old 32-bit single-core machine). I did move some of the files off and back onto the filesystem to start fresh and compare, but all seem to exhibit the same behavior after a balance.

Thanks for testing the patch although it didn't help you. Actually I tested it to be sure that it fixed the problems in my reproducer. So anyway, can you please apply this debug patch in order to nail it down?

Your patch cannot fix the above problem because we may update ->last_snapshot after we relocate the file data extent. For example, there are two block groups which will be relocated: one is a data block group, the other is a metadata block group. We relocate the data block group first, set the new generation for the file data extent item/the relative extent item, and set (new_generation - 1) for ->last_snapshot. After the relocation of this block group, we will end the transaction and drop the relocation tree. If we ended the space balance now, we wouldn't break the nocow rule, because ->last_snapshot is less than the generation of the file data extent item/the relative extent item. But there is still one block group to be relocated; when relocating the second block group, we will also start a new transaction and update ->last_snapshot if needed. So ->last_snapshot becomes greater than the generation of the file data extent item we set before, and the nocow rule is broken.

Back to the above problem. I don't think it is a serious problem: we only do COW once after the relocation, then we will still honour the nocow rule. The behaviour is similar to a snapshot. So maybe it needn't be fixed.

I would argue that for large vm workloads, running a balance or adding disks is a common practice that will result in a drastic drop in performance as well as massive increases in metadata writes and fragmentation. In my case my disks were thrashing severely, performance was poor, and ntp couldn't even hold my clock stable. If the fix is nontrivial please add this to the todo list.

Thanks,
Kyle

If we must fix this problem, I think the only way is to get the generation at the beginning of the space balance, and then set it to ->last_snapshot if ->last_snapshot is less than it; don't use (current_generation - 1) to update ->last_snapshot. Besides that, don't forget to store the generation into btrfs_balance_item, or the problem will happen again after we resume the balance.

Thanks
Miao

thanks,
liubo
[...]
Re: nocow 'C' flag ignored after balance
From: Liu Bo bo.li@oracle.com
Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files.

So here we update the file extent's generation while walking relocated file extents in the data reloc root, and use the file extent's generation instead for checking if we have cross refs for the file extent. That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 18 +++++++++++++-----
 fs/btrfs/inode.c       | 10 ++++++++--
 fs/btrfs/relocation.c  |  1 +
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 				    u64 bytenr, u64 num_bytes);
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr);
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(
 						 struct btrfs_fs_info *info,
 						 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
 static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 					struct btrfs_root *root,
 					struct btrfs_path *path,
-					u64 objectid, u64 offset, u64 bytenr)
+					u64 objectid, u64 offset, u64 bytenr,
+					u64 fi_gen)
 {
 	struct btrfs_root *extent_root = root->fs_info->extent_root;
 	struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 	    btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 		goto out;
 
-	if (btrfs_extent_generation(leaf, ei) <=
-	    btrfs_root_last_snapshot(&root->root_item))
+	/*
+	 * Usually generation in extent item is larger than that in file extent
+	 * item because of delay refs. But we don't want balance to break
+	 * file's nocow behaviour, so use file_extent's generation which has
+	 * been updated when we update fs root to point to relocated file
+	 * extents in data reloc root.
+	 */
+	fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+	if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
 		goto out;
 
 	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:
 
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr)
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen)
 {
 	struct btrfs_path *path;
 	int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 
 	do {
 		ret = check_committed_ref(trans, root, path, objectid,
-					  offset, bytenr);
+					  offset, bytenr, gen);
 		if (ret && ret != -ENOENT)
 			goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 		ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 		if (extent_type == BTRFS_FILE_EXTENT_REG ||
 		    extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 gen;
+			gen = btrfs_file_extent_generation(leaf, fi);
 			disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 			extent_offset = btrfs_file_extent_offset(leaf, fi);
 			extent_end = found_key.offset +
@@ -1749,7 +1751,8 @@ next_slot:
 				goto out_check;
 			if (btrfs_cross_ref_exist(trans, root, ino,
 						  found_key.offset -
-						  extent_offset, disk_bytenr))
+						  extent_offset, disk_bytenr,
+						  gen))
 				goto out_check;
 			disk_bytenr += extent_offset;
 			disk_bytenr += cur_offset - found_key.offset;
@@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	u64 disk_bytenr;
 	u64 backref_offset;
+	u64 fi_gen;
 	u64 extent_end;
 	u64 num_bytes;
 	int slot;
@@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 	}
 	disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 	backref_offset = btrfs_file_extent_offset(leaf, fi);
+	fi_gen = btrfs_file_extent_generation(leaf, fi);
 
 	*orig_start = key.offset - backref_offset;
 	*orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 	 * find any we must cow
 	 */
 	if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
-				  key.offset - backref_offset, disk_bytenr))
+				  key.offset - backref_offset, disk_bytenr,
+				  fi_gen))
 		goto out;
 
 	/*
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 704a1b8..07faabf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1637,6 +1637,7 @@ int replace_file_extents
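The core of the patch above — trusting the file extent's generation, updated during relocation, in addition to the extent item's generation when deciding whether an extent might be shared — can be sketched as a toy model (plain Python; the function name and arguments are illustrative, not the kernel API):

```python
def may_be_shared(extent_gen: int, fi_gen: int, last_snapshot: int) -> bool:
    """Toy model of the patched check_committed_ref() test: an extent is
    presumed shared (forcing COW) only if the newest generation we know of
    is not newer than the root's last_snapshot."""
    return max(extent_gen, fi_gen) <= last_snapshot

# Balance bumped last_snapshot to 99 while the extent item still carries its
# old generation 50; the unpatched check (extent_gen alone) would force COW.
# With the file extent's generation updated to 100 during relocation, the
# patched check sees the extent as exclusive again:
assert not may_be_shared(extent_gen=50, fi_gen=100, last_snapshot=99)
# A genuinely old extent, possibly shared by a snapshot, still triggers COW:
assert may_be_shared(extent_gen=50, fi_gen=50, last_snapshot=99)
```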
Re: nocow 'C' flag ignored after balance
On Fri, 17 May 2013 15:04:45 +0800, Liu Bo wrote:
On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:

and mounted with autodefrag

Am I actually just seeing large ranges getting split while remaining contiguous on disk? This would imply crc calculation on the two outside ranges. Or perhaps there is some data being inlined for each write. I believe writes on this file are 32KiB each. Does the balance produce persistent crc values in the metadata even though the files are nocow, which implies nocrc?
...
I ran this test again and here's filefrag -v after about a day of use:
[...]
As you can see, the 32KiB writes fit in the extents of size 9 and 55. Are those 9 block extents inlined? If I understand correctly, new extents are created for these nocow writes, then the old extents are basically hole punched, producing three (four? because of inlining) separate extents.

Something here begs for optimization. Perhaps balance should treat nocow files a little differently. That would be the time to remove the extra bits that prevent in-place overwrites. After the fact it becomes much more difficult, although removing a crc for the extent being written seems a little easier than iterating over the entire file.

Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C' flag working on empty files and directories.

Hi Kyle,

Can you please apply this patch and see if it helps?

thanks,
liubo

From: Liu Bo bo.li@oracle.com
Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files.

So here we update the file extent's generation while walking relocated file extents in the data reloc root, and use the file extent's generation instead for checking if we have cross refs for the file extent. That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 18 +++++++++++++-----
 fs/btrfs/inode.c       | 10 ++++++++--
 fs/btrfs/relocation.c  |  1 +
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 				    u64 bytenr, u64 num_bytes);
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr);
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(
 						 struct btrfs_fs_info *info,
 						 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
 static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 					struct btrfs_root *root,
 					struct btrfs_path *path,
-					u64 objectid, u64 offset, u64 bytenr)
+					u64 objectid, u64 offset, u64 bytenr,
+					u64 fi_gen)
 {
 	struct btrfs_root *extent_root = root->fs_info->extent_root;
 	struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 	    btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 		goto out;
 
-	if (btrfs_extent_generation(leaf, ei) <=
-	    btrfs_root_last_snapshot(&root->root_item))
+	/*
+	 * Usually generation in extent item is larger than that in file extent
+	 * item because of delay refs. But we don't want balance to break
+	 * file's nocow behaviour, so use file_extent's generation which has
+	 * been updated when we update fs root to point to relocated file
+	 * extents in data reloc root.
+	 */
+	fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+	if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
 		goto out;
 
 	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:
 
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr)
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen)
 {
 	struct btrfs_path *path;
 	int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 
 	do {
 		ret = check_committed_ref(trans, root, path, objectid,
-					  offset, bytenr);
+					  offset, bytenr, gen);
 		if (ret && ret != -ENOENT)
 			goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 		ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 		if (extent_type == BTRFS_FILE_EXTENT_REG ||
 		    extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 gen;
+			gen = btrfs_file_extent_generation(leaf, fi);
 			disk_bytenr = btrfs_file_extent_disk_bytenr
Re: nocow 'C' flag ignored after balance
On Fri, May 10, 2013 Liu Bo wrote:
On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series kernel, so please disregard if this has been fixed in higher versions. This is on a btrfs RAID1 with 3, then 4, disks.

My use case is to set the nocow 'C' flag on a directory and copy in some files, then make lots of writes (same file sizes) and note that the number of extents stays the same, good. Then run a balance (I added a disk) and start making writes again; now the number of extents starts climbing, boo.

Is this standard behavior? I realize a balance will cow the files. Are they also being checksummed, thereby breaking the nocow flag? I have made no snapshots and made no writes to said files while the balance was running.

Hi Kyle,

It's hard to say if it's standard; it is a side effect caused by balance. During balance, our reloc root works like a snapshot, so we set last_snapshot on the fs root, and this makes new nocow writes think that we have to do cow, as the extent was created before taking the snapshot. But the nocow 'C' flag on the file is still there; if you make new writes on the new extent after balance, you still get the same number of extents.

thanks,
liubo

Thank you for the explanation. On my machine this didn't happen however. IIRC one ~10GiB file had 24 extents before balance, 26 extents after balance, and 1000+ and growing when I checked the following day. I'll add that I am running a relatively recent version of btrfs-tools from a ppa, and mounted with autodefrag.

Am I actually just seeing large ranges getting split while remaining contiguous on disk? This would imply crc calculation on the two outside ranges. Or perhaps there is some data being inlined for each write. I believe writes on this file are 32KiB each. Does the balance produce persistent crc values in the metadata even though the files are nocow, which implies nocrc?
...
I ran this test again and here's filefrag -v after about a day of use:

Filesystem type is: 9123683e
File size of /blah/blah/file is 10213265920 (2493474 blocks, blocksize 4096)
  ext  logical   physical   expected   length flags
    0        0  675625629                   9
    1        9  675621279  675625638      55
    2       64  674410131  675621334     886
    3      950  675558303  674411017       9
    4      959  675583473  675558312      55
    5     1014  674411081  675583528     708
    6     1722  675456318  674411789       9
    7     1731  675710934  675456327      55
    8     1786  674411853  675710989     521
    9     2307  675424433  674412374       9
   10     2316  675471062  675424442      55
   11     2371  674412438  675471117     984
   12     3355  676012018  674413422       9
   13     3364  676024295  676012027      55
   14     3419  674413486  676024350     871
   15     4290  675681138  674414357       9
   16     4299  675618500  675681147      55
...
13986  2486955  671627059  675876382     627
13987  2487582  675677542  671627686       9
13988  2487591  675700351  675677551      55
13989  2487646  671627750  675700406    1212
13990  2488858  675932037  671628962       9
13991  2488867  675990025  675932046      55
13992  2488922  671629026  675990080     220
13993  2489142  675674447  671629246       9
13994  2489151  675687864  675674456      55
13995  2489206  671629310  675687919    1821
13996  2491027  676209288  671631131       9
13997  2491036  676260767  676209297      55
13998  2491091  671631195  676260822     285
13999  2491376  675650278  671631480       9
14000  2491385  675678822  675650287      55
14001  2491440  671631544  675678877    1464
14002  2492904  675534255  671633008       9
14003  2492913  675503514  675534264      55
14004  2492968  671633072  675503569     506 eof
/blah/blah/file: 14005 extents found

As you can see, the 32KiB writes fit in the extents of size 9 and 55. Are those 9 block extents inlined? If I understand correctly, new extents are created for these nocow writes, then the old extents are basically hole punched, producing three (four? because of inlining) separate extents.

Something here begs for optimization. Perhaps balance should treat nocow files a little differently. That would be the time to remove the extra bits that prevent in-place overwrites.
After the fact it becomes much more difficult, although removing a crc for the extent being written seems a little easier than iterating over the entire file.

Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C' flag working on empty files and directories.
Re: nocow 'C' flag ignored after balance
On Fri, May 10, 2013 Liu Bo wrote:
On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series kernel, so please disregard if this has been fixed in higher versions. This is on a btrfs RAID1 with 3, then 4, disks. My use case is to set the nocow 'C' flag on a directory and copy in some files, then make lots of writes (same file sizes) and note that the number of extents stays the same, good. Then run a balance (I added a disk) and start making writes again; now the number of extents starts climbing, boo. Is this standard behavior? I realize a balance will cow the files. Are they also being checksummed, thereby breaking the nocow flag? I have made no snapshots and made no writes to said files while the balance was running.

Hi Kyle,

It's hard to say if it's standard; it is a side effect caused by balance. During balance, our reloc root works like a snapshot, so we set last_snapshot on the fs root, and this makes new nocow writes think that we have to do cow, as the extent was created before taking the snapshot. But the nocow 'C' flag on the file is still there; if you make new writes on the new extent after balance, you still get the same number of extents.

thanks,
liubo

Thank you for the explanation. On my machine this didn't happen however. IIRC one 10GiB file had 24 extents before balance, 26 extents after balance, and 1000+ and growing when I checked the following day. I'll add that I am running a relatively recent version of btrfs-tools from a ppa.
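Liu Bo's explanation above — balance sets last_snapshot on the fs root just like a snapshot would, so every pre-balance extent suddenly looks potentially shared — boils down to a single generation comparison. A toy model (plain Python, illustrative only, not kernel code):

```python
def nocow_write_allowed(extent_gen: int, last_snapshot: int) -> bool:
    """Toy model of the nocow check described above: an extent created at
    or before last_snapshot is presumed shared, so it must be COWed."""
    return extent_gen > last_snapshot

# Before balance: the extent is newer than any snapshot point, so a nocow
# file overwrites it in place and the extent count stays flat.
assert nocow_write_allowed(extent_gen=30, last_snapshot=10)

# Balance sets last_snapshot to (current transid - 1), so the very same
# extent now fails the check and the next write COWs it, creating the
# new extents Kyle observed.
assert not nocow_write_allowed(extent_gen=30, last_snapshot=41)
```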
nocow 'C' flag ignored after balance
I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series kernel, so please disregard if this has been fixed in higher versions. This is on a btrfs RAID1 with 3, then 4, disks.

My use case is to set the nocow 'C' flag on a directory and copy in some files, then make lots of writes (same file sizes) and note that the number of extents stays the same, good. Then run a balance (I added a disk) and start making writes again; now the number of extents starts climbing, boo.

Is this standard behavior? I realize a balance will cow the files. Are they also being checksummed, thereby breaking the nocow flag? I have made no snapshots and made no writes to said files while the balance was running.

Thanks,
Kyle
RE: no space left on device.
So I have ended up in a state where I can't delete files with rm. The error I get is "no space on device". However, I'm not even close to full:

/dev/sdb1  38G  27G  9.5G  75%

There are about 800k files/dirs in this filesystem. Extra strange is that I can create and delete files in another directory. So I tried pretty much everything I could google my way to, but the problem persisted. So I decided to do a backup and a format. But when the backup was done I tried one more time, and now it was possible to delete the directory and all its content?

Using the 3.5 kernel in Ubuntu 12.10. Is this a known issue? Is it fixed in later kernels? fsck, btrfs scrub and the kernel log indicate no problem of any kind.

First let's see the output of:
btrfs fi df /mountpoint

You're probably way over-allocated in metadata, so a balance should help:
btrfs bal start -m /mountpoint
or omit the -m option to run a full balance.
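The "free space but ENOSPC" situation being diagnosed above can be sketched with a toy chunk-allocation model (plain Python; the numbers are illustrative, not taken from the reporter's filesystem): btrfs carves raw disk space into data and metadata chunks, df counts unused space *inside* data chunks as free, but deleting files needs the metadata pool to grow (COW writes), which requires unallocated raw space.

```python
# Toy model: all raw space is already claimed by chunks, so metadata
# cannot grow even though df shows plenty of free space.
disk_gib = 38
allocated = {"data": 30, "metadata": 8}   # every GiB claimed by some chunk
used = {"data": 25, "metadata": 2}        # actual bytes stored in the chunks

df_free = disk_gib - sum(used.values())   # roughly what df reports
assert df_free == 11                      # looks like ~GiBs of free space...

unallocated = disk_gib - sum(allocated.values())
assert unallocated == 0                   # ...but no room to grow metadata -> ENOSPC

# "btrfs balance start -m" rewrites half-empty metadata chunks, returning the
# surplus to the unallocated pool so deletions can proceed again.
```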
RE: btrfs seems to do COW while inode has NODATACOW set
Wade, thanks. Yes, with the preallocated extent I saw the behavior you describe, and it makes perfect sense to alloc a new EXTENT_DATA in this case. In my case, I did another simple test:

Before:

item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
	inode generation 5 transid 5 size 5368709120 nbytes 5368709120
	owner[0:0] mode 100644 inode blockgroup 0 nlink 1 flags 0x3 seq 0
item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
	inode ref index 2 namelen 5 name: vol-1
item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
	extent data disk byte 5368709120 nr 131072
	extent data offset 0 nr 131072 ram 131072
	extent compression 0
item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
	extent data disk byte 5905842176 nr 33423360
	extent data offset 0 nr 33423360 ram 33423360
	extent compression 0
...

I am going to do a single write of a 4KiB block into the (257 EXTENT_DATA 131072) extent:

dd if=/dev/urandom of=/mnt/src/subvol-1/vol-1 bs=4096 seek=32 count=1 conv=notrunc

After:

item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
	inode generation 5 transid 21 size 5368709120 nbytes 5368709120
	owner[0:0] mode 100644 inode blockgroup 0 nlink 1 flags 0x3 seq 1
item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
	inode ref index 2 namelen 5 name: vol-1
item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
	extent data disk byte 5368709120 nr 131072
	extent data offset 0 nr 131072 ram 131072
	extent compression 0
item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
	extent data disk byte 5368840192 nr 4096
	extent data offset 0 nr 4096 ram 4096
	extent compression 0
item 8 key (257 EXTENT_DATA 135168) itemoff 3419 itemsize 53
	extent data disk byte 5905842176 nr 33423360
	extent data offset 4096 nr 33419264 ram 33423360
	extent compression 0

We clearly see that a new extent has been allocated for some reason (bytenr=5368840192), and the previous extent (bytenr=5905842176) is still there, but used at an offset of 4096. This is exactly cow, I believe.

Hmm, I'm pretty sure that using 'dd' in this fashion skips the first 32 4096-sized blocks and thus writes -past- the length of this extent (eg: writes from 131073 to 135168). This causes a new extent to be allocated after the previous extent. But even if using 'dd' with a 'skip' value of '31' created a new EXTENT_DATA, it would not necessarily be data CoW, since data CoW refers only to the location of the -data- (i.e., not metadata and thus not EXTENT_DATA) on disk. The key thing is to look at where the EXTENT_DATAs are pointing to, not how many EXTENT_DATAs there are.

However, your hint about not being able to read into memory may be useful; it would be good if we can find the place in the code that makes the decision to cow. Try looking at the callers of btrfs_cow_block(), but you'll be on your own from there :)

I guess I am looking for a way to never ever allocate new EXTENT_DATAs on a fully-mapped file. Is there one?

Hmm, I don't think that this exists right now. You could try '-o autodefrag' to minimize the number of EXTENT_DATAs, though. This seems to be a start at what you're looking for:

Commit: 7e97b8daf63487c20f78487bd4045f39b0d97cf4
btrfs: allow setting NOCOW for a zero sized file via ioctl

In short, the nodatacow option won't be honored if any checksums have been assigned to any extents of a file.

Regards,
Wade

Thanks!
Alex.
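The arithmetic behind where that dd write lands can be checked directly (plain Python; byte offsets taken from the dumps in the exchange above):

```python
# dd arguments from Alex's test: bs=4096 seek=32 count=1 conv=notrunc
bs, seek, count = 4096, 32, 1
write_start = bs * seek           # first byte offset written (0-indexed)
write_end = write_start + bs * count

assert (write_start, write_end) == (131072, 135168)

# The "Before" dump's first extent, key (257 EXTENT_DATA 0), covers bytes
# [0, 131072), so this write begins exactly at the start of the second
# extent, key (257 EXTENT_DATA 131072) -- matching the new 4096-byte
# extent that appears at that key in the "After" dump.
first_extent_len = 131072
assert write_start == first_extent_len
```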
RE: problem replacing failing drive
To: linux-btrfs@vger.kernel.org From: samtyg...@yahoo.co.uk Subject: Re: problem replacing failing drive Date: Thu, 25 Oct 2012 22:02:23 +0100 On 22/10/12 10:07, sam tygier wrote: hi, I have a 2 drive btrfs raid set up. It was created first with a single drive, and then adding a second and doing btrfs fi balance start -dconvert=raid1 /data the original drive is showing smart errors so i want to replace it. i dont easily have space in my desktop for an extra disk, so i decided to proceed by shutting down. taking out the old failing drive and putting in the new drive. this is similar to the description at https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices (the other reason to try this is to simulate what would happen if a drive did completely fail). so after swapping the drives and rebooting, i try to mount as degraded. i instantly get a kernel panic, http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png so far all this has been with 3.5 kernel. so i upgraded to 3.6.2 and tried to mount degraded again. first with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded /dev/sdd2 /mnt [ 582.535689] device label bdata devid 1 transid 25342 /dev/sdd2 [ 582.536196] btrfs: disk space caching is enabled [ 582.536602] btrfs: failed to read the system array on sdd2 [ 582.536860] btrfs: open_ctree failed [ 606.784176] device label bdata devid 1 transid 25342 /dev/sdd2 [ 606.784647] btrfs: allowing degraded mounts [ 606.784650] btrfs: disk space caching is enabled [ 606.785131] btrfs: failed to read chunk root on sdd2 [ 606.785331] btrfs warning page private not zero on page 392922368 [ 606.785408] btrfs: open_ctree failed [ 782.422959] device label bdata devid 1 transid 25342 /dev/sdd2 no panic is good progress, but something is still not right. my options would seem to be 1) reconnect old drive (probably in a USB caddy), see if it mounts as if nothing ever happened. or possibly try and recover it back to a working raid1. 
then try again by adding the new drive first and removing the old one afterwards; or 2) give up experimenting, create a new btrfs RAID1, and restore from backup. Both leave me with a worry about what would happen if a disk in a RAID1 really did die (unless it was the panic that did some damage that borked the filesystem).

Some more details. If I reconnect the failing drive then I can mount the filesystem with no errors, and a quick glance suggests that the data is all there.

Label: 'bdata' uuid: 1f07081c-316b-48be-af73-49e6f76535cc
Total devices 2 FS bytes used 2.50TB
devid 2 size 2.73TB used 2.73TB path /dev/sde1 -- this is the drive that I wish to remove
devid 1 size 2.73TB used 2.73TB path /dev/sdd2

sudo btrfs filesystem df /mnt
Data, RAID1: total=2.62TB, used=2.50TB
System, DUP: total=40.00MB, used=396.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=112.00GB, used=3.84GB
Metadata: total=8.00MB, used=0.00

Is the failure to mount when I remove sde due to it being DUP, rather than RAID1?

Yes, I would say so. Try a btrfs balance start -mconvert=raid1 /mnt so all metadata is on each drive.

Is adding a second drive to a btrfs filesystem and running btrfs fi balance start -dconvert=raid1 /mnt not sufficient to create an array that can survive the loss of a disk? Do I need -mconvert as well? Is there an -sconvert for system?

thanks Sam

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
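The question Sam asks at the end (is -mconvert needed too?) can be answered before pulling a drive by checking which block-group profiles `btrfs filesystem df` reports: anything still DUP or single lives on one device only. Below is a rough, illustrative Python sketch; the input format is taken from the output quoted above, and `chunk_profiles` is a hypothetical helper, not part of any btrfs tooling:

```python
import re

def chunk_profiles(fi_df_output):
    """Parse `btrfs filesystem df` output into (type, profile) pairs.

    Lines look like: 'Data, RAID1: total=2.62TB, used=2.50TB'.
    A bare type with no profile (e.g. 'System: total=...') is the
    unreplicated default, reported here as 'single'.
    """
    pairs = []
    for line in fi_df_output.splitlines():
        m = re.match(r'\s*(\w+)(?:, (\w+))?: total=', line)
        if m:
            pairs.append((m.group(1), m.group(2) or 'single'))
    return pairs

sample = """\
Data, RAID1: total=2.62TB, used=2.50TB
System, DUP: total=40.00MB, used=396.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=112.00GB, used=3.84GB
Metadata: total=8.00MB, used=0.00
"""

profiles = chunk_profiles(sample)
# DUP keeps both copies on one device; single has one copy. Neither
# survives losing that device, even though Data here is RAID1.
not_redundant = [p for p in profiles if p[1] in ('DUP', 'single')]
```

Run against Sam's output, this flags the System and Metadata chunks as the non-redundant ones, which matches the "failed to read the system array" error he saw when mounting degraded.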
Re: raw partition or LV for btrfs?
I'm currently running a 1GB RAID1 btrfs /boot with no problems. Also, I think the current grub2 has lzo support.

-----Original Message----- From: Fajar A. Nugraha Sent: Sunday, August 12, 2012 5:48 PM To: Daniel Pocock Cc: linux-btrfs@vger.kernel.org Subject: Re: raw partition or LV for btrfs?

On Sun, Aug 12, 2012 at 11:46 PM, Daniel Pocock dan...@pocock.com.au wrote:

I notice this question on the wiki/faq: https://btrfs.wiki.kernel.org/index.php/UseCases#What_is_best_practice_when_partitioning_a_device_that_holds_one_or_more_btr-filesystems and as it hasn't been answered, can anyone comment on the subject? Various things come to mind:

a) partition the disk, create an LVM partition, and create lots of small LVs, format each as btrfs
b) partition the disk, create an LVM partition, and create one big LV, format as btrfs, make subvolumes
c) what about using btrfs RAID1? Does either approach (a) or (b) seem better for someone who wants the RAID1 feature?

IMHO when the qgroup feature is stable (i.e. adopted by distros, or at least in a stable kernel) then simply creating one big partition (and letting btrfs handle RAID1, if you use it) is better. When 3.6 is out, perhaps? Until then I'd use LVM.

d) what about booting from a btrfs system? Is it recommended to follow the ages-old practice of keeping a real partition of 128-500MB, formatting it as btrfs, even if all other data is in subvolumes as per (b)?

You can have one single partition only and boot directly from that. However btrfs has the same problems as zfs in this regard:

- grub can read both, but can't write to either. In other words, no support for grubenv
- the best compression method (gzip for zfs, lzo for btrfs) is not supported by grub

For the first problem, an easy workaround is just to disable the grub configuration that uses grubenv. Easy enough, and no major functionality loss. The second one is harder for btrfs. zfs allows you to have a separate dataset (i.e.
subvolume, in btrfs terms) with a different compression setting, so you can have a dedicated dataset for /boot compressed differently from the rest of the pool. With btrfs you're currently stuck with the same compression setting for everything, so if you love lzo this might be a major setback. There's also a btrfs-specific problem: it's hard to have a system which has /boot on a separate subvolume while managing it with the current automatic tools (e.g. update-grub). Due to the second and third problems, I'd recommend you just use a separate partition with ext2/4 for now.

-- Fajar
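Fajar's recommendation (a small non-btrfs /boot plus one big btrfs root) could look something like the following fstab sketch. The UUIDs and the subvolume name are placeholders, not values from this thread:

```
# /etc/fstab sketch -- placeholder UUIDs, adjust to your system
UUID=<boot-fs-uuid>  /boot  ext4   noatime                0 2
UUID=<root-fs-uuid>  /      btrfs  subvol=@,compress=lzo  0 0
```

This sidesteps both grub limitations he lists: grubenv lives on a filesystem grub can write, and the btrfs root can keep lzo compression because grub never needs to read it.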
nocow flags
I set the C (NOCOW) and z (Not_Compressed) flags on a folder, but the extent counts of files contained there keep increasing. Said files are large and frequently modified, but not changing in size. This does not happen when the filesystem is mounted with nodatacow. I'm using that as a workaround, since subvolumes can't be mounted with different options simultaneously (i.e. one with CoW, one with nodatacow). Any ideas why the flags are being ignored? I'm running 32-bit 3.3-rc4 with noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3-disk RAID0 data / RAID1 metadata filesystem. Thanks, Kyle
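One likely explanation, worth noting for anyone hitting the same thing: btrfs only honours the NOCOW flag when it is set on an empty file (or on a directory, so that newly created files inherit it); setting it on existing, non-empty files leaves their existing data copy-on-write, which would explain the growing extent counts. Here is a small Python sketch for inspecting the flags via the FS_IOC_GETFLAGS ioctl. The ioctl number below assumes 64-bit Linux (it differs on a 32-bit build like the poster's), and `get_inode_flags` is a hypothetical helper, not a btrfs tool:

```python
import fcntl
import os
import struct
import tempfile

FS_IOC_GETFLAGS = 0x80086601  # _IOR('f', 1, long) on 64-bit Linux
FS_NOCOW_FL = 0x00800000      # the 'C' flag that chattr sets
FS_NOCOMP_FL = 0x00000400     # the no-compression flag

def get_inode_flags(path):
    """Return the inode flag bitmask for path, or None where the
    filesystem does not support the ioctl (e.g. tmpfs)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack('l', 0))
        return struct.unpack('l', buf)[0]
    except OSError:
        return None
    finally:
        os.close(fd)

with tempfile.NamedTemporaryFile() as f:
    flags = get_inode_flags(f.name)
    nocow_set = flags is not None and bool(flags & FS_NOCOW_FL)
```

Checking a file this way (or with lsattr) right after creation shows whether it actually inherited C from the directory, which is the case where the flag takes effect.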
RE: Set nodatacow per file?
Actually it is possible. Check out David's response to my question from some time ago: http://permalink.gmane.org/gmane.comp.file-systems.btrfs/14227

this was a quick aid, please see the attached file for an updated tool to set the file flags; now added 'z' for the NOCOMPRESS flag, and it supports chattr syntax plus all of the standard file flags. Setting and unsetting nocow is done like 'fileflags +C file', or -C for unsetting. Without any + or - options it prints the current state.

I get the following errors when running fileflags on large (2GB) database files: open(): No such file or directory open(): Value too large for defined data type
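For what it's worth, the second error message points at a specific errno: "Value too large for defined data type" is EOVERFLOW, which open() returns in a 32-bit binary built without large-file support when a file's size does not fit in a 32-bit off_t, i.e. anything over 2 GiB, like the database files above. The likely fix for a C tool such as fileflags is rebuilding with -D_FILE_OFFSET_BITS=64. A quick Python check of the mapping:

```python
import errno
import os

# "Value too large for defined data type" is the strerror text for
# EOVERFLOW, returned by open() when the file size cannot be
# represented in the caller's off_t (32-bit builds without
# -D_FILE_OFFSET_BITS=64).
msg = os.strerror(errno.EOVERFLOW)

# Python itself is built with large-file support, which is why
# O_LARGEFILE is exposed on Linux (it is 0 where 64-bit off_t is
# already the default).
largefile_flag = getattr(os, 'O_LARGEFILE', 0)
```

This matches the symptom exactly: small files work, 2GB database files fail at open() before the ioctl is ever reached.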
RE: btrfs-raid questions I couldn't find an answer to on the wiki
I've been having good luck with my /boot on a separate 1GB RAID1 btrfs filesystem using grub2 (2 disks only! I wouldn't try it with 3). I should note, however, that I'm NOT using compression on this volume, because if I remember correctly it may not play well with grub (maybe that was just lzo though), and I'm not using subvolumes either, for the same reason.

Thanks! I'm on grub2 as well. It's still masked on Gentoo, but I recently unmasked and upgraded to it, taking advantage of the fact that I have two two-spindle md/raid-1s for /boot and its backup, to test and upgrade one of them first, then the other only when I was satisfied with the results on the first set. I'll be using a similar strategy for the btrfs upgrades, only most of my md/raid-1s are 4-spindle, with two sets, working and backup, and I'll upgrade one set first. I'm going to keep /boot a pair of two-spindle raid-1s, but intend to make them btrfs-raid1s instead of md/raid-1s, and will upgrade one two-spindle set at a time.

More on the status of grub2 btrfs-compression support, based on my research: there is support for btrfs/gzip-compression in at least grub trunk. AFAIK, it's gzip-compression in grub-1.99-release and lzo-compression in trunk only, but I may be misremembering and it's gzip in trunk only and only uncompressed in grub-1.99-release.

I believe you are correct that btrfs zlib support is included in grub2 version 1.99 and lzo is in trunk.

I'll try compressing the files on /boot for one installed kernel with the defrag -czlib option and see how it goes. Result: seemed to work just fine.

In any event, since I'm running 128 MB /boot md/raid-1s without compression now, and intend to increase the size to at least a quarter gig to better align the following partitions, /boot is the one set of btrfs partitions I do NOT intend to enable compression on, so that won't be an issue for me here.
And since for /boot I'm running a pair of two-spindle raid1s instead of my usual quad-spindle raid1s, you've confirmed that works as well. =:^)

As a side note, since I only recently did the grub2 upgrade, I've been enjoying its ability to load and read md/raid and my current reiserfs directly, giving me the ability to look up info in at least text-based main system config and notes files directly from grub2, without booting into Linux, if for some reason the above-grub boot is hosed or inconvenient at that moment. I just realized that if I want to maintain that direct-from-grub access, I'll need to ensure that the grub2 I'm running groks the btrfs compression scheme I'm using on any filesystem I want grub2 to be able to read.

Hmm... that brings up another question: you mention a 1-gig btrfs-raid1 /boot, but do NOT mention whether you installed it before or after mixed-chunk (data/metadata) support made it into btrfs and became the default for <= 1 gig filesystems.

I don't think I specifically enabled mixed chunk support when I created this filesystem. It was done on a 2.6 kernel sometime in the middle of 2011, iirc.

Can you confirm one way or the other whether you're running mixed-chunk on that 1-gig? I'm not sure whether grub2's btrfs module groks mixed-chunk or not, or whether that even matters to it.

Also, could you confirm mbr-bios vs gpt-bios vs uefi-gpt partitions? I'm using gpt-bios partitioning here, with the special gpt-bios-reserved partition, so grub2-install can build the modules necessary for /boot access directly into its core-image and install that in the gpt-bios-reserved partition. It occurs to me that either uefi-gpt or gpt-bios with the appropriate reserved partition won't have quite the same issues with grub2 reading a btrfs /boot that either mbr-bios or gpt-bios without a reserved bios partition would. If you're running gpt-bios with a reserved bios partition, that confirms yet another aspect of your setup, compared to mine.
If you're running uefi-gpt, not so much, as at least in theory that's the best case. If you're running either mbr-bios or gpt-bios without a reserved bios partition, that's a worst case, so if it works, then the others should definitely work.

Same here: gpt-bios, with a 1MB partition with the bios_grub flag set (gdisk code EF02) for grub to reside on.

Meanwhile, you're right about subvolumes. I'd not try them on a btrfs /boot either. (I don't really see the use case for a separate /boot, though there's certainly a case for a /boot subvolume on a btrfs root, for people doing that.)
RE: fstab mount options ignored on subsequent subvolume mounts
I have multiple subvolumes on the same filesystem that are mounted with different options in fstab. The problem is the mount options for subsequent subvolume mounts seem to be ignored, as reflected in /proc/mounts.

The output of 'mount' and /proc/mounts is different: mount takes it from /etc/mtab, while /proc/mounts gets the information from the kernel (calls into super.c:btrfs_show_options()).

/etc/mtab:
- contains the options in the order in which they were given to mount or in /etc/fstab

/proc/mounts:
- the order of options is fixed (as defined in the function)
- if an option has a default value which was not given to mount, it is listed here (and is not in mtab)
- implied options appear here as well (like nodatacow implies nodatasum)

Now, you're giving a different set of options to each subvolume, but they belong to one filesystem, so every other mounted subvolume ends up with the set of options given to the first mounted subvolume. The first subvolume mount calls 'btrfs_fill_super' and 'btrfs_parse_options'; the others do not. A remount will call 'btrfs_parse_options' again and will change the option set.
$ cat /etc/fstab | grep mnt
UUID=REMOVED /mnt/a btrfs subvol=a,defaults,nodatacow,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/b btrfs subvol=b,defaults,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/c btrfs subvol=c,defaults,compress=zlib,autodefrag,noatime,space_cache,inode_cache 0 0

$ mount | grep mnt
/dev/sdb2 on /mnt/a type btrfs (rw,noatime,subvol=a,nodatacow,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/b type btrfs (rw,noatime,subvol=b,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/c type btrfs (rw,noatime,subvol=c,compress=zlib,autodefrag,space_cache,inode_cache)

$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0

Continuing the example, a remount which should only change the mount options for one of the subvolumes:

$ sudo mount -o remount,compress=zlib /mnt/oldhome
$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0

I think the above explains things in general in your listings; the last one missing is subvol= in /proc/mounts. This is not implemented, but is possible (save the non-default subvol name with the subvol root and print it in show_options).

david

Thanks for the clarification. I was under the impression that mounting multiple subvolumes with different options had been implemented. Perhaps someday it will be, although for now there are more pressing issues. I appreciate everyone's hard work and look forward to the continued development of btrfs.
many thanks, Kyle
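David's point — one filesystem, one option set, no matter how many subvolume mounts — is easy to see mechanically. A small illustrative Python sketch; `parse_mounts_line` is a hypothetical helper, and the sample is the /proc/mounts output quoted in the thread above:

```python
def parse_mounts_line(line):
    """Split one /proc/mounts line into (device, mountpoint,
    fstype, options) with the options as a set."""
    dev, mnt, fstype, opts, _dump, _passno = line.split()
    return dev, mnt, fstype, frozenset(opts.split(','))

sample = """\
/dev/sdb2 /mnt/a btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
"""

entries = [parse_mounts_line(l) for l in sample.splitlines()]
# All three subvolume mounts of /dev/sdb2 collapse to a single
# option set, regardless of what each fstab entry asked for.
option_sets = {opts for (_dev, _mnt, _fs, opts) in entries}
```

The same parse applied to live /proc/self/mounts data would show any per-subvolume fstab differences silently collapsing to the first mount's options, exactly as described.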
fstab mount options ignored on subsequent subvolume mounts
Greetings all, I have multiple subvolumes on the same filesystem that are mounted with different options in fstab. The problem is the mount options for subsequent subvolume mounts seem to be ignored, as reflected in /proc/mounts.

$ cat /etc/fstab | grep mnt
UUID=REMOVED /mnt/a btrfs subvol=a,defaults,nodatacow,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/b btrfs subvol=b,defaults,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/c btrfs subvol=c,defaults,compress=zlib,autodefrag,noatime,space_cache,inode_cache 0 0

$ mount | grep mnt
/dev/sdb2 on /mnt/a type btrfs (rw,noatime,subvol=a,nodatacow,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/b type btrfs (rw,noatime,subvol=b,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/c type btrfs (rw,noatime,subvol=c,compress=zlib,autodefrag,space_cache,inode_cache)

$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0

Continuing the example, a remount which should only change the mount options for one of the subvolumes:

$ sudo mount -o remount,compress=zlib /mnt/oldhome
$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 0 0

Running Ubuntu mainline kernel 3.2.1 (3.2.1-030201-generic #201201121644 SMP Thu Jan 12 21:53:24 UTC 2012 i686 athlon i386 GNU/Linux) with the most recent btrfs-progs (2011-12-01) from linux/kernel/git/mason/btrfs-progs.git

Thanks, Kyle
btrfs-progs compile warnings on x86
When compiling btrfs-progs (2011-12-01) from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git on 3.2.1-030201-generic #201201121644 SMP Thu Jan 12 21:53:24 UTC 2012 i686 athlon i386 GNU/Linux I get the following warnings:

gcc -Wp,-MMD,./.btrfs_cmds.o.d,-MT,btrfs_cmds.o -Wall -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -g -O0 -c btrfs_cmds.c
btrfs_cmds.c: In function '__ino_to_path_fd':
btrfs_cmds.c:1138:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
btrfs_cmds.c: In function 'do_logical_to_ino':
btrfs_cmds.c:1242:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Thanks, Kyle