Re: [PATCH v2] common: get fs type again using device canonical name in _fs_type
On Fri, Aug 01, 2014 at 02:49:10PM +1000, Dave Chinner wrote: > On Fri, Aug 01, 2014 at 12:02:41PM +0800, Eryu Guan wrote: > > On Fri, Aug 01, 2014 at 10:21:59AM +1000, Dave Chinner wrote: > > > On Thu, Jul 31, 2014 at 06:52:37PM +0800, Eryu Guan wrote: > > > > When testing with lvm, a previous btrfsck run could change df output > > > > from something like > > > > > > > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 900 13602172 > > > > 1% /mnt/btrfs > > > > > > > > to > > > > > > > > /dev/dm-3 btrfs 15728640 900 13602172 1% /mnt/btrfs > > > > > > I don't follow you. Why would running btrfsck change the name of the > > > device? If the filesystem is umounted and mounted again, then the > > > device could change, but btrfsck should not be not doing the > > > unmount/mount, and so unless the TEST_DEV/SCRATCH_DEV is changing > > > the output of df should be identical... > > > > > > So before we change the _fs_type() code, can you explain exactly > > > how, when and why the device name is changing to me? > > > > Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+ > > > > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show > > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 > > Total devices 1 FS bytes used 384.00KiB > > devid1 size 15.00GiB used 2.04GiB path > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 > > > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 > > Total devices 2 FS bytes used 112.00KiB > > devid1 size 15.00GiB used 2.03GiB path > > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 > > devid2 size 15.00GiB used 2.01GiB path > > /dev/mapper/rhel_hp--dl388eg8--01-testlv3 > > > > Btrfs v3.14.2 > > > > And testlv1 was mounted at /mnt/btrfs > > > > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs > > FilesystemType 1024-blocks Used Available > > Capacity Mounted on > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs15728640 512 13602560 > > 1% /mnt/btrfs > > > > Now run btrfsck on testlv2, btrfsck will scan all btrfs devices and > > somehow change the device name. > > > > [root@hp-dl388eg8-01 btrfs-progs]# btrfsck > > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1 > > > > # device name changed in df output and btrfs fi show output > > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs > > Filesystem Type 1024-blocks Used Available Capacity Mounted on > > /dev/dm-3 btrfs15728640 512 13602560 1% /mnt/btrfs > > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show > > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 > > Total devices 1 FS bytes used 384.00KiB > > devid1 size 15.00GiB used 2.04GiB path /dev/dm-3 > > > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 > > Total devices 2 FS bytes used 112.00KiB > > devid1 size 15.00GiB used 2.03GiB path > > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 > > devid2 size 15.00GiB used 2.01GiB path > > /dev/mapper/rhel_hp--dl388eg8--01-testlv3 > > > > Btrfs v3.14.2 > > > > This only happens when btrfsck a btrfs with multiple devices, so this > > only affects xfstests run on btrfs with SCRATCH_DEV_POOL set to lvm > > lvs. > > > > Maybe this is a bug of btrfs-progs and we should fix it there? > > Yes, that smells of a btrfs-progs bug. If your /etc/mtab a link to > /proc/mounts? If not, does the contents change when you run btrfsck, > and does the problem go away when you replace /etc/mtab with a link > to /proc/mounts? /etc/mtab is a symlink to /proc/self/mounts, so does /proc/mounts [root@hp-dl388eg8-01 btrfs-progs]# ls -l /etc/mtab lrwxrwxrwx. 1 root root 17 Sep 22 2013 /etc/mtab -> /proc/self/mounts [root@hp-dl388eg8-01 btrfs-progs]# ls -l /proc/mounts lrwxrwxrwx. 1 root root 11 Aug 1 00:59 /proc/mounts -> self/mounts And the device name also changed in /proc/mounts [root@hp-dl388eg8-01 btrfs-progs]# grep btrfs /proc/mounts /dev/dm-3 /mnt/btrfs btrfs rw,seclabel,relatime,space_cache 0 0 Thanks, Eryu > > Cheers, > > Dave. > -- > Dave Chinner > da...@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] common: get fs type again using device canonical name in _fs_type
On Fri, Aug 01, 2014 at 12:02:41PM +0800, Eryu Guan wrote: > On Fri, Aug 01, 2014 at 10:21:59AM +1000, Dave Chinner wrote: > > On Thu, Jul 31, 2014 at 06:52:37PM +0800, Eryu Guan wrote: > > > When testing with lvm, a previous btrfsck run could change df output > > > from something like > > > > > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 900 13602172 1% > > > /mnt/btrfs > > > > > > to > > > > > > /dev/dm-3 btrfs 15728640 900 13602172 1% /mnt/btrfs > > > > I don't follow you. Why would running btrfsck change the name of the > > device? If the filesystem is umounted and mounted again, then the > > device could change, but btrfsck should not be not doing the > > unmount/mount, and so unless the TEST_DEV/SCRATCH_DEV is changing > > the output of df should be identical... > > > > So before we change the _fs_type() code, can you explain exactly > > how, when and why the device name is changing to me? > > Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+ > > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 > Total devices 1 FS bytes used 384.00KiB > devid1 size 15.00GiB used 2.04GiB path > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 > Total devices 2 FS bytes used 112.00KiB > devid1 size 15.00GiB used 2.03GiB path > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 > devid2 size 15.00GiB used 2.01GiB path > /dev/mapper/rhel_hp--dl388eg8--01-testlv3 > > Btrfs v3.14.2 > > And testlv1 was mounted at /mnt/btrfs > > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs > FilesystemType 1024-blocks Used Available > Capacity Mounted on > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs15728640 512 13602560 > 1% /mnt/btrfs > > Now run btrfsck on testlv2, btrfsck will scan all btrfs devices and > somehow change the device name. > > [root@hp-dl388eg8-01 btrfs-progs]# btrfsck > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1 > > # device name changed in df output and btrfs fi show output > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs > Filesystem Type 1024-blocks Used Available Capacity Mounted on > /dev/dm-3 btrfs15728640 512 13602560 1% /mnt/btrfs > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 > Total devices 1 FS bytes used 384.00KiB > devid1 size 15.00GiB used 2.04GiB path /dev/dm-3 > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 > Total devices 2 FS bytes used 112.00KiB > devid1 size 15.00GiB used 2.03GiB path > /dev/mapper/rhel_hp--dl388eg8--01-testlv2 > devid2 size 15.00GiB used 2.01GiB path > /dev/mapper/rhel_hp--dl388eg8--01-testlv3 > > Btrfs v3.14.2 > > This only happens when btrfsck a btrfs with multiple devices, so this > only affects xfstests run on btrfs with SCRATCH_DEV_POOL set to lvm > lvs. > > Maybe this is a bug of btrfs-progs and we should fix it there? Yes, that smells of a btrfs-progs bug. If your /etc/mtab a link to /proc/mounts? If not, does the contents change when you run btrfsck, and does the problem go away when you replace /etc/mtab with a link to /proc/mounts? Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] common: get fs type again using device canonical name in _fs_type
On Fri, Aug 01, 2014 at 10:21:59AM +1000, Dave Chinner wrote: > On Thu, Jul 31, 2014 at 06:52:37PM +0800, Eryu Guan wrote: > > When testing with lvm, a previous btrfsck run could change df output > > from something like > > > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 900 13602172 1% > > /mnt/btrfs > > > > to > > > > /dev/dm-3 btrfs 15728640 900 13602172 1% /mnt/btrfs > > I don't follow you. Why would running btrfsck change the name of the > device? If the filesystem is umounted and mounted again, then the > device could change, but btrfsck should not be not doing the > unmount/mount, and so unless the TEST_DEV/SCRATCH_DEV is changing > the output of df should be identical... > > So before we change the _fs_type() code, can you explain exactly > how, when and why the device name is changing to me? Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+ [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 Total devices 1 FS bytes used 384.00KiB devid1 size 15.00GiB used 2.04GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv1 Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 Total devices 2 FS bytes used 112.00KiB devid1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2 devid2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3 Btrfs v3.14.2 And testlv1 was mounted at /mnt/btrfs [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs FilesystemType 1024-blocks Used Available Capacity Mounted on /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs15728640 512 13602560 1% /mnt/btrfs Now run btrfsck on testlv2, btrfsck will scan all btrfs devices and somehow change the device name. [root@hp-dl388eg8-01 btrfs-progs]# btrfsck /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1 # device name changed in df output and btrfs fi show output [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs Filesystem Type 1024-blocks Used Available Capacity Mounted on /dev/dm-3 btrfs15728640 512 13602560 1% /mnt/btrfs [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2 Total devices 1 FS bytes used 384.00KiB devid1 size 15.00GiB used 2.04GiB path /dev/dm-3 Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37 Total devices 2 FS bytes used 112.00KiB devid1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2 devid2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3 Btrfs v3.14.2 This only happens when btrfsck a btrfs with multiple devices, so this only affects xfstests run on btrfs with SCRATCH_DEV_POOL set to lvm lvs. Maybe this is a bug of btrfs-progs and we should fix it there? If we decide to take this patch, I'll put more details in commit log in v3. Thanks, Eryu > > Cheers, > > Dave. > -- > Dave Chinner > da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: SSD related mount option dependency rework.
According to Documentations/filesystem/btrfs.txt, ssd/ssd_spread/nossd has their own dependency(See below), but only ssd_spread implying ssd is implemented. ssd_spread implies ssd, conflicts nossd. ssd conflicts nossd. nossd conflicts ssd and ssd_spread. This patch adds ssd{,_spread} confliction with nossd. Signed-off-by: Qu Wenruo --- fs/btrfs/super.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 8e16bca..2508a16 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -515,19 +515,22 @@ int btrfs_parse_options(struct btrfs_root *root, char *options) compress_type); } break; - case Opt_ssd: - btrfs_set_and_info(root, SSD, - "use ssd allocation scheme"); - break; case Opt_ssd_spread: btrfs_set_and_info(root, SSD_SPREAD, "use spread ssd allocation scheme"); + /* suppress the ssd mount option log */ btrfs_set_opt(info->mount_opt, SSD); + /* fall through for other ssd routine */ + case Opt_ssd: + btrfs_set_and_info(root, SSD, + "use ssd allocation scheme"); + btrfs_clear_opt(info->mount_opt, NOSSD); break; case Opt_nossd: btrfs_set_and_info(root, NOSSD, "not using ssd allocation scheme"); btrfs_clear_opt(info->mount_opt, SSD); + btrfs_clear_opt(info->mount_opt, SSD_SPREAD); break; case Opt_barrier: btrfs_clear_and_info(root, NOBARRIER, -- 2.0.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Help with btrfs_zero_range function
Hey Guys, I need to ask a question again, I am writing the above function and basing it off the one of punch hole. I have only started writing the function and have a few questions about how to write this. Below this message are my questions so fair and I also posting my written code in case you guys want to give any feed back. Regards and Thanks Again, Nick Questions 1. bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES); How I change this to check for a zero range or do I just remove this variable; 2. ret = find_first_non_hole(inode, &offset, &len); How do I modify the called function for ret to be for a zero range? The other parts of this function are pretty similar to the one for punch holes and seems pretty easy to move other the other parts. Code static long btrfs_zero_range(struct inode *inode, loff_t loffset, loff_t len,){ struct btrfs_root *root = BTRF_I(inode)->root; struct btrfs_path *path; struct btrfs_block_rsv *rsv; struct btrfs_trans_handle *trans; u64 lockstart; u64 lockend; u64 tail_start; u64 tail_len; u64 orig_start = offset; u64 cur_offset; u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); u64 drop_end; int ret = 0; int err = 0; int rsv_count; bool same_page; bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES); u64 ino_size; ret=btrfs_wait_ordered_range(inode, offset, len); if(ret) return ret; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: mkfs: remove experimental tag
(2014/07/31 21:21), David Sterba wrote: > Make it consistent with kernel status and documentation. > > Signed-off-by: David Sterba Reviewed-by: Satoru Takeuchi I'm glad to see this patch :-) Thanks, Satoru > --- > mkfs.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mkfs.c b/mkfs.c > index 16e92221a547..538b6e6837b2 100644 > --- a/mkfs.c > +++ b/mkfs.c > @@ -1439,8 +1439,8 @@ int main(int ac, char **av) > } > > /* if we are here that means all devs are good to btrfsify */ > - printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION); > - printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n"); > + printf("%s\n", BTRFS_BUILD_VERSION); > + printf("See http://btrfs.wiki.kernel.org for more\n\n"); > > dev_cnt--; > > @@ -1597,7 +1597,6 @@ raid_groups: > label, first_file, nodesize, leafsize, sectorsize, > pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy))); > > - printf("%s\n", BTRFS_BUILD_VERSION); > btrfs_commit_transaction(trans, root); > > if (source_dir_set) { > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs-progs: move test_isdir() to utils.c
From: Satoru Takeuchi Since test_isdir() is a utility function, it's better to move it to utils.c. In addition, "const char *" is more appropriate type as its "path" argument because this argument is not changed in this function. Signed-off-by: Satoru Takeuchi Cc: David Sterba Cc: Mike Fleetwood --- cmds-subvolume.c | 19 --- utils.c | 19 +++ utils.h | 1 + 3 files changed, 20 insertions(+), 19 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index c075fb2..e420bba 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -42,25 +42,6 @@ static const char * const subvolume_cmd_group_usage[] = { NULL }; -/* - * test if path is a directory - * this function return - * 0-> path exists but it is not a directory - * 1-> path exists and it is a directory - * -1 -> path is unaccessible - */ -static int test_isdir(char *path) -{ - struct stat st; - int res; - - res = stat(path, &st); - if(res < 0 ) - return -1; - - return S_ISDIR(st.st_mode); -} - static const char * const cmd_subvol_create_usage[] = { "btrfs subvolume create [-i ] [/]", "Create a subvolume", diff --git a/utils.c b/utils.c index d98aac8..059ed34 100644 --- a/utils.c +++ b/utils.c @@ -2697,3 +2697,22 @@ int test_issubvolname(const char *name) return name[0] != '\0' && !strchr(name, '/') && strcmp(name, ".") && strcmp(name, ".."); } + +/* + * test if path is a directory + * this function return + * 0-> path exists but it is not a directory + * 1-> path exists and it is a directory + * -1 -> path is unaccessible + */ +int test_isdir(const char *path) +{ + struct stat st; + int res; + + res = stat(path, &st); + if(res < 0 ) + return -1; + + return S_ISDIR(st.st_mode); +} diff --git a/utils.h b/utils.h index dad7d41..90fc1fe 100644 --- a/utils.h +++ b/utils.h @@ -134,6 +134,7 @@ int fsid_to_mntpt(__u8 *fsid, char *mntpt, int *mnt_cnt); int test_minimum_size(const char *file, u32 leafsize); int test_issubvolname(const char *name); +int test_isdir(const char *path); /* * Btrfs minimum size calculation is complicated, it should include at least: -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2 v3] btrfs-progs: introduce test_issubvolname() for simplicity
From: Satoru Takeuchi There are many duplicated codes to check if the given string is correct subvolume name. Introduce test_issubvolname() for this purpose for simplicity. Signed-off-by: Satoru Takeuchi Cc: David Sterba Cc: Mike Fleetwood --- changelog: v2: Move test_issubvolname() to utils.c. Change the target branch to integ-20140729. v3: Change the type of the 1st argument from "char *" to "const char *". --- cmds-subvolume.c | 9 +++-- utils.c | 12 utils.h | 1 + 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 8bdc447..c075fb2 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -127,8 +127,7 @@ static int cmd_subvol_create(int argc, char **argv) dupdir = strdup(dst); dstdir = dirname(dupdir); - if (!strcmp(newname, ".") || !strcmp(newname, "..") || -strchr(newname, '/') ){ + if (!test_issubvolname(newname)) { fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n", newname); goto out; @@ -302,8 +301,7 @@ again: vname = basename(dupvname); free(cpath); - if (!strcmp(vname, ".") || !strcmp(vname, "..") || -strchr(vname, '/')) { + if (!test_issubvolname(vname)) { fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n", vname); ret = 1; @@ -711,8 +709,7 @@ static int cmd_snapshot(int argc, char **argv) dstdir = dirname(dupdir); } - if (!strcmp(newname, ".") || !strcmp(newname, "..") || -strchr(newname, '/') ){ + if (!test_issubvolname(newname)) { fprintf(stderr, "ERROR: incorrect snapshot name '%s'\n", newname); goto out; diff --git a/utils.c b/utils.c index d61cbec..d98aac8 100644 --- a/utils.c +++ b/utils.c @@ -2685,3 +2685,15 @@ int btrfs_read_sysfs(char path[PATH_MAX]) close(fd); return atoi((const char *)&val); } + +/* + * test if name is a correct subvolume name + * this function return + * 0-> name is not a correct subvolume name + * 1-> name is a correct subvolume name + */ +int test_issubvolname(const char *name) +{ + return name[0] != '\0' && !strchr(name, '/') && + strcmp(name, ".") && strcmp(name, ".."); +} diff --git a/utils.h b/utils.h index 0c9b65f..dad7d41 100644 --- a/utils.h +++ b/utils.h @@ -133,6 +133,7 @@ int get_fslist(struct btrfs_ioctl_fslist **out_fslist, u64 *out_count); int fsid_to_mntpt(__u8 *fsid, char *mntpt, int *mnt_cnt); int test_minimum_size(const char *file, u32 leafsize); +int test_issubvolname(const char *name); /* * Btrfs minimum size calculation is complicated, it should include at least: -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Implement new FALLOC_FL_* modes
On Thu, Jul 31, 2014 at 3:12 PM, Hugo Mills wrote: > On Thu, Jul 31, 2014 at 02:08:15PM -0400, Nick Krause wrote: >> I am doing this project from the btrfs wiki, since I am new after >> reading the code using lxr I am wondering if >> we can base the code off that already in ext4 for these modes as they >> seem to work rather well. I am wondering >> through as a newbie some of the data structures are ext4 based and the >> same goes for some of the functions. >> I am wondering what the equivalent structures and functions are in >> btrfs as I can't seem to find them after reading >> the code as a newbie for the last few hours in lxr. Maybe I just >> missing something? > >The fundamental on-disk structures for btrfs and ext4 are totally > different. You will get very confused if you expect the ext4 code to > work in the btrfs module, or even if you expect structures to be > similar. > >But first -- answer my questions in the reply I made to your patch > just now. Do nothing else until you can answer all three of those > questions sensibly. > >Hugo. > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- Great films about cricket: 200/1: A Pace Odyssey --- I believed I answered your questions , seems I need to fucking test my patches. And about the reply about data structures , thanks seems the other fallocate functions are there and I need to read these first , I sent out my patch by accident as I didn't realize I have removed the error return by mistake. Sorry Nick -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs on bcache
dptrash posted on Thu, 31 Jul 2014 17:35:44 +0200 as excerpted: > Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, > does this "bug" still exists? > > Kernel 3.14 B: 2x HDD 1 TB C: 1x SSD 256 GB > > # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc > --cache_replacement_policy=lru > # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1 > > I still have no "incomplete page write" messages in "dmesg | grep btrfs" > and the checksums of some manually reviewed files are okay. > > Who has more experiences about this? See the reply (not mine) to your earlier post of the question: http://permalink.gmane.org/gmane.linux.kernel.bcache.devel/2602 -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills wrote: > On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote: >> This adds checks for the stated modes as if they are crap we will return >> error >> not supported. > >You've just enabled two options, but you haven't actually > implemented the code behind it. I would tell you *NOT* to do anything > else on this work until you can answer the question: What happens if > you apply this patch, create a large file called "foo.txt", and then a > userspace program executes the following code? > > int fd = open("foo.txt", O_RDWR); > fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50); > >Try it on a btrfs filesystem, both with and without your patch. > Also try it on an ext4 filesystem. > >Once you've done all of that, reply to this mail and tell me what > the problem is with this patch. You need to make two answers: what are > the technical problems with the patch? What errors have you made in > the development process? > >*Only* if you can answer those questions sensibly, should you write > any more patches, of any kind. > >Hugo. > >> Signed-off-by: Nicholas Krause >> --- >> fs/btrfs/file.c |3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >> index 1f2b99c..599495a 100644 >> --- a/fs/btrfs/file.c >> +++ b/fs/btrfs/file.c >> @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int >> mode, >> alloc_end = round_up(offset + len, blocksize); >> >> /* Make sure we aren't being give some crap mode */ >> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) >> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE| >> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE)) >> return -EOPNOTSUPP; >> >> if (mode & FALLOC_FL_PUNCH_HOLE) >> -- >> 1.7.10.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- The glass is neither half-full nor half-empty; it is twice as --- > large as it needs to be. Calls are there in btrfs , therefore will either kernel panic or cause an oops. Need to test this patch as this is very easy to catch bug. Nick -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btrfs offline deduplication
Good time of day. I have several questions about data deduplication on btrfs. Sorry if i ask stupid questions or waste you time %) What about implementation of offline data deduplication? I don't see any activity on this place, may be i need to ask a particular person? Where the problem? May be a can i try to help (testing as example)? I could be wrong, but as i understand btrfs store crc32 checksum one per file, if this is true, may be make a sense to create small worker for dedup files? Like worker for autodefrag? With simple logic like: if sum1 == sum2 && file_size1 == file_size2; then if (bit_to_bit_identical(file1,2)); then merge(file1, file2); This can be first attempt to implement per file offline dedup What you think about it? could i be wrong? or this is a horrible crutch? (as i understand it not change format of fs) (bedup and other tools, its cool, but have several problem with these tools and i think, what kernel implementation can work better). -- Best regards, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xfstests: add regression test for btrfs send with orphans
Regression test for a btrfs issue where we create a RO snapshot to use for a send operation, which fails with a -ESTALE error, due to the presence of orphan inodes accessible through the snapshot's commit root but no longer present through the main root. This issue is fixed by the following linux kernel btrfs patch: Btrfs: update commit root on snapshot creation after orphan cleanup Signed-off-by: Filipe Manana --- tests/btrfs/057 | 81 + tests/btrfs/057.out | 1 + tests/btrfs/group | 1 + 3 files changed, 83 insertions(+) create mode 100755 tests/btrfs/057 create mode 100644 tests/btrfs/057.out diff --git a/tests/btrfs/057 b/tests/btrfs/057 new file mode 100755 index 000..2174077 --- /dev/null +++ b/tests/btrfs/057 @@ -0,0 +1,81 @@ +#! /bin/bash +# FS QA Test No. btrfs/057 +# +# Regression test for a btrfs issue where we create a RO snapshot to use for +# a send operation which fails with a -ESTALE error, due to the presence of +# orphan inodes accessible through the snapshot's commit root but no longer +# present through the main root. +# +# This issue is fixed by the following linux kernel btrfs patch: +# +#Btrfs: update commit root on snapshot creation after orphan cleanup +# +#--- +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + if [ ! -z $XFS_IO_PID ]; then + kill $XFS_IO_PID > /dev/null 2>&1 + fi + rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +# Requiring flink command tests for the presence of the -T option used +# to pass O_TMPFILE to open(2). +_require_xfs_io_command "flink" +_need_to_be_root + +rm -f $seqres.full + +_scratch_mkfs >/dev/null 2>&1 +_scratch_mount + +# Create a tmpfile file, write some data to it and leave it open, so that our +# main subvolume has an orphan inode item. +$XFS_IO_PROG -T $SCRATCH_MNT >$seqres.full 2>&1 < <( + echo "pwrite 0 65536" + read +) & +XFS_IO_PID=$! + +# With the tmpfile open, create a RO snapshot and use it for a send operation. +# The send operation used to fail with -ESTALE due to the presence of the +# orphan inode. +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap +_run_btrfs_util_prog send $SCRATCH_MNT/mysnap -f /dev/null + +status=0 +exit diff --git a/tests/btrfs/057.out b/tests/btrfs/057.out new file mode 100644 index 000..b26eefe --- /dev/null +++ b/tests/btrfs/057.out @@ -0,0 +1 @@ +QA output created by 057 diff --git a/tests/btrfs/group b/tests/btrfs/group index 2da7127..ebc38c5 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -59,3 +59,4 @@ 054 auto quick 055 auto quick 056 auto quick +057 auto quick -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: ensure tmpfile inode is always persisted with link count of 0
If we open a file with O_TMPFILE, don't do any further operation on it (so that the inode item isn't updated) and then force a transaction commit, we get a persisted inode item with a link count of 1, and not 0 as it should be. Steps to reproduce it (requires a modern xfs_io with -T support): $ mkfs.btrfs -f /dev/sdd $ mount -o /dev/sdd /mnt $ xfs_io -T /mnt & $ sync Then btrfs-debug-tree shows the inode item with a link count of 1: $ btrfs-debug-tree /dev/sdd (...) fs tree key (FS_TREE ROOT_ITEM 0) leaf 29556736 items 4 free space 15851 generation 6 owner 5 fs uuid f164d01b-1b92-481d-a4e4-435fb0f843d0 chunk uuid 0e3d0e56-bcca-4a1c-aa5f-cec2c6f4f7a6 item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 inode generation 3 transid 6 size 0 block group 0 mode 40755 links 1 item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 inode ref index 0 namelen 2 name: .. item 2 key (257 INODE_ITEM 0) itemoff 15951 itemsize 160 inode generation 6 transid 6 size 0 block group 0 mode 100600 links 1 item 3 key (ORPHAN ORPHAN_ITEM 257) itemoff 15951 itemsize 0 orphan item checksum tree key (CSUM_TREE ROOT_ITEM 0) (...) Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4f35c6c..8ad3ea9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5688,6 +5688,13 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, } /* +* O_TMPFILE, set link count to 0, so that after this point, +* we fill in an inode item with the correct link count. +*/ + if (!name) + set_nlink(inode, 0); + + /* * we have to initialize this early, so we can reclaim the inode * number if we fail afterwards in this function. */ @@ -9133,6 +9140,14 @@ static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) if (ret) goto out; + /* +* We set number of links to 0 in btrfs_new_inode(), and here we set +* it to 1 because d_tmpfile() will issue a warning if the count is 0, +* through: +* +*d_tmpfile() -> inode_dec_link_count() -> drop_nlink() +*/ + set_nlink(inode, 1); d_tmpfile(dentry, inode); mark_inode_dirty(inode); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Implement new FALLOC_FL_* modes
On Thu, Jul 31, 2014 at 02:08:15PM -0400, Nick Krause wrote: > I am doing this project from the btrfs wiki, since I am new after > reading the code using lxr I am wondering if > we can base the code off that already in ext4 for these modes as they > seem to work rather well. I am wondering > through as a newbie some of the data structures are ext4 based and the > same goes for some of the functions. > I am wondering what the equivalent structures and functions are in > btrfs as I can't seem to find them after reading > the code as a newbie for the last few hours in lxr. Maybe I just > missing something? The fundamental on-disk structures for btrfs and ext4 are totally different. You will get very confused if you expect the ext4 code to work in the btrfs module, or even if you expect structures to be similar. But first -- answer my questions in the reply I made to your patch just now. Do nothing else until you can answer all three of those questions sensibly. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: 200/1: A Pace Odyssey --- signature.asc Description: Digital signature
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote: > This adds checks for the stated modes as if they are crap we will return error > not supported. You've just enabled two options, but you haven't actually implemented the code behind it. I would tell you *NOT* to do anything else on this work until you can answer the question: What happens if you apply this patch, create a large file called "foo.txt", and then a userspace program executes the following code? int fd = open("foo.txt", O_RDWR); fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50); Try it on a btrfs filesystem, both with and without your patch. Also try it on an ext4 filesystem. Once you've done all of that, reply to this mail and tell me what the problem is with this patch. You need to make two answers: what are the technical problems with the patch? What errors have you made in the development process? *Only* if you can answer those questions sensibly, should you write any more patches, of any kind. Hugo. > Signed-off-by: Nicholas Krause > --- > fs/btrfs/file.c |3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 1f2b99c..599495a 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode, > alloc_end = round_up(offset + len, blocksize); > > /* Make sure we aren't being give some crap mode */ > - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) > + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE| > + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE)) > return -EOPNOTSUPP; > > if (mode & FALLOC_FL_PUNCH_HOLE) > -- > 1.7.10.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- The glass is neither half-full nor half-empty; it is twice as --- large as it needs to be. signature.asc Description: Digital signature
Implement new FALLOC_FL_* modes
I am doing this project from the btrfs wiki, since I am new after reading the code using lxr I am wondering if we can base the code off that already in ext4 for these modes as they seem to work rather well. I am wondering through as a newbie some of the data structures are ext4 based and the same goes for some of the functions. I am wondering what the equivalent structures and functions are in btrfs as I can't seem to find them after reading the code as a newbie for the last few hours in lxr. Maybe I just missing something? Regards Nick -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
This adds checks for the stated modes as if they are crap we will return error not supported. Signed-off-by: Nicholas Krause --- fs/btrfs/file.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 1f2b99c..599495a 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode, alloc_end = round_up(offset + len, blocksize); /* Make sure we aren't being give some crap mode */ - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE| + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE)) return -EOPNOTSUPP; if (mode & FALLOC_FL_PUNCH_HOLE) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Remove certain calls for releasing page cache
On Thu, Jul 31, 2014 at 6:11 AM, Hugo Mills wrote: > On Wed, Jul 30, 2014 at 10:05:16PM -0400, Nick Krause wrote: >> On Wed, Jul 30, 2014 at 7:30 PM, Dave Airlie wrote: >> >> This patch removes the lines for releasing the page cache in certain >> >> files as this may aid in perfomance with writes in the compression >> >> rountines of btrfs. Please note that this patch has not been tested >> >> on my own hardware due to no compression based btrfs volumes of my >> >> own. >> >> >> > >> > For all that is sacred, STOP. > [snip] >> > But if you want to work on the kernel, this isn't the way to do it, and >> > nobody will ever take a patch from you seriously if you continue in this >> > fashion. >> > >> > Dave. >> Dave , >> Seems I need to have tested this code first. > >You've said this before, having made exactly the same error (not > testing a patch). Yet you do it again. You seem to be ignoring all the > advice you've been given -- or at least not learning from it, and not > learning from your experiences. Could you please, for half an hour or > so, stop thinking about the immediate goal of getting a patch into the > kernel, and take a short while to think about your process of > learning. Look at all the advice you've had (from me, from Ted, from > others), actually understand it, and consider all the things you need > to do which *aren't* hacking up a lump of C. Actually learn these > things -- have them in your mind all the time. > >I would appreciate it if you could actually engage with someone > (doesn't have to be me) about this -- why are you ignoring the advice? > Is it because you don't understand it? Is it because you think you can > cut corners? Is it because you're concetrating on the code so much that > you're forgetting it? > >The main thing you're doing which is making people angry is not > because you're submitting bad patches (although you are). It's because > you're not listening to advice, and you're not apparently learning > anything from the feedback you're given. Your behaviour is not > changing over time, which makes you look like a waste of time to all > those people trying to help you. > >Hugo. > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- That's not rain, that's a lake with slots in it --- Hugo, That makes sense seems I need to listen better and test my code first, which is bad in any environment. Regards Nick -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs on bcache
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does this "bug" still exists? Kernel 3.14 B: 2x HDD 1 TB C: 1x SSD 256 GB # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1 I still have no "incomplete page write" messages in "dmesg | grep btrfs" and the checksums of some manually reviewed files are okay. Who has more experiences about this? Thanks, - dp -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04
I should add that I have reproduced this even after doing `mount -o clear_cache /dev/... /mnt/...`, unmount, remount with `-o space_cache`. After the machine lockup and rebooting there are the warnings of the form: > [ 117.288248] BTRFS warning (device dm-0): block group 694165700608 has wrong > amount of free space > [ 117.288253] BTRFS warning (device dm-0): failed to load free space cache > for block group 694165700608, rebuild it now -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04
I've now reproduced this on 3.15.7-031507-generic and 3.16.0-031600rc7-generic, and have a test case where I can reliably cause the crash after about 30 seconds of disk activity. The test case just involves taking a directory tree of ~400GB of files and copying every file to a new one with .new on the end in the same directory of the original. The soft-lockup makes a kernel thread fully occupy a CPU. On the advice of kdave from IRC I took a stack trace every 10s whilst the machine remained responsive, which was for 5-10 minutes before the machine became unresponsive to the network. The lockup seems to be reliably stuck in btrfs_find_space_cluster. kernel log with 10s traces is linked to below (3.16.0-031600rc7-generic) https://gist.github.com/pwaller/d40df3badb5cc8a574ef On 30 July 2014 11:02, Peter Waller wrote: > The crashes became more frequent. The time scale before lockup went > ~12 days, ~7 days, ~2 days, ~6 hours, ~1 hour. > > Then we upgraded to 3.15.7-031507-generic on the advice of > #ubuntu-kernel and #btrfs on IRC, and it has since been stable for > 19.5 hours. > > I dug around more in our logs and realised that a previous machine > lock was probably due to the same problem, so I have an additional > stack trace, reproduced below. > > Thanks to those who chipped in on IRC. > > - Peter > >> [1093202.136107] INFO: task kworker/u30:1:31455 blocked for more than 120 >> seconds. >> [1093202.141596] Tainted: GF3.13.0-30-generic #54-Ubuntu >> [1093202.146201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [1093202.152408] kworker/u30:1 D 8801efc34440 0 31455 2 >> 0x >> [1093202.152416] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) >> [1093202.152419] 880040d3b768 0002 8800880f >> 880040d3bfd8 >> [1093202.152422] 00014440 00014440 8800880f >> 8801efc34cd8 >> [1093202.152426] 8801effcefe8 880040d3b7f0 0002 >> 8114e010 >> [1093202.152429] Call Trace: >> [1093202.152435] [] ? wait_on_page_read+0x60/0x60 >> [1093202.152439] [] io_schedule+0x9d/0x140 >> [1093202.152442] [] sleep_on_page+0xe/0x20 >> [1093202.152445] [] __wait_on_bit_lock+0x48/0xb0 >> [1093202.152449] [] ? vtime_common_task_switch+0x24/0x40 >> [1093202.152452] [] __lock_page+0x6a/0x70 >> [1093202.152456] [] ? autoremove_wake_function+0x40/0x40 >> [1093202.152474] [] lock_delalloc_pages+0x13d/0x1d0 >> [btrfs] >> [1093202.152487] [] >> find_lock_delalloc_range.constprop.43+0x14b/0x1f0 [btrfs] >> [1093202.152499] [] __extent_writepage+0x131/0x750 [btrfs] >> [1093202.152509] [] ? end_extent_writepage+0x70/0x70 >> [btrfs] >> [1093202.152521] [] >> extent_write_cache_pages.isra.30.constprop.50+0x272/0x3d0 [btrfs] >> [1093202.152532] [] extent_writepages+0x4d/0x70 [btrfs] >> [1093202.152544] [] ? btrfs_real_readdir+0x5b0/0x5b0 >> [btrfs] >> [1093202.152555] [] btrfs_writepages+0x28/0x30 [btrfs] >> [1093202.152559] [] do_writepages+0x1e/0x40 >> [1093202.152562] [] __writeback_single_inode+0x40/0x210 >> [1093202.152565] [] writeback_sb_inodes+0x247/0x3e0 >> [1093202.152568] [] __writeback_inodes_wb+0x9f/0xd0 >> [1093202.152571] [] wb_writeback+0x243/0x2c0 >> [1093202.152574] [] bdi_writeback_workfn+0x108/0x430 >> [1093202.152577] [] ? finish_task_switch+0x128/0x170 >> [1093202.152581] [] process_one_work+0x182/0x450 >> [1093202.152585] [] worker_thread+0x121/0x410 >> [1093202.152588] [] ? rescuer_thread+0x3e0/0x3e0 >> [1093202.152591] [] kthread+0xd2/0xf0 >> [1093202.152594] [] ? kthread_create_on_node+0x1d0/0x1d0 >> [1093202.152598] [] ret_from_fork+0x7c/0xb0 >> [1093202.152601] [] ? kthread_create_on_node+0x1d0/0x1d0 > > On 29 July 2014 10:20, Peter Waller wrote: >> Someone on IRC suggested that I clear the free cache: >> >>> sudo mount -o remount,clear_cache /path/to/dev /path/to/mount >>> sudo mount -o remount,space_cache /path/to/dev /path/to/mount >> >> >> The former command printed `btrfs: disk space caching is enabled` and >> the latter repeated it, making me think that maybe the latter was >> unnecessary. >> >> On 29 July 2014 09:04, Peter Waller wrote: >>> Hi All, >>> >>> I've reported a bug with Ubuntu here: >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711 >>> >>> The machine in question has one BTRFS volume which is 87% full and >>> lives on an Logical Volume Manager (LVM) block device on top of one >>> Amazon Elastic Block Store (EBS) device. >>> >>> We have other machines in a similar configuration which have not >>> displayed this behaviour. >>> >>> The one thing which makes this machine different is that it has >>> directories which contain many thousands of files. We don't make heavy >>> use of subvolumes or snapshots. >>> >>> More details follow: >>> >>> # cat /proc/version_signature >>> Ubuntu 3.13.0-32.57-generic 3.13.11.4 >>> >>> The machine had a soft-lockup with messages like this appearing on t
[PATCH v2] Btrfs: race free update of commit root for ro snapshots
This is a better solution for the problem addressed in the following commit: Btrfs: update commit root on snapshot creation after orphan cleanup (3821f348889e506efbd268cc8149e0ebfa47c4e5) The previous solution wasn't the best because of 2 reasons: 1) It added another full transaction commit, which is more expensive than just swapping the commit root with the root; 2) If a reboot happened after the first transaction commit (the one that creates the snapshot) and before the second transaction commit, then we would end up with the same problem if a send using that snapshot was requested before the first transaction commit after the reboot. This change addresses those 2 issues. The second issue is addressed by switching the commit root in the dentry lookup VFS callback, which is also called by the snapshot/subvol creation ioctl and performs orphan cleanup if needed. Like the vfs, the ioctl locks the parent inode too, preventing race issues between a dentry lookup and snapshot creation. Cc: Alex Lyakas Signed-off-by: Filipe Manana --- V2: Updated commit message, as original second issue was not correct. Removed redundant btrfs_orphan_cleanup() call in the snapshot creation ioctl, as it's performed by btrfs_lookup_dentry() which is called by the ioctl. fs/btrfs/inode.c | 36 fs/btrfs/ioctl.c | 33 - 2 files changed, 36 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1d5f0b3..4f35c6c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5227,6 +5227,42 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry) iput(inode); inode = ERR_PTR(ret); } + /* +* If orphan cleanup did remove any orphans, it means the tree +* was modified and therefore the commit root is not the same as +* the current root anymore. This is a problem, because send +* uses the commit root and therefore can see inode items that +* don't exist in the current root anymore, and for example make +* calls to btrfs_iget, which will do tree lookups based on the +* current root and not on the commit root. Those lookups will +* fail, returning a -ESTALE error, and making send fail with +* that error. So make sure a send does not see any orphans we +* have just removed, and that it will see the same inodes +* regardless of whether a transaction commit happened before +* it started (meaning that the commit root will be the same as +* the current root) or not. +*/ + if (sub_root->node != sub_root->commit_root) { + u64 sub_flags = btrfs_root_flags(&sub_root->root_item); + + if (sub_flags & BTRFS_ROOT_SUBVOL_RDONLY) { + struct extent_buffer *eb; + + /* +* Assert we can't have races between dentry +* lookup called through the snapshot creation +* ioctl and the VFS. +*/ + ASSERT(mutex_is_locked(&dir->i_mutex)); + + down_write(&root->fs_info->commit_root_sem); + eb = sub_root->commit_root; + sub_root->commit_root = + btrfs_root_node(sub_root); + up_write(&root->fs_info->commit_root_sem); + free_extent_buffer(eb); + } + } } return inode; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 2a30ac1..ef2e073 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -711,39 +711,6 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, if (ret) goto fail; - ret = btrfs_orphan_cleanup(pending_snapshot->snap); - if (ret) - goto fail; - - /* -* If orphan cleanup did remove any orphans, it means the tree was -* modified and therefore the commit root is not the same as the -* current root anymore. This is a problem, because send uses the -* commit root and therefore can see inode items that don't exist -* in the current root anymore, and for example make calls to -* btrfs_iget, which will do tree lookups based on the current root -* and not on the commit root. Those lookups will fail, returning a -* -ESTALE error, and making send fail with that error. So make sure -* a send does not see any orphans we have just rem
Re: [PATCH] Btrfs: update commit root on snapshot creation after orphan cleanup
On Mon, Jul 28, 2014 at 6:31 PM, Filipe David Manana wrote: > On Sat, Jul 19, 2014 at 8:11 PM, Alex Lyakas > wrote: >> Hi Filipe, >> It's quite possible I don't fully understand the issue. It seems that >> we are creating a read-only snapshot, commit a transaction, and then >> go and modify the snapshot once again, by deleting all the >> ORPHAN_ITEMs we have in its file tree (btrfs_orphan_cleanup). >> Shouldn't all this be part of snapshot creation, so that after we >> commit, we have a clean file tree with no orphans there? (not sure if >> this makes sense though). >> >> With your patch we do this additional commit after the cleanup. But >> nothing prevents "send" from starting before this additional commit, >> correct? And it would still see the orphans through the commit root. >> You say that it is not a problem, but I am not sure why (probably I am >> missing something here). So for me it looks like your patch closes a >> race window significantly (at the cost of an additional commit), but >> does not close it fully. > > Hi Alex, > > That's right, after the transaction commit finishes, the snapshot will > be visible and accessible to user space - so someone may start a send > before the orphan cleanup starts. It was ok only for the serialized > case (create snapshot, wait for ioctl to return, call send ioctl). Actually no. If after the 1st transaction commit (the one that creates the snapshot and makes it visible to user space) and before the orphan cleanup is called another task attempts to use the snapshot for a send operation, it will block when doing the snapshot dentry lookup - because both tasks acquire the parent inode's mutex (implicitly through the vfs and explicitly via the snapshot/subvol ioctl entry point). Nevertheless, it's better to move the commit root switch part to the dentry lookup function (as the new patch does), since after the first transaction commit and before the second one commits, a reboot might happen, and after that we would get into the same issue until the first transaction commit happens after the reboot. I'll update the new patch's comment. thanks > >> >> But most important: perhaps "send" should look for ORPHAN_ITEMs and >> treat those inodes as "deleted"? > > There are other cases were orphans can exist, like for file truncates > for example, where ignoring the inode wouldn't be very correct. > Tried that approach initially, but it's actually more complex to > implement and adds some additional overhead (tree searches - and the > orphan items are normally too far from the inode items, due to a very > high objectid (-5ULL)). > > I've reworked this with a different approach and CC'ed you > (https://patchwork.kernel.org/patch/4635471/). > > thanks > >> >> Thanks, >> Alex. >> >> >> >> On Tue, Jun 3, 2014 at 2:41 PM, Filipe David Borba Manana >> wrote: >>> On snapshot creation (either writable or read-only), we do orphan cleanup >>> against the root of the snapshot. If the cleanup did remove any orphans, >>> then the current root node will be different from the commit root node >>> until the next transaction commit happens. >>> >>> A send operation always uses the commit root of a snapshot - this means >>> it will see the orphans if it starts computing the send stream before the >>> next transaction commit happens (triggered by a timer or sync() for .e.g), >>> which is when the commit root gets assigned a reference to current root, >>> where the orphans are not visible anymore. The consequence of send seeing >>> the orphans is explained below. >>> >>> For example: >>> >>> mkfs.btrfs -f /dev/sdd >>> mount -o commit=999 /dev/sdd /mnt >>> >>> # open a file with O_TMPFILE and leave it open >>> # write some data to the file >>> btrfs subvolume snapshot -r /mnt /mnt/snap1 >>> >>> btrfs send /mnt/snap1 -f /tmp/send.data >>> >>> The send operation will fail with the following error: >>> >>> ERROR: send ioctl failed with -116: Stale file handle >>> >>> What happens here is that our snapshot has an orphan inode still visible >>> through the commit root, that corresponds to the tmpfile. However send >>> will attempt to call inode.c:btrfs_iget(), with the goal of reading the >>> file's data, which will return -ESTALE because it will use the current >>> root (and not the commit root) of the snapshot. >>> >>> Of course, there are other cases where we can get orphans, but this >>> example using a tmpfile makes it much easier to reproduce the issue. >>> >>> Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if >>> the commit root is different from the current root, just commit the >>> transaction associated with the snapshot's root (if it exists), so that >>> a send will not see any orphans that don't exist anymore. This also >>> guarantees a send will always see the same content regardless of whether >>> a transaction commit happened already before the send was requested and >>> after the orphan cleanup (meaning the commit root and cur
Re: [PATCH v2] btrfs-progs: correct manpage option description for scrub
On Thu, Jul 31, 2014 at 09:25:07AM +0800, Gui Hecheng wrote: > On Wed, 2014-07-30 at 10:26 +, Duncan wrote: > > Gui Hecheng posted on Wed, 30 Jul 2014 15:32:05 +0800 as excerpted: > > > > > Also, add description of the potential failure as follows > > > "If a scrub is already running running, it fails." > > > > Just a typo (running running?) in the description... > > > Hmmm, that's my fault. Thanks very much, I will correct it soon. I'll fix the changelog, no need to resend the patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: mkfs: remove experimental tag
Make it consistent with kernel status and documentation. Signed-off-by: David Sterba --- mkfs.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mkfs.c b/mkfs.c index 16e92221a547..538b6e6837b2 100644 --- a/mkfs.c +++ b/mkfs.c @@ -1439,8 +1439,8 @@ int main(int ac, char **av) } /* if we are here that means all devs are good to btrfsify */ - printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION); - printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n"); + printf("%s\n", BTRFS_BUILD_VERSION); + printf("See http://btrfs.wiki.kernel.org for more\n\n"); dev_cnt--; @@ -1597,7 +1597,6 @@ raid_groups: label, first_file, nodesize, leafsize, sectorsize, pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy))); - printf("%s\n", BTRFS_BUILD_VERSION); btrfs_commit_transaction(trans, root); if (source_dir_set) { -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Work Queue for btrfs compression writes
On Wed, Jul 30, 2014 at 10:36:57AM -0400, Peter Hurley wrote: > > Where is that git tree? I've been planning to set up a unit test and > regression suite for tty/serial, and wouldn't mind cribbing the > infrastructure from someone's existing work. https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-porgs: fix xfstest btrfs/023 random failure
On Thu, Jul 31, 2014 at 03:07:39PM +0800, Anand Jain wrote: > > and the fsync just made the race window smaller. If you still think > > the fsync is useful, please update the changelog. > > You mean I should put fsync at more correct places ? The fsync should not be necessary, but I haven't looked very closely. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is Btrfs stable ?
On Thu, Jul 31, 2014 at 12:32:27PM +0200, Cyril Scetbon wrote: > I'm sorry cause I suppose it's a recurrent question. However I've been > looking into the linux kernel code and didn't see anything concerning > an experimental status for Btrfs. However, when I format a device with > mkfs.btrfs I get a message saying it's experimental that comes from > http://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/tree/mkfs.c#n1442 The mkfs.btrfs tool is probably the last instance where the experimental tag exists, but should be removed so it's consistent with kernel and the documentation. Thanks for the notice. > But you say that it's no longer unstable at > https://btrfs.wiki.kernel.org/index.php/Main_Page#Stability_status, > right ? That's right. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: return path with unlocked nodes in btrfs_next_leaf
Calling unlock_up() to release our new path doesn't release the read lock on the node at level 1, because our return path has path->slots[0] == 0, which makes unlock_up() skip unlocking that node. Since we don't need to return that node locked, call btrfs_unlock_up_safe() instead of unlock_up(), which will release all nodes in the path (except the leaf of course). For any level N >= 2, the corresponding node lock isn't released by unlock_up() too if path->slots[N - 1] == 0. Releasing the read lock immediately will allow concurrent writers to write lock that node at level 1 (or higher levels if applicable) while the btrfs_next_leaf() caller processes the leaf. Signed-off-by: Filipe Manana --- fs/btrfs/ctree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 783ea3b..8ca6761 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -5833,7 +5833,7 @@ again: } ret = 0; done: - unlock_up(path, 0, 1, 0, NULL); + btrfs_unlock_up_safe(path, 1); path->leave_spinning = old_spinning; if (!old_spinning) btrfs_set_path_blocking(path); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Is Btrfs stable ?
Hi, I'm sorry cause I suppose it's a recurrent question. However I've been looking into the linux kernel code and didn't see anything concerning an experimental status for Btrfs. However, when I format a device with mkfs.btrfs I get a message saying it's experimental that comes from http://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/tree/mkfs.c#n1442 But you say that it's no longer unstable at https://btrfs.wiki.kernel.org/index.php/Main_Page#Stability_status, right ? If someone can explain me that. Thanks -- Cyril SCETBON -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Remove certain calls for releasing page cache
On Wed, Jul 30, 2014 at 10:05:16PM -0400, Nick Krause wrote: > On Wed, Jul 30, 2014 at 7:30 PM, Dave Airlie wrote: > >> This patch removes the lines for releasing the page cache in certain > >> files as this may aid in perfomance with writes in the compression > >> rountines of btrfs. Please note that this patch has not been tested > >> on my own hardware due to no compression based btrfs volumes of my > >> own. > >> > > > > For all that is sacred, STOP. [snip] > > But if you want to work on the kernel, this isn't the way to do it, and > > nobody will ever take a patch from you seriously if you continue in this > > fashion. > > > > Dave. > Dave , > Seems I need to have tested this code first. You've said this before, having made exactly the same error (not testing a patch). Yet you do it again. You seem to be ignoring all the advice you've been given -- or at least not learning from it, and not learning from your experiences. Could you please, for half an hour or so, stop thinking about the immediate goal of getting a patch into the kernel, and take a short while to think about your process of learning. Look at all the advice you've had (from me, from Ted, from others), actually understand it, and consider all the things you need to do which *aren't* hacking up a lump of C. Actually learn these things -- have them in your mind all the time. I would appreciate it if you could actually engage with someone (doesn't have to be me) about this -- why are you ignoring the advice? Is it because you don't understand it? Is it because you think you can cut corners? Is it because you're concetrating on the code so much that you're forgetting it? The main thing you're doing which is making people angry is not because you're submitting bad patches (although you are). It's because you're not listening to advice, and you're not apparently learning anything from the feedback you're given. Your behaviour is not changing over time, which makes you look like a waste of time to all those people trying to help you. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- That's not rain, that's a lake with slots in it --- signature.asc Description: Digital signature
Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING
On 30/07/2014 15:42, Miao Xie wrote: On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote: After the seed device has been replaced the new target device is no more a seed device. So we need to bring that state in the fs_devices. reproducer: mount /dev/sdb /btrfs btrfs dev add /dev/sdc /btrfs btrfs rep start -B /dev/sdb /dev/sdd /btrfs umount /btrfs WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]() :: __btrfs_close_devices() :: WARN_ON(fs_devices->open_devices); WARN_ON(fs_devices->rw_devices); per the btrfs-devlist tool (to dump fs_devices and btrfs_device from the kernel) the num_device, open_devices, rw_devices are still at 1 but the total_device is at 2, even after the seed device has been replaced in the above example. Signed-off-by: Anand Jain --- fs/btrfs/dev-replace.c | 13 + 1 file changed, 13 insertions(+) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index eea26e1..a144bb1 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, btrfs_rm_dev_replace_blocked(fs_info); + /* +* if we are replacing a seed device with a writable device +* then FS won't be a seeding FS any more. +*/ + if (src_device->fs_devices->seeding && !src_device->writeable) { First, why not move this code into btrfs_rm_dev_replace_srcdev()? Then if the first condition is true, the second one(!src_device->writeable) must be true because all the devices in the seed fs_device must be read-only. so only the first check is enough. + fs_info->fs_devices->rw_devices++; If src is missing dev, we would increase it twice. + fs_info->fs_devices->num_devices++; + fs_info->fs_devices->open_devices++; + + fs_info->fs_devices->seeding = 0; + fs_info->fs_devices->seed = NULL; In fact, we may have several seed fs_devices in one fs, and the seed fs_device which includes src might not the first one, so assign seed to be NULL would break the seed fs_device list. Yep I had question when writing this patch but later decided to reset seed and seeding. if I am not wrong don't reset seeding and seed will do as well. Thanks for reviewing. Anand Thanks Miao + } + btrfs_rm_dev_replace_srcdev(fs_info, src_device); btrfs_rm_dev_replace_unblocked(fs_info); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-porgs: fix xfstest btrfs/023 random failure
On 29/07/2014 20:57, David Sterba wrote: On Thu, Jul 17, 2014 at 05:28:24PM +0800, Anand Jain wrote: xfstest btrfs/023 which does the following tests create_group_profile "raid0" check_group_profile "RAID0" create_group_profile "raid1" check_group_profile "RAID1" create_group_profile "raid10" check_group_profile "RAID10" create_group_profile "raid5" check_group_profile "RAID5" create_group_profile "raid6" check_group_profile "RAID6" fails randomly with the error as below ERROR: device scan failed '/dev/sde' - Invalid argument since failure is at random group profile it indicates to me that btrfs kernel did not see the newly created btrfs on the device To note: I have the following patch on the kernel which is not yet integrated, but its not related to this bug. btrfs: RFC: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device I guess the error was caused by this patch, Yep. I got that understanding when you mentioned about the patch btrfs: access superblock via pagecache in scan_one_device > and the fsync just made the race window smaller. If you still think > the fsync is useful, please update the changelog. You mean I should put fsync at more correct places ? Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html