Re: [PATCH v2 3/3] btrfs-progs: handle error in the btrfs_prepare_device
On Tue, 17 Dec 2013 10:33:36 +0800, Anand Jain wrote:
> this patch will handle the strerror reporting of the error
> instead of printing errno, and also replaced the BUG_ON with
> the error handling
>
> Signed-off-by: Anand Jain anand.j...@oracle.com
> ---
> v2: commit update
> ---
>  cmds-device.c  |  7 +++
>  cmds-replace.c | 10 --
>  mkfs.c         |  9 -
>  utils.c        | 30 +++---
>  4 files changed, 34 insertions(+), 22 deletions(-)
[...]
> diff --git a/cmds-replace.c b/cmds-replace.c
> index d9b0940..8160107 100644
> --- a/cmds-replace.c
> +++ b/cmds-replace.c
> @@ -276,13 +276,11 @@ static int cmd_start_replace(int argc, char **argv)
>  	}
>  	strncpy((char *)start_args.start.tgtdev_name, dstdev,
>  		BTRFS_DEVICE_PATH_NAME_MAX);
> -	if (btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
> -				 &mixed, 0)) {
> -		fprintf(stderr, "Error: Failed to prepare device '%s'\n",
> -			dstdev);
> -		goto leave_with_error;
> -	}
> +	ret = btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
> +				 &mixed, 0);
>  	close(fddstdev);
> +	if (ret)
> +		goto leave_with_error;
>  	fddstdev = -1;

You change the code to call close(fddstdev) twice.

[...]
> +zero_dev_error:
> +	if (ret) {
> +		ret < 0 ?
> +		fprintf(stderr, "ERROR: failed to zero device start '%s' - %s\n",
> +			file, strerror(-ret)) :
> +		fprintf(stderr, "ERROR: failed to zero device start '%s' - %d\n",
> +			file, ret);

This is not funny.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] btrfs-progs: handle error in the btrfs_prepare_device
>> +	ret = btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
>> +				 &mixed, 0);
>>  	close(fddstdev);
>> +	if (ret)
>> +		goto leave_with_error;
>>  	fddstdev = -1;

Yeah, moved this 3 lines up. Thanks.

> You change the code to call close(fddstdev) twice.

[...]
>> +zero_dev_error:
>> +	if (ret) {
>> +		ret < 0 ?
>> +		fprintf(stderr, "ERROR: failed to zero device start '%s' - %s\n",
>> +			file, strerror(-ret)) :
>> +		fprintf(stderr, "ERROR: failed to zero device start '%s' - %d\n",
>> +			file, ret);
>
> This is not funny.

Hmm. I am not sure what you mean?

Thanks, Anand
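The two review comments in this thread can be illustrated with a small sketch. It is not the btrfs-progs code: format_zero_dev_error() and close_once() are hypothetical helpers invented for this example. The first replaces the ternary-as-statement with a plain if/else (one possible reading of the second comment), and the second makes a repeated close on an error path harmless by resetting the descriptor to -1.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: format the "failed to zero device" message.
 * A negative ret is treated as -errno and rendered with strerror();
 * a positive ret is printed as a plain number; 0 produces no message.
 * Returns the snprintf() result, or 0 when nothing was formatted.
 */
static int format_zero_dev_error(char *buf, size_t len,
				 const char *file, int ret)
{
	if (len == 0)
		return 0;
	buf[0] = '\0';
	if (ret == 0)
		return 0;
	if (ret < 0)
		return snprintf(buf, len,
				"ERROR: failed to zero device start '%s' - %s\n",
				file, strerror(-ret));
	return snprintf(buf, len,
			"ERROR: failed to zero device start '%s' - %d\n",
			file, ret);
}

/*
 * Hypothetical helper: close *fd only if it is still open, then mark it
 * closed, so a later cleanup path cannot close the same descriptor twice.
 */
static int close_once(int *fd)
{
	int ret = 0;

	if (*fd >= 0) {
		ret = close(*fd);
		*fd = -1;
	}
	return ret;
}
```

With helpers like these, the call site would close the descriptor once right after btrfs_prepare_device() returns and the shared error label could call close_once() again without harm.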
Re: Understanding subvolume hierarchy
Mhmm. Thanks. I'm beginning to understand ;)

Still: how can I see/know that id 5 is mapped to id 0? And why do this?
For what purpose? (Is it a default btrfs behavior or is it set by the
Ubuntu installer?)

2013/12/17 Chris Murphy li...@colorremedies.com:
> On Dec 16, 2013, at 4:54 PM, Nicolas Michel be.nicolas.mic...@gmail.com wrote:
>
>> OK. thanks for your pretty fast answer :)
>>
>> Now my last question is: in this case it was easy as I know that I
>> created all these subvolumes as parts of volume 0. But in the
>> btrfs subv list / I don't see any information that tells me they
>> belong to id 0. If I have to debug a server/desktop and I don't know
>> the hierarchy that has been made, how can I know that my tmp
>> subvolume is indeed a child of id 0?
>
> When you do a subvol list it shows you what its top level is. Example:
>
> # btrfs subvol list /
> ID 256 gen 1047 top level 5 path root
> ID 258 gen 983 top level 5 path home
> ID 259 gen 983 top level 5 path data
> ID 276 gen 1012 top level 5 path root_ro
>
> root is mounted at /
> home is mounted at /home
> data is mounted at /data
> root_ro is not mounted at all
>
> # cd /data
> # btrfs subvol create data2
> Create subvolume './data2'
>
> # btrfs subvol list /
> ID 256 gen 1047 top level 5 path root
> ID 258 gen 983 top level 5 path home
> ID 259 gen 1048 top level 5 path data
> ID 276 gen 1012 top level 5 path root_ro
> ID 277 gen 1048 top level 5 path data/data2
>
> # btrfs subvol list /data
> ID 256 gen 1047 top level 5 path root
> ID 258 gen 983 top level 5 path home
> ID 259 gen 1048 top level 5 path data
> ID 276 gen 1012 top level 5 path root_ro
> ID 277 gen 1048 top level 259 path data2
>
> So notice that "top level 5 data/data2" means the same as
> "top level 259 data2", because top level 259 implies data.
>
> You can also use btrfs subvol show <subvol> and it will give you more
> information, including whether it's a snapshot and what the parent is;
> and if it's a parent that has snapshots, it'll list the snapshots.
>
> # btrfs subvol show /data
> /data
>     Name:              data
>     uuid:              bc45f4be-51c9-2848-bb68-d6e922b8e2bd
>     Parent uuid:       -
>     Creation time:     2013-12-12 16:18:13
>     Object ID:         259
>     Generation (Gen):  1048
>     Gen at creation:   11
>     Parent:            5
>     Top Level:         5
>     Flags:             -
>     Snapshot(s):
>
> # btrfs subvol show /data/data2
> /data/data2
>     Name:              data2
>     uuid:              a66eddf9-107f-0448-a021-da417a982827
>     Parent uuid:       -
>     Creation time:     2013-12-16 17:11:36
>     Object ID:         277
>     Generation (Gen):  1048
>     Gen at creation:   1048
>     Parent:            259
>     Top Level:         259
>     Flags:             -
>     Snapshot(s):
>
> Chris Murphy

--
Nicolas MICHEL
Re: Understanding subvolume hierarchy
On Tue, Dec 17, 2013 at 09:30:11AM +0100, Nicolas Michel wrote:
> Mhmm. Thanks. I'm beginning to understand ;)
>
> Still: how can I see/know that id 5 is mapped to id 0? And why do
> this? For what purpose? (Is it a default btrfs behavior or is it set
> by the Ubuntu installer?)

   It's always mapped. Internally, every tree is identified by a
number. FS trees (e.g. subvolumes) start with numbers allocated
dynamically from 256 upwards. Other trees (chunk tree, extent tree,
and all the others) have fixed well-known numbers between 1 and 255,
and the top-level FS tree is given the number 5. To make things
marginally simpler for the user to remember, there's special-case
code which looks at subvolume IDs passed from userspace, and converts
0 to 5.

   This is all btrfs behaviour -- nothing to do with Ubuntu.

   Hugo.

> 2013/12/17 Chris Murphy li...@colorremedies.com:
[...]

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You can't expect a boy to be depraved until he's gone to ---
                          a good school.
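The numbering described in this thread can be sketched in a few lines of C. This is only an illustration of the rule as stated above (0 is accepted as an alias for 5, subvolumes are numbered from 256 upwards), not the kernel's actual special-case code; the names normalize_subvol_id() and is_fs_tree_id() are made up for the example, and the upper bound of the subvolume ID range is ignored for simplicity.

```c
#include <stdint.h>

/* Values as described in the thread: the top-level FS tree is tree 5,
 * and subvolume IDs are allocated dynamically from 256 upwards. */
#define FS_TREE_OBJECTID	5ULL
#define FIRST_FREE_OBJECTID	256ULL

/* Hypothetical helper: treat an ID of 0 from userspace as an alias for
 * the top-level FS tree; every other ID is passed through unchanged. */
static uint64_t normalize_subvol_id(uint64_t id)
{
	return id == 0 ? FS_TREE_OBJECTID : id;
}

/* Hypothetical helper: returns 1 if the ID names an FS tree, that is,
 * the top level (5, or its alias 0) or a subvolume (256 and up).
 * IDs 1..255 other than 5 belong to the fixed internal trees. */
static int is_fs_tree_id(uint64_t id)
{
	id = normalize_subvol_id(id);
	return id == FS_TREE_OBJECTID || id >= FIRST_FREE_OBJECTID;
}
```

This is why "top level 5" in the btrfs subvol list output and "volume 0" in the earlier messages refer to the same tree.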
Re: [PATCH] Btrfs: fix double initialization of the raid kobject
On Tue, 17 Dec 2013 12:01:12 +0800, Miao Xie wrote:
> We met the following oops when doing space balance:
>
>  kobject (88081b590278): tried to init an initialized object, something is seriously wrong.
>  ...
>  Call Trace:
>   [81937262] dump_stack+0x49/0x5f
>   [8137d259] kobject_init+0x89/0xa0
>   [8137d36a] kobject_init_and_add+0x2a/0x70
>   [a009bd79] ? clear_extent_bit+0x199/0x470 [btrfs]
>   [a005e82c] __link_block_group+0xfc/0x120 [btrfs]
>   [a006b9db] btrfs_make_block_group+0x24b/0x370 [btrfs]
>   [a00a899b] __btrfs_alloc_chunk+0x54b/0x7e0 [btrfs]
>   [a00a8c6f] btrfs_alloc_chunk+0x3f/0x50 [btrfs]
>   [a0060123] do_chunk_alloc+0x363/0x440 [btrfs]
>   [a00633d4] btrfs_check_data_free_space+0x104/0x310 [btrfs]
>   [a0069f4d] btrfs_write_dirty_block_groups+0x48d/0x600 [btrfs]
>   [a007aad4] commit_cowonly_roots+0x184/0x250 [btrfs]
>  ...
>
> Steps to reproduce:
>  # mkfs.btrfs -f <dev>
>  # mount -o nospace_cache <dev> <mnt>
>  # btrfs balance start <mnt>
>  # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1
>
> The reason of this problem is that we initialized the raid kobject when we
> added a block group into a empty raid list. As we know, when we mounted a
> btrfs filesystem, the raid list was empty, we would initialize the raid
> kobject when we added the first block group. But if there was not data
> stored in the block group, the block group would be freed when doing
> balance, and the raid list would be empty. And then if we allocated a new
> block group and added it into the raid list, we would initialize the raid
> kobject again, the oops happened.
>
> Fix this problem by initializing the raid kobject just when mounting the fs.
>
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com

This bug was reported by Wang Shilong, so add

Reported-by: Wang Shilong wangsl.f...@cn.fujitsu.com

Thanks
Miao

> ---
>  fs/btrfs/extent-tree.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index cd4d9ca..d667aad 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3464,8 +3464,10 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
>  		return ret;
>  	}
>
> -	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
> +	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
>  		INIT_LIST_HEAD(&found->block_groups[i]);
> +		kobject_init(&found->block_group_kobjs[i], &btrfs_raid_ktype);
> +	}
>  	init_rwsem(&found->groups_sem);
>  	spin_lock_init(&found->lock);
>  	found->flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
> @@ -8423,9 +8425,8 @@ static void __link_block_group(struct btrfs_space_info *space_info,
>  		int ret;
>
>  		kobject_get(&space_info->kobj); /* put in release */
> -		ret = kobject_init_and_add(kobj, &btrfs_raid_ktype,
> -					   &space_info->kobj, "%s",
> -					   get_raid_name(index));
> +		ret = kobject_add(kobj, &space_info->kobj, "%s",
> +				  get_raid_name(index));
>  		if (ret) {
>  			pr_warn("btrfs: failed to add kobject for block cache. ignoring.\n");
>  			kobject_put(&space_info->kobj);
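The failure mode described in the commit message can be reproduced with a userspace toy model. Nothing below is kernel code: struct toy_kobject and all toy_* and link_first_group_* functions are invented for the illustration, and toy_list_emptied() merely stands in for "the raid list became empty after balance". The point is only that "init and add on every first link" fails the second time, while "init once, add on every first link" does not.

```c
/* Toy stand-in for a kobject: it only remembers whether it was
 * initialized and whether it is currently added. */
struct toy_kobject {
	int initialized;
	int added;
};

/* Like the kernel's complaint in the oops above, a second init of the
 * same object is an error. Returns 0 on success, -1 on double init. */
static int toy_kobject_init(struct toy_kobject *k)
{
	if (k->initialized)
		return -1;
	k->initialized = 1;
	return 0;
}

/* Adding requires a prior init. Returns 0 on success, -1 otherwise. */
static int toy_kobject_add(struct toy_kobject *k)
{
	if (!k->initialized)
		return -1;
	k->added = 1;
	return 0;
}

/* Stand-in for "the last block group was freed, the list is empty". */
static void toy_list_emptied(struct toy_kobject *k)
{
	k->added = 0;
}

/* Old flow: init and add whenever a group is linked into an empty list. */
static int link_first_group_old(struct toy_kobject *k)
{
	if (toy_kobject_init(k))
		return -1;
	return toy_kobject_add(k);
}

/* New flow: the object was initialized once at setup; only add here. */
static int link_first_group_new(struct toy_kobject *k)
{
	return toy_kobject_add(k);
}
```

In this model, calling link_first_group_old() twice with toy_list_emptied() in between returns -1 on the second call, which corresponds to the reported oops; the new flow succeeds both times provided toy_kobject_init() ran once up front.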
[PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
If we change our default subvolume, btrfs receive will fail to find
the subvolume. To fix this problem, I have two ideas:

1. Make the btrfs snapshot ioctl support passing the source subvolume's
   objectid.
2. When we want to use the internal subvolume path, mount the filesystem
   at another place that uses subvolume 5 as its default subvolume.

We'd better use the second approach because it does not require a
kernel change.

Reported-by: Michael Welsh Duggan m...@md5i.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 cmds-receive.c | 51 +++
 utils.c        | 28
 utils.h        |  1 +
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index ed44107..c2cf8a3 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -40,6 +40,7 @@
 #include <sys/types.h>
 #include <sys/xattr.h>
 #include <uuid/uuid.h>
+#include <sys/mount.h>

 #include "ctree.h"
 #include "ioctl.h"
@@ -199,6 +200,10 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 	char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
 	struct btrfs_ioctl_vol_args_v2 args_v2;
 	struct subvol_info *parent_subvol = NULL;
+	char *dev = NULL;
+	char tmp_name[15] = "btrfs-XX";
+	char tmp_dir[30] = "/tmp";
+	char *full_path = NULL;

 	ret = finish_subvol(r);
 	if (ret < 0)
@@ -253,13 +258,47 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 		}
 	}*/

-	args_v2.fd = openat(r->mnt_fd, parent_subvol->path,
-			O_RDONLY | O_NOATIME);
+	ret = mnt_to_dev(r->root_path, &dev);
+	if (ret)
+		goto out;
+	if (!mktemp(tmp_name)) {
+		fprintf(stderr, "ERROR: fail to generate a tmp file\n");
+		goto out;
+	}
+	strncat(tmp_dir, "/", 1);
+	strncat(tmp_dir, tmp_name, strlen(tmp_name));
+
+	ret = mkdir(tmp_dir, 0777);
+	if (ret) {
+		fprintf(stderr, "ERROR: fail to make dir: %s\n", tmp_dir);
+		goto out;
+	}
+	/* if we change default subvolume, using btrfs interval
+	 * subvolume path to lookup may return us ENOENT.To handle
+	 * such case, we mount this btrfs filesystem other place
+	 * where we use fs tree as our default subvolume.
+	 */
+	ret = mount(dev, tmp_dir, "btrfs", 0, "-o subvolid=5");
+	if (ret) {
+		fprintf(stderr, "ERROR: fail to mount dev: %s", dev);
+		goto out;
+	}
+
+	full_path = calloc(1, strlen(parent_subvol->path) + strlen(tmp_dir));
+	if (!full_path) {
+		ret = -ENOMEM;
+		goto out_umount;
+	}
+	strncat(full_path, tmp_dir, strlen(tmp_dir));
+	strncat(full_path, "/", 1);
+	strncat(full_path, parent_subvol->path, strlen(parent_subvol->path));
+
+	args_v2.fd = open(full_path, O_RDONLY | O_NOATIME);
 	if (args_v2.fd < 0) {
 		ret = -errno;
 		fprintf(stderr, "ERROR: open %s failed. %s\n",
 			parent_subvol->path, strerror(-ret));
-		goto out;
+		goto out_umount;
 	}

 	ret = ioctl(r->dest_dir_fd, BTRFS_IOC_SNAP_CREATE_V2, &args_v2);
@@ -269,10 +308,14 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 		fprintf(stderr, "ERROR: creating snapshot %s -> %s failed. %s\n",
 			parent_subvol->path, path, strerror(-ret));
-		goto out;
 	}

+out_umount:
+	umount(tmp_dir);
+	rmdir(tmp_dir);
 out:
+	free(full_path);
+	free(dev);
 	if (parent_subvol) {
 		free(parent_subvol->path);
 		free(parent_subvol);
diff --git a/utils.c b/utils.c
index a92696e..da5291b 100644
--- a/utils.c
+++ b/utils.c
@@ -2194,6 +2194,34 @@ out:
 	return ret;
 }

+/*
+ * Given mount point, this function will return
+ * its corresponding device
+ */
+int mnt_to_dev(const char *mnt_dir, char **dev)
+{
+	struct mntent *mnt;
+	FILE *f;
+	int ret = -1;
+
+	f = setmntent("/proc/self/mounts", "r");
+	if (f == NULL)
+		return ret;
+	while ((mnt = getmntent(f)) != NULL) {
+		if (strcmp(mnt->mnt_type, "btrfs"))
+			continue;
+		if (strcmp(mnt->mnt_dir, mnt_dir))
+			continue;
+		*dev = strdup(mnt->mnt_fsname);
+		if (*dev)
+			ret = 0;
+		break;
+	}
+	endmntent(f);
+
+	return ret;
+}
+
 /* This finds the mount point for a given fsid,
  * subvols of the same fs/fsid can be mounted
  * so here this picks and lowest subvol id
diff --git a/utils.h b/utils.h
index 00f1c18..9b2f79c 100644
--- a/utils.h
+++ b/utils.h
@@ -98,5 +98,6 @@ int btrfs_scan_lblkid(int
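The patch builds full_path from the temporary mount point and the parent subvolume path with calloc() and three strncat() calls. As posted, the allocation appears to cover only the two string lengths, without the '/' separator and the terminating NUL. A minimal sketch of the same join with both accounted for follows; join_path() is a hypothetical helper for illustration, not a function from btrfs-progs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated "dir/sub" string,
 * or NULL if the allocation fails. The caller frees the result.
 */
static char *join_path(const char *dir, const char *sub)
{
	/* +1 for the '/' separator, +1 for the terminating NUL */
	size_t len = strlen(dir) + 1 + strlen(sub) + 1;
	char *out = malloc(len);

	if (!out)
		return NULL;
	/* snprintf never writes more than len bytes, NUL included */
	snprintf(out, len, "%s/%s", dir, sub);
	return out;
}
```

In the patch's terms this would be called with tmp_dir and parent_subvol->path, and the result passed to open().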
[PATCH v4 01/18] btrfs: Cleanup the unused struct async_sched.
The struct async_sched is not used by any code and can be removed.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: Josef Bacik jba...@fusionio.com
---
Changelog:
v1->v2: None.
v2->v3: None.
v3->v4: None.
---
 fs/btrfs/volumes.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 92303f4..c63ed39 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5322,13 +5322,6 @@ static void btrfs_end_bio(struct bio *bio, int err)
 	}
 }

-struct async_sched {
-	struct bio *bio;
-	int rw;
-	struct btrfs_fs_info *info;
-	struct btrfs_work work;
-};
-
 /*
  * see run_scheduled_bios for a description of why bios are collected for
  * async submit.
--
1.8.5.1
[PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue
Add a new btrfs_workqueue_struct, which uses the kernel workqueue to
implement most of the original btrfs_workers, to replace btrfs_workers.

With this patchset, redundant workqueue code is replaced with the kernel
workqueue infrastructure, which reduces not only the code size but also
the effort to maintain it.

The sysbench results show a minor improvement on the following server:
CPU: two-way Xeon X5660
RAM: 4G
HDD: SAS HDD, 150G total, 100G partition for btrfs test

Test result on default mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdENjajJTWFg5d1BWbExnYWFpMTJxeUEusp=sharing

Test result on -o compress mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdHdTTEJ6OW96SXJFaDR5enB1SzMzc0Eusp=sharing

Changelog:
v1->v2:
  - Fix some workqueue flags.
v2->v3:
  - Add the thresholding mechanism to simulate the old behavior.
  - Convert all the btrfs_workers to btrfs_workqueue_struct.
  - Fix some potential deadlocks when executed in an IRQ handler.
v3->v4:
  - Change the ordered workqueue implementation to fix the performance
    drop in 32K multi-thread random write.
  - Change the high priority workqueue implementation to get an
    independent high workqueue without the starving problem.
  - Simplify the btrfs_alloc_workqueue parameters.
  - Coding style cleanup.
  - Remove the redundant _struct suffix.

Qu Wenruo (18):
  btrfs: Cleanup the unused struct async_sched.
  btrfs: Added btrfs_workqueue_struct implemented ordered execution
    based on kernel workqueue
  btrfs: Add high priority workqueue support for btrfs_workqueue_struct
  btrfs: Add threshold workqueue based on kernel workqueue
  btrfs: Replace fs_info->workers with btrfs_workqueue.
  btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue
  btrfs: Replace fs_info->submit_workers with btrfs_workqueue.
  btrfs: Replace fs_info->flush_workers with btrfs_workqueue.
  btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->readahead_workers workqueue with
    btrfs_workqueue.
  btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->delayed_workers workqueue with
    btrfs_workqueue.
  btrfs: Replace fs_info->qgroup_rescan_worker workqueue with
    btrfs_workqueue.
  btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.
  btrfs: Cleanup the old btrfs_worker.
  btrfs: Cleanup the _struct suffix in btrfs_workequeue

 fs/btrfs/async-thread.c  | 821 ---
 fs/btrfs/async-thread.h  | 117 ++-
 fs/btrfs/ctree.h         |  39 ++-
 fs/btrfs/delayed-inode.c |   6 +-
 fs/btrfs/disk-io.c       | 212 +---
 fs/btrfs/extent-tree.c   |   4 +-
 fs/btrfs/inode.c         |  38 +--
 fs/btrfs/ordered-data.c  |  11 +-
 fs/btrfs/qgroup.c        |  15 +-
 fs/btrfs/raid56.c        |  21 +-
 fs/btrfs/reada.c         |   4 +-
 fs/btrfs/scrub.c         |  70 ++--
 fs/btrfs/super.c         |  36 +--
 fs/btrfs/volumes.c       |  16 +-
 14 files changed, 430 insertions(+), 980 deletions(-)

--
1.8.5.1
[PATCH v4 04/18] btrfs: Add threshold workqueue based on kernel workqueue
The original btrfs_workers has thresholding functions to dynamically
create or destroy kthreads.

Though there is no such function in the kernel workqueue, because the
workers are not created manually, we can still use
workqueue_set_max_active to simulate the behavior, mainly to achieve
better HDD performance by setting a high threshold on submit_workers.
(Sadly, no resource can be saved.)

So in this patch, extra workqueue pending counters are introduced to
dynamically change the max active of each btrfs_workqueue_struct, hoping
to restore the behavior of the original thresholding function.

Also, workqueue_set_max_active uses a mutex to protect workqueue_struct
and is not meant to be called too frequently, so a new interval
mechanism is applied that only calls workqueue_set_max_active after a
certain count of work items has been queued, hoping to balance both the
random and sequential performance on HDD.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v2->v3:
  - Add thresholding mechanism to simulate the old thresholding
    mechanism.
  - Will not enable thresholding when thresh is set to a small value.
v3->v4:
  None
---
 fs/btrfs/async-thread.c | 107
 fs/btrfs/async-thread.h |   3 +-
 2 files changed, 101 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 73b9f94..a986be7 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -30,6 +30,9 @@
 #define WORK_ORDER_DONE_BIT 2
 #define WORK_HIGH_PRIO_BIT 3

+#define NO_THRESHOLD (-1)
+#define DFT_THRESHOLD (32)
+
 /*
  * container for the kthread task pointer and the list of pending work
  * One of these is allocated per thread.
@@ -736,6 +739,14 @@ struct __btrfs_workqueue_struct {

 	/* Spinlock for ordered_list */
 	spinlock_t list_lock;
+
+	/* Thresholding related variants */
+	atomic_t pending;
+	int max_active;
+	int current_max;
+	int thresh;
+	unsigned int count;
+	spinlock_t thres_lock;
 };

 struct btrfs_workqueue_struct {
@@ -744,19 +755,34 @@ struct btrfs_workqueue_struct {
 };

 static inline struct __btrfs_workqueue_struct
-*__btrfs_alloc_workqueue(char *name, int flags, int max_active)
+*__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh)
 {
 	struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);

 	if (unlikely(!ret))
 		return NULL;

+	ret->max_active = max_active;
+	atomic_set(&ret->pending, 0);
+	if (thresh == 0)
+		thresh = DFT_THRESHOLD;
+	/* For low threshold, disabling threshold is a better choice */
+	if (thresh < DFT_THRESHOLD) {
+		ret->current_max = max_active;
+		ret->thresh = NO_THRESHOLD;
+	} else {
+		ret->current_max = 1;
+		ret->thresh = thresh;
+	}
+
 	if (flags & WQ_HIGHPRI)
 		ret->normal_wq = alloc_workqueue("%s-%s-high", flags,
-						 max_active, "btrfs", name);
+						 ret->max_active,
+						 "btrfs", name);
 	else
 		ret->normal_wq = alloc_workqueue("%s-%s", flags,
-						 max_active, "btrfs", name);
+						 ret->max_active, "btrfs",
+						 name);
 	if (unlikely(!ret->normal_wq)) {
 		kfree(ret);
 		return NULL;
@@ -764,6 +790,7 @@ static inline struct __btrfs_workqueue_struct

 	INIT_LIST_HEAD(&ret->ordered_list);
 	spin_lock_init(&ret->list_lock);
+	spin_lock_init(&ret->thres_lock);
 	return ret;
 }

@@ -772,7 +799,8 @@ __btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq);

 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 						     int flags,
-						     int max_active)
+						     int max_active,
+						     int thresh)
 {
 	struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);

@@ -780,14 +808,15 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 		return NULL;

 	ret->normal = __btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI,
-					      max_active);
+					      max_active, thresh);
 	if (unlikely(!ret->normal)) {
 		kfree(ret);
 		return NULL;
 	}

 	if (flags & WQ_HIGHPRI) {
-		ret->high = __btrfs_alloc_workqueue(name, flags, max_active);
+		ret->high = __btrfs_alloc_workqueue(name, flags, max_active,
+						    thresh);
 		if
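The thresholding idea in this patch can be modelled in userspace. thresh_init() below follows the initialization visible in the __btrfs_alloc_workqueue() hunk (thresh 0 selects the default, a threshold below the default disables thresholding). thresh_adjust() is an assumption: the excerpt does not show how the limit is moved, so the rule used here (raise the limit by one when more than thresh items are pending, lower it by one when fewer than thresh/2 are pending, clamp to 1..max_active) is only one plausible policy. struct thresh_state and both function names are invented for the sketch.

```c
#define NO_THRESHOLD	(-1)
#define DFT_THRESHOLD	(32)

/* Userspace stand-in for the thresholding fields added by the patch. */
struct thresh_state {
	int max_active;		/* upper bound requested by the caller */
	int current_max;	/* limit currently in effect */
	int thresh;		/* NO_THRESHOLD disables adjustment */
};

/* Follows the initialization shown in the patch. */
static void thresh_init(struct thresh_state *s, int max_active, int thresh)
{
	s->max_active = max_active;
	if (thresh == 0)
		thresh = DFT_THRESHOLD;
	if (thresh < DFT_THRESHOLD) {
		/* low threshold: run at full width, never adjust */
		s->current_max = max_active;
		s->thresh = NO_THRESHOLD;
	} else {
		/* start narrow and widen on demand */
		s->current_max = 1;
		s->thresh = thresh;
	}
}

/* ASSUMED adjustment rule, not taken from the excerpt. Returns the
 * limit in effect after looking at the number of pending items. */
static int thresh_adjust(struct thresh_state *s, long pending)
{
	int new_max = s->current_max;

	if (s->thresh == NO_THRESHOLD)
		return s->current_max;
	if (pending > s->thresh)
		new_max++;
	else if (pending < s->thresh / 2)
		new_max--;
	if (new_max < 1)
		new_max = 1;
	if (new_max > s->max_active)
		new_max = s->max_active;
	s->current_max = new_max;
	return new_max;
}
```

In the real patch the new limit would be applied with workqueue_set_max_active, and only every so many queued items, since that call takes a mutex.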
[PATCH v4 07/18] btrfs: Replace fs_info-submit_workers with btrfs_workqueue.
Much like fs_info->workers, replace fs_info->submit_workers with the
same btrfs_workqueue.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v1->v2: None
v2->v3: None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 17 +
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 11 ++-
 fs/btrfs/volumes.h |  2 +-
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a86c9a1..4411a2b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1499,7 +1499,7 @@ struct btrfs_fs_info {
 	struct btrfs_workers endio_meta_write_workers;
 	struct btrfs_workers endio_write_workers;
 	struct btrfs_workers endio_freespace_worker;
-	struct btrfs_workers submit_workers;
+	struct btrfs_workqueue_struct *submit_workers;
 	struct btrfs_workers caching_workers;
 	struct btrfs_workers readahead_workers;

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1098435..cda9766 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2017,7 +2017,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_stop_workers(&fs_info->endio_meta_write_workers);
 	btrfs_stop_workers(&fs_info->endio_write_workers);
 	btrfs_stop_workers(&fs_info->endio_freespace_worker);
-	btrfs_stop_workers(&fs_info->submit_workers);
+	btrfs_destroy_workqueue(fs_info->submit_workers);
 	btrfs_stop_workers(&fs_info->delayed_workers);
 	btrfs_stop_workers(&fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info->readahead_workers);
@@ -2482,18 +2482,19 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
 			   fs_info->thread_pool_size, NULL);

-	btrfs_init_workers(&fs_info->submit_workers, "submit",
-			   min_t(u64, fs_devices->num_devices,
-			   fs_info->thread_pool_size), NULL);

 	btrfs_init_workers(&fs_info->caching_workers, "cache",
 			   fs_info->thread_pool_size, NULL);

-	/* a higher idle thresh on the submit workers makes it much more
+	/*
+	 * a higher idle thresh on the submit workers makes it much more
 	 * likely that bios will be send down in a sane order to the
 	 * devices
 	 */
-	fs_info->submit_workers.idle_thresh = 64;
+	fs_info->submit_workers =
+		btrfs_alloc_workqueue("submit", flags,
+				      min_t(u64, fs_devices->num_devices,
+					    max_active), 64);

 	btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
 			   &fs_info->generic_worker);
@@ -2544,7 +2545,6 @@ int open_ctree(struct super_block *sb,
 	 * return -ENOMEM if any of these fail.
 	 */
 	ret = btrfs_start_workers(&fs_info->generic_worker);
-	ret |= btrfs_start_workers(&fs_info->submit_workers);
 	ret |= btrfs_start_workers(&fs_info->fixup_workers);
 	ret |= btrfs_start_workers(&fs_info->endio_workers);
 	ret |= btrfs_start_workers(&fs_info->endio_meta_workers);
@@ -2562,7 +2562,8 @@ int open_ctree(struct super_block *sb,
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
-	if (!(fs_info->workers && fs_info->delalloc_workers)) {
+	if (!(fs_info->workers && fs_info->delalloc_workers &&
+	      fs_info->submit_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 875560e..9f1d0a5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1247,7 +1247,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size);
+	btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c63ed39..e07bd64 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -415,7 +415,8 @@ loop_lock:
 			device->running_pending = 1;

 			spin_unlock(&device->io_lock);
-			btrfs_requeue_work(&device->work);
+			btrfs_queue_work(fs_info->submit_workers,
+					 &device->work);
 			goto done;
 		}
 		/* unplug every 64 requests just for good measure */
@@ -439,7 +440,7 @@ done:
 	blk_finish_plug(&plug);
[PATCH v4 16/18] btrfs: Replace fs_info-scrub_* workqueue with btrfs_workqueue.
Replace the fs_info->scrub_* workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v1->v2: None
v2->v3:
  - Use the btrfs_workqueue_struct to replace scrub_*.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h |  6 ++--
 fs/btrfs/scrub.c | 93 ++--
 fs/btrfs/super.c |  4 +--
 3 files changed, 55 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index df51fa3..5d71258 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1587,9 +1587,9 @@ struct btrfs_fs_info {
 	atomic_t scrub_cancel_req;
 	wait_queue_head_t scrub_pause_wait;
 	int scrub_workers_refcnt;
-	struct btrfs_workers scrub_workers;
-	struct btrfs_workers scrub_wr_completion_workers;
-	struct btrfs_workers scrub_nocow_workers;
+	struct btrfs_workqueue_struct *scrub_workers;
+	struct btrfs_workqueue_struct *scrub_wr_completion_workers;
+	struct btrfs_workqueue_struct *scrub_nocow_workers;

 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	u32 check_integrity_print_mask;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 561e2f1..1618d6d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -96,7 +96,8 @@ struct scrub_bio {
 #endif
 	int page_count;
 	int next_free;
-	struct btrfs_work work;
+	struct btrfs_work_struct
+				work;
 };

 struct scrub_block {
@@ -154,7 +155,8 @@ struct scrub_fixup_nodatasum {
 	struct btrfs_device *dev;
 	u64 logical;
 	struct btrfs_root *root;
-	struct btrfs_work work;
+	struct btrfs_work_struct
+				work;
 	int mirror_num;
 };

@@ -172,7 +174,8 @@ struct scrub_copy_nocow_ctx {
 	int mirror_num;
 	u64 physical_for_dev_replace;
 	struct list_head inodes;
-	struct btrfs_work work;
+	struct btrfs_work_struct
+				work;
 };

 struct scrub_warning {
@@ -232,7 +235,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 		       u64 gen, int mirror_num, u8 *csum, int force,
 		       u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio, int err);
-static void scrub_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_bio_end_io_worker(struct btrfs_work_struct *work);
 static void scrub_block_complete(struct scrub_block *sblock);
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
 			       u64 extent_logical, u64 extent_len,
@@ -249,14 +252,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 				    struct scrub_page *spage);
 static void scrub_wr_submit(struct scrub_ctx *sctx);
 static void scrub_wr_bio_end_io(struct bio *bio, int err);
-static void scrub_wr_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_wr_bio_end_io_worker(struct btrfs_work_struct *work);
 static int write_page_nocow(struct scrub_ctx *sctx,
 			    u64 physical_for_dev_replace, struct page *page);
 static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
 				      struct scrub_copy_nocow_ctx *ctx);
 static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			    int mirror_num, u64 physical_for_dev_replace);
-static void copy_nocow_pages_worker(struct btrfs_work *work);
+static void copy_nocow_pages_worker(struct btrfs_work_struct *work);


 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
@@ -394,7 +397,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
 		sbio->index = i;
 		sbio->sctx = sctx;
 		sbio->page_count = 0;
-		sbio->work.func = scrub_bio_end_io_worker;
+		btrfs_init_work(&sbio->work, scrub_bio_end_io_worker,
+				NULL, NULL);

 		if (i != SCRUB_BIOS_PER_SCTX - 1)
 			sctx->bios[i]->next_free = i + 1;
@@ -699,7 +703,7 @@ out:
 	return -EIO;
 }

-static void scrub_fixup_nodatasum(struct btrfs_work *work)
+static void scrub_fixup_nodatasum(struct btrfs_work_struct *work)
 {
 	int ret;
 	struct scrub_fixup_nodatasum *fixup;
@@ -965,9 +969,10 @@ nodatasum_case:
 		fixup_nodatasum->root = fs_info->extent_root;
 		fixup_nodatasum->mirror_num = failed_mirror_index + 1;
 		scrub_pending_trans_workers_inc(sctx);
-		fixup_nodatasum->work.func = scrub_fixup_nodatasum;
-		btrfs_queue_worker(&fs_info->scrub_workers,
-				   &fixup_nodatasum->work);
+
[PATCH v4 18/18] btrfs: Cleanup the _struct suffix in btrfs_workequeue
The _struct suffix was mainly used to distinguish the newly created btrfs_work from the original one. Now that all btrfs_workers have been converted to btrfs_workqueue, the suffix is no longer needed. This patch also fixes code whose style had been changed to accommodate the overly long _struct suffix.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v3->v4:
- Remove the _struct suffix.
---
 fs/btrfs/async-thread.c  | 64
 fs/btrfs/async-thread.h  | 34 -
 fs/btrfs/ctree.h         | 44 -
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 14 +--
 fs/btrfs/extent-tree.c   |  2 +-
 fs/btrfs/inode.c         | 18 +++---
 fs/btrfs/ordered-data.c  |  2 +-
 fs/btrfs/ordered-data.h  |  4 +--
 fs/btrfs/qgroup.c        |  2 +-
 fs/btrfs/raid56.c        | 14 +--
 fs/btrfs/reada.c         |  5 ++--
 fs/btrfs/scrub.c         | 23 -
 fs/btrfs/volumes.c       |  2 +-
 fs/btrfs/volumes.h       |  2 +-
 15 files changed, 115 insertions(+), 119 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 16a5eec..f896426 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -32,7 +32,7 @@
 #define NO_THRESHOLD (-1)
 #define DFT_THRESHOLD (32)
-struct __btrfs_workqueue_struct {
+struct __btrfs_workqueue {
 	struct workqueue_struct *normal_wq;
 	/* List head pointing to ordered work list */
 	struct list_head ordered_list;
@@ -49,15 +49,15 @@ struct __btrfs_workqueue_struct {
 	spinlock_t thres_lock;
 };
-struct btrfs_workqueue_struct {
-	struct __btrfs_workqueue_struct *normal;
-	struct __btrfs_workqueue_struct *high;
+struct btrfs_workqueue {
+	struct __btrfs_workqueue *normal;
+	struct __btrfs_workqueue *high;
 };
-static inline struct __btrfs_workqueue_struct
+static inline struct __btrfs_workqueue
 *__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh)
 {
-	struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+	struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 	if (unlikely(!ret))
 		return NULL;
@@ -95,14 +95,14 @@ static inline struct __btrfs_workqueue_struct
 }
 static inline void
-__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq);
+__btrfs_destroy_workqueue(struct __btrfs_workqueue *wq);
-struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
-						     int flags,
-						     int max_active,
-						     int thresh)
+struct btrfs_workqueue *btrfs_alloc_workqueue(char *name,
+					      int flags,
+					      int max_active,
+					      int thresh)
 {
-	struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+	struct btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 	if (unlikely(!ret))
 		return NULL;
@@ -131,7 +131,7 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
  * This hook WILL be called in IRQ handler context,
  * so workqueue_set_max_active MUST NOT be called in this hook
  */
-static inline void thresh_queue_hook(struct __btrfs_workqueue_struct *wq)
+static inline void thresh_queue_hook(struct __btrfs_workqueue *wq)
 {
 	if (wq->thresh == NO_THRESHOLD)
 		return;
@@ -143,7 +143,7 @@ static inline void thresh_queue_hook(struct __btrfs_workqueue_struct *wq)
  * This hook is called in kthread content.
  * So workqueue_set_max_active is called here.
  */
-static inline void thresh_exec_hook(struct __btrfs_workqueue_struct *wq)
+static inline void thresh_exec_hook(struct __btrfs_workqueue *wq)
 {
 	int new_max_active;
 	long pending;
@@ -186,10 +186,10 @@ out:
 	}
 }
-static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
+static void run_ordered_work(struct __btrfs_workqueue *wq)
 {
 	struct list_head *list = &wq->ordered_list;
-	struct btrfs_work_struct *work;
+	struct btrfs_work *work;
 	spinlock_t *lock = &wq->list_lock;
 	unsigned long flags;
@@ -197,7 +197,7 @@ static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
 		spin_lock_irqsave(lock, flags);
 		if (list_empty(list))
 			break;
-		work = list_entry(list->next, struct btrfs_work_struct,
+		work = list_entry(list->next, struct btrfs_work,
 				  ordered_list);
 		if (!test_bit(WORK_DONE_BIT, &work->flags))
 			break;
@@ -229,10 +229,10 @@ static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
 static void normal_work_helper(struct work_struct
[PATCH v4 12/18] btrfs: Replace fs_info->readahead_workers workqueue with btrfs_workqueue.
Replace the fs_info-readahead_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- Changelog: v1-v2: None v2-v3: - Use the btrfs_workqueue_struct to replace readahead_workers. v3-v4: - Use the simplified btrfs_alloc_workqueue API. --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 12 fs/btrfs/reada.c | 9 + fs/btrfs/super.c | 2 +- 4 files changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8630986..302dc46 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1501,7 +1501,7 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *endio_freespace_worker; struct btrfs_workqueue_struct *submit_workers; struct btrfs_workqueue_struct *caching_workers; - struct btrfs_workers readahead_workers; + struct btrfs_workqueue_struct *readahead_workers; /* * fixup workers take dirty pages that didn't properly go through diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index d8f42d2..4d49d87 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2019,7 +2019,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info-submit_workers); btrfs_stop_workers(fs_info-delayed_workers); btrfs_destroy_workqueue(fs_info-caching_workers); - btrfs_stop_workers(fs_info-readahead_workers); + btrfs_destroy_workqueue(fs_info-readahead_workers); btrfs_destroy_workqueue(fs_info-flush_workers); btrfs_stop_workers(fs_info-qgroup_rescan_workers); } @@ -2518,14 +2518,11 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(fs_info-delayed_workers, delayed-meta, fs_info-thread_pool_size, fs_info-generic_worker); - btrfs_init_workers(fs_info-readahead_workers, readahead, - fs_info-thread_pool_size, - fs_info-generic_worker); + fs_info-readahead_workers = + btrfs_alloc_workqueue(readahead, flags, max_active, 2); btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, fs_info-generic_worker); - fs_info-readahead_workers.idle_thresh = 2; - /* * 
btrfs_start_workers can really only fail because of ENOMEM so just * return -ENOMEM if any of these fail. @@ -2533,7 +2530,6 @@ int open_ctree(struct super_block *sb, ret = btrfs_start_workers(fs_info-generic_worker); ret |= btrfs_start_workers(fs_info-fixup_workers); ret |= btrfs_start_workers(fs_info-delayed_workers); - ret |= btrfs_start_workers(fs_info-readahead_workers); ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; @@ -2545,7 +2541,7 @@ int open_ctree(struct super_block *sb, fs_info-endio_meta_write_workers fs_info-endio_write_workers fs_info-endio_raid56_workers fs_info-endio_freespace_worker fs_info-rmw_workers - fs_info-caching_workers)) { + fs_info-caching_workers fs_info-readahead_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c index 1031b69..854b69a 100644 --- a/fs/btrfs/reada.c +++ b/fs/btrfs/reada.c @@ -91,7 +91,8 @@ struct reada_zone { }; struct reada_machine_work { - struct btrfs_work work; + struct btrfs_work_struct + work; struct btrfs_fs_info*fs_info; }; @@ -732,7 +733,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info, } -static void reada_start_machine_worker(struct btrfs_work *work) +static void reada_start_machine_worker(struct btrfs_work_struct *work) { struct reada_machine_work *rmw; struct btrfs_fs_info *fs_info; @@ -792,10 +793,10 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info) /* FIXME we cannot handle this properly right now */ BUG(); } - rmw-work.func = reada_start_machine_worker; + btrfs_init_work(rmw-work, reada_start_machine_worker, NULL, NULL); rmw-fs_info = fs_info; - btrfs_queue_worker(fs_info-readahead_workers, rmw-work); + btrfs_queue_work(fs_info-readahead_workers, rmw-work); } #ifdef DEBUG diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 5bfe566..7a46e23 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1257,7 +1257,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, 
btrfs_workqueue_set_max(fs_info-endio_write_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_freespace_worker, new_pool_size); btrfs_set_max_workers(fs_info-delayed_workers, new_pool_size); - btrfs_set_max_workers(fs_info-readahead_workers,
[PATCH v4 17/18] btrfs: Cleanup the old btrfs_worker.
Since every btrfs_worker has been replaced with the newly created btrfs_workqueue, the old code can now be removed.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v1->v2: None
v2->v3:
- Reuse the old async-thread.[ch] files.
v3->v4:
- Reuse the old WORK_* bits.
---
 fs/btrfs/async-thread.c | 706 +---
 fs/btrfs/async-thread.h | 100 ---
 fs/btrfs/ctree.h        |   1 -
 fs/btrfs/disk-io.c      |  12 -
 fs/btrfs/super.c        |   8 -
 5 files changed, 3 insertions(+), 824 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index a986be7..16a5eec 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -25,713 +25,13 @@
 #include <linux/workqueue.h>
 #include "async-thread.h"
-#define WORK_QUEUED_BIT 0
-#define WORK_DONE_BIT 1
-#define WORK_ORDER_DONE_BIT 2
-#define WORK_HIGH_PRIO_BIT 3
+#define WORK_DONE_BIT 0
+#define WORK_ORDER_DONE_BIT 1
+#define WORK_HIGH_PRIO_BIT 2
 #define NO_THRESHOLD (-1)
 #define DFT_THRESHOLD (32)
-/*
- * container for the kthread task pointer and the list of pending work
- * One of these is allocated per thread.
- */
-struct btrfs_worker_thread {
-	/* pool we belong to */
-	struct btrfs_workers *workers;
-
-	/* list of struct btrfs_work that are waiting for service */
-	struct list_head pending;
-	struct list_head prio_pending;
-
-	/* list of worker threads from struct btrfs_workers */
-	struct list_head worker_list;
-
-	/* kthread */
-	struct task_struct *task;
-
-	/* number of things on the pending list */
-	atomic_t num_pending;
-
-	/* reference counter for this struct */
-	atomic_t refs;
-
-	unsigned long sequence;
-
-	/* protects the pending list. */
-	spinlock_t lock;
-
-	/* set to non-zero when this thread is already awake and kicking */
-	int working;
-
-	/* are we currently idle */
-	int idle;
-};
-
-static int __btrfs_start_workers(struct btrfs_workers *workers);
-
-/*
- * btrfs_start_workers uses kthread_run, which can block waiting for memory
- * for a very long time.  It will actually throttle on page writeback,
- * and so it may not make progress until after our btrfs worker threads
- * process all of the pending work structs in their queue
- *
- * This means we can't use btrfs_start_workers from inside a btrfs worker
- * thread that is used as part of cleaning dirty memory, which pretty much
- * involves all of the worker threads.
- *
- * Instead we have a helper queue who never has more than one thread
- * where we scheduler thread start operations.  This worker_start struct
- * is used to contain the work and hold a pointer to the queue that needs
- * another worker.
- */
-struct worker_start {
-	struct btrfs_work work;
-	struct btrfs_workers *queue;
-};
-
-static void start_new_worker_func(struct btrfs_work *work)
-{
-	struct worker_start *start;
-	start = container_of(work, struct worker_start, work);
-	__btrfs_start_workers(start->queue);
-	kfree(start);
-}
-
-/*
- * helper function to move a thread onto the idle list after it
- * has finished some requests.
- */
-static void check_idle_worker(struct btrfs_worker_thread *worker)
-{
-	if (!worker->idle && atomic_read(&worker->num_pending) <
-	    worker->workers->idle_thresh / 2) {
-		unsigned long flags;
-		spin_lock_irqsave(&worker->workers->lock, flags);
-		worker->idle = 1;
-
-		/* the list may be empty if the worker is just starting */
-		if (!list_empty(&worker->worker_list) &&
-		    !worker->workers->stopping) {
-			list_move(&worker->worker_list,
-				 &worker->workers->idle_list);
-		}
-		spin_unlock_irqrestore(&worker->workers->lock, flags);
-	}
-}
-
-/*
- * helper function to move a thread off the idle list after new
- * pending work is added.
- */
-static void check_busy_worker(struct btrfs_worker_thread *worker)
-{
-	if (worker->idle && atomic_read(&worker->num_pending) >=
-	    worker->workers->idle_thresh) {
-		unsigned long flags;
-		spin_lock_irqsave(&worker->workers->lock, flags);
-		worker->idle = 0;
-
-		if (!list_empty(&worker->worker_list) &&
-		    !worker->workers->stopping) {
-			list_move_tail(&worker->worker_list,
-				      &worker->workers->worker_list);
-		}
-		spin_unlock_irqrestore(&worker->workers->lock, flags);
-	}
-}
-
-static void check_pending_worker_creates(struct btrfs_worker_thread *worker)
-{
-	struct btrfs_workers *workers = worker->workers;
-	struct worker_start *start;
-	unsigned long flags;
-
-	rmb();
-	if (!workers->atomic_start_pending)
-		return;
-
-	start = kzalloc(sizeof(*start), GFP_NOFS);
[PATCH v4 09/18] btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.
Replace the fs_info-endio_* workqueues with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- Changelog: v1-v2: None v2-v3: - Use the btrfs_workqueue_struct to replace endio_*. v3-v4: - Use the simplified btrfs_alloc_workqueue API. --- fs/btrfs/ctree.h| 12 +++--- fs/btrfs/disk-io.c | 104 +--- fs/btrfs/inode.c| 20 +- fs/btrfs/ordered-data.h | 2 +- fs/btrfs/super.c| 11 ++--- 5 files changed, 68 insertions(+), 81 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 097364d..5096164 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1492,13 +1492,13 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *workers; struct btrfs_workqueue_struct *delalloc_workers; struct btrfs_workqueue_struct *flush_workers; - struct btrfs_workers endio_workers; - struct btrfs_workers endio_meta_workers; - struct btrfs_workers endio_raid56_workers; + struct btrfs_workqueue_struct *endio_workers; + struct btrfs_workqueue_struct *endio_meta_workers; + struct btrfs_workqueue_struct *endio_raid56_workers; struct btrfs_workers rmw_workers; - struct btrfs_workers endio_meta_write_workers; - struct btrfs_workers endio_write_workers; - struct btrfs_workers endio_freespace_worker; + struct btrfs_workqueue_struct *endio_meta_write_workers; + struct btrfs_workqueue_struct *endio_write_workers; + struct btrfs_workqueue_struct *endio_freespace_worker; struct btrfs_workqueue_struct *submit_workers; struct btrfs_workers caching_workers; struct btrfs_workers readahead_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 139960f..4f8591a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -54,7 +54,7 @@ #endif static struct extent_io_ops btree_extent_io_ops; -static void end_workqueue_fn(struct btrfs_work *work); +static void end_workqueue_fn(struct btrfs_work_struct *work); static void free_fs_root(struct btrfs_root *root); static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, int read_only); @@ -85,7 +85,7 @@ 
struct end_io_wq { int error; int metadata; struct list_head list; - struct btrfs_work work; + struct btrfs_work_struct work; }; /* @@ -681,32 +681,31 @@ static void end_workqueue_bio(struct bio *bio, int err) fs_info = end_io_wq-info; end_io_wq-error = err; - end_io_wq-work.func = end_workqueue_fn; - end_io_wq-work.flags = 0; + btrfs_init_work(end_io_wq-work, end_workqueue_fn, NULL, NULL); if (bio-bi_rw REQ_WRITE) { if (end_io_wq-metadata == BTRFS_WQ_ENDIO_METADATA) - btrfs_queue_worker(fs_info-endio_meta_write_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_meta_write_workers, +end_io_wq-work); else if (end_io_wq-metadata == BTRFS_WQ_ENDIO_FREE_SPACE) - btrfs_queue_worker(fs_info-endio_freespace_worker, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_freespace_worker, +end_io_wq-work); else if (end_io_wq-metadata == BTRFS_WQ_ENDIO_RAID56) - btrfs_queue_worker(fs_info-endio_raid56_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_raid56_workers, +end_io_wq-work); else - btrfs_queue_worker(fs_info-endio_write_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_write_workers, +end_io_wq-work); } else { if (end_io_wq-metadata == BTRFS_WQ_ENDIO_RAID56) - btrfs_queue_worker(fs_info-endio_raid56_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_raid56_workers, +end_io_wq-work); else if (end_io_wq-metadata) - btrfs_queue_worker(fs_info-endio_meta_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_meta_workers, +end_io_wq-work); else - btrfs_queue_worker(fs_info-endio_workers, - end_io_wq-work); + btrfs_queue_work(fs_info-endio_workers, +end_io_wq-work); } }
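For readers following the conversion above, here is a minimal userspace sketch of the pattern the diff moves to: a work item embedded in a larger structure, initialized through a single helper instead of setting func and flags by hand, and recovered in the callback with container_of(). All demo_* names are hypothetical stand-ins and not the btrfs API; in particular demo_queue_work() simply runs the callback inline, whereas a real workqueue defers execution to a worker.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work;
typedef void (*demo_func_t)(struct demo_work *work);

/* Hypothetical stand-in for the work item type used in the patch. */
struct demo_work {
	demo_func_t func;
	unsigned long flags;
};

/*
 * Hypothetical analogue of an init helper: one place sets the callback
 * and clears the flags, so callers cannot forget either step.
 */
static void demo_init_work(struct demo_work *work, demo_func_t func)
{
	assert(work && func);
	work->func = func;
	work->flags = 0;
}

/* Hypothetical analogue of end_io_wq: the work item is embedded. */
struct demo_end_io {
	int error;
	struct demo_work work;
};

static int demo_last_error;

/* The callback only receives the work pointer and walks back to its owner. */
static void demo_end_io_fn(struct demo_work *work)
{
	struct demo_end_io *io = container_of(work, struct demo_end_io, work);

	demo_last_error = io->error;
}

/* Assumption: runs the work immediately instead of deferring it. */
static void demo_queue_work(struct demo_work *work)
{
	work->func(work);
}

/* Ties the steps together and returns what the callback observed. */
static int demo_run(int error)
{
	struct demo_end_io io = { .error = error };

	demo_init_work(&io.work, demo_end_io_fn);
	demo_queue_work(&io.work);
	return demo_last_error;
}
```

Calling demo_run(-5) routes the error value through the embedded work item and back out of the callback, which is the same round trip end_workqueue_bio() and end_workqueue_fn() perform in the kernel code.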
[PATCH v4 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct
Add high priority support to btrfs_workqueue. This is implemented by embedding two __btrfs_workqueue_struct pointers (normal and high) in btrfs_workqueue_struct and using helper functions to distinguish the normal priority wq from the high priority wq. The high priority wq is therefore completely independent of the normal workqueue.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v1->v2: None
v2->v3: None
v3->v4:
- Implement high priority workqueue independently.
  Now high priority wq is implemented as a normal btrfs_workqueue, with independent ordering/thresholding mechanism. This fixed the problem that high priority wq and normal wq shared one ordered wq.
---
 fs/btrfs/async-thread.c | 89 +++--
 fs/btrfs/async-thread.h |  5 ++-
 2 files changed, 82 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index f05d57e..73b9f94 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -729,7 +729,7 @@ void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work)
 	spin_unlock_irqrestore(&worker->lock, flags);
 }
-struct btrfs_workqueue_struct {
+struct __btrfs_workqueue_struct {
 	struct workqueue_struct *normal_wq;
 	/* List head pointing to ordered work list */
 	struct list_head ordered_list;
@@ -738,6 +738,38 @@ struct btrfs_workqueue_struct {
 	spinlock_t list_lock;
 };
+struct btrfs_workqueue_struct {
+	struct __btrfs_workqueue_struct *normal;
+	struct __btrfs_workqueue_struct *high;
+};
+
+static inline struct __btrfs_workqueue_struct
+*__btrfs_alloc_workqueue(char *name, int flags, int max_active)
+{
+	struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+
+	if (unlikely(!ret))
+		return NULL;
+
+	if (flags & WQ_HIGHPRI)
+		ret->normal_wq = alloc_workqueue("%s-%s-high", flags,
+						 max_active, "btrfs", name);
+	else
+		ret->normal_wq = alloc_workqueue("%s-%s", flags,
+						 max_active, "btrfs", name);
+	if (unlikely(!ret->normal_wq)) {
+		kfree(ret);
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&ret->ordered_list);
+	spin_lock_init(&ret->list_lock);
+	return ret;
+}
+
+static inline void
+__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq);
+
 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 						     int flags,
 						     int max_active)
@@ -747,19 +779,25 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 	if (unlikely(!ret))
 		return NULL;
-	ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active,
-					 "btrfs", name);
-	if (unlikely(!ret->normal_wq)) {
+	ret->normal = __btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI,
+					      max_active);
+	if (unlikely(!ret->normal)) {
 		kfree(ret);
 		return NULL;
 	}
-	INIT_LIST_HEAD(&ret->ordered_list);
-	spin_lock_init(&ret->list_lock);
+	if (flags & WQ_HIGHPRI) {
+		ret->high = __btrfs_alloc_workqueue(name, flags, max_active);
+		if (unlikely(!ret->high)) {
+			__btrfs_destroy_workqueue(ret->normal);
+			kfree(ret);
+			return NULL;
+		}
+	}
 	return ret;
 }
-static void run_ordered_work(struct btrfs_workqueue_struct *wq)
+static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
 {
 	struct list_head *list = &wq->ordered_list;
 	struct btrfs_work_struct *work;
@@ -832,8 +870,8 @@ void btrfs_init_work(struct btrfs_work_struct *work,
 	work->flags = 0;
 }
-void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
-		      struct btrfs_work_struct *work)
+static inline void __btrfs_queue_work(struct __btrfs_workqueue_struct *wq,
+				      struct btrfs_work_struct *work)
 {
 	unsigned long flags;
@@ -846,13 +884,42 @@ void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
 	queue_work(wq->normal_wq, &work->normal_work);
 }
-void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq)
+void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
+		      struct btrfs_work_struct *work)
+{
+	struct __btrfs_workqueue_struct *dest_wq;
+
+	if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags) && wq->high)
+		dest_wq = wq->high;
+	else
+		dest_wq = wq->normal;
+	__btrfs_queue_work(dest_wq, work);
+}
+
+static inline void
+__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq)
 {
 	destroy_workqueue(wq->normal_wq);
 	kfree(wq);
 }
+void btrfs_destroy_workqueue(struct
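The two-level layout described in this patch can be illustrated with a small userspace sketch. It is only an approximation under stated assumptions: the demo_* types, DEMO_WQ_HIGHPRI and DEMO_WORK_HIGH_PRIO_BIT are hypothetical stand-ins for the kernel names, calloc()/free() replace kzalloc()/kfree(), and "queueing" merely bumps a counter on the chosen inner queue.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_WQ_HIGHPRI          (1u << 0) /* hypothetical stand-in for WQ_HIGHPRI */
#define DEMO_WORK_HIGH_PRIO_BIT  0         /* hypothetical stand-in for WORK_HIGH_PRIO_BIT */

/* Inner queue: plays the role of the double-underscore struct in the patch. */
struct demo_inner_wq {
	int highpri;
	int queued;
};

/* Outer queue: two independent inner queues; high stays NULL if not requested. */
struct demo_wq {
	struct demo_inner_wq *normal;
	struct demo_inner_wq *high;
};

struct demo_work {
	unsigned long flags;
};

static struct demo_inner_wq *demo_inner_alloc(unsigned int flags)
{
	struct demo_inner_wq *wq = calloc(1, sizeof(*wq));

	if (!wq)
		return NULL;
	wq->highpri = !!(flags & DEMO_WQ_HIGHPRI);
	return wq;
}

static struct demo_wq *demo_alloc(unsigned int flags)
{
	struct demo_wq *wq = calloc(1, sizeof(*wq));

	if (!wq)
		return NULL;
	/* The normal queue never carries the high priority flag. */
	wq->normal = demo_inner_alloc(flags & ~DEMO_WQ_HIGHPRI);
	if (!wq->normal) {
		free(wq);
		return NULL;
	}
	if (flags & DEMO_WQ_HIGHPRI) {
		wq->high = demo_inner_alloc(flags);
		if (!wq->high) {
			/* Unwind the part that was already allocated. */
			free(wq->normal);
			free(wq);
			return NULL;
		}
	}
	return wq;
}

static void demo_destroy(struct demo_wq *wq)
{
	if (!wq)
		return;
	free(wq->high);
	free(wq->normal);
	free(wq);
}

/* Pick the destination queue, falling back to normal when no high queue exists. */
static struct demo_inner_wq *demo_queue(struct demo_wq *wq, struct demo_work *work)
{
	struct demo_inner_wq *dest;

	assert(wq && work);
	if ((work->flags & (1ul << DEMO_WORK_HIGH_PRIO_BIT)) && wq->high)
		dest = wq->high;
	else
		dest = wq->normal;
	dest->queued++;
	return dest;
}
```

The fallback in demo_queue() mirrors the `&& wq->high` test in btrfs_queue_work(): a work item flagged as high priority still lands on the normal queue when the workqueue was allocated without the high priority flag.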
[PATCH v4 14/18] btrfs: Replace fs_info->delayed_workers workqueue with btrfs_workqueue.
Replace the fs_info-delayed_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- Changelog: v1-v2: None v2-v3: None v3-v4: - Use the simplified btrfs_alloc_workqueue API. --- fs/btrfs/ctree.h | 2 +- fs/btrfs/delayed-inode.c | 10 +- fs/btrfs/disk-io.c | 10 -- fs/btrfs/super.c | 2 +- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 845615e..698cebc 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1509,7 +1509,7 @@ struct btrfs_fs_info { * for the sys_munmap function call path */ struct btrfs_workqueue_struct *fixup_workers; - struct btrfs_workers delayed_workers; + struct btrfs_workqueue_struct *delayed_workers; struct task_struct *transaction_kthread; struct task_struct *cleaner_kthread; int thread_pool_size; diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 8d292fb..e4ad5ea 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1260,10 +1260,10 @@ void btrfs_remove_delayed_node(struct inode *inode) struct btrfs_async_delayed_work { struct btrfs_delayed_root *delayed_root; int nr; - struct btrfs_work work; + struct btrfs_work_struct work; }; -static void btrfs_async_run_delayed_root(struct btrfs_work *work) +static void btrfs_async_run_delayed_root(struct btrfs_work_struct *work) { struct btrfs_async_delayed_work *async_work; struct btrfs_delayed_root *delayed_root; @@ -1361,11 +1361,11 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, return -ENOMEM; async_work-delayed_root = delayed_root; - async_work-work.func = btrfs_async_run_delayed_root; - async_work-work.flags = 0; + btrfs_init_work(async_work-work, btrfs_async_run_delayed_root, + NULL, NULL); async_work-nr = nr; - btrfs_queue_worker(root-fs_info-delayed_workers, async_work-work); + btrfs_queue_work(root-fs_info-delayed_workers, async_work-work); return 0; } diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 
e5dec5a..9053df8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2017,7 +2017,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info-endio_write_workers); btrfs_destroy_workqueue(fs_info-endio_freespace_worker); btrfs_destroy_workqueue(fs_info-submit_workers); - btrfs_stop_workers(fs_info-delayed_workers); + btrfs_destroy_workqueue(fs_info-delayed_workers); btrfs_destroy_workqueue(fs_info-caching_workers); btrfs_destroy_workqueue(fs_info-readahead_workers); btrfs_destroy_workqueue(fs_info-flush_workers); @@ -2515,9 +2515,8 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue(endio-write, flags, max_active, 2); fs_info-endio_freespace_worker = btrfs_alloc_workqueue(freespace-write, flags, max_active, 0); - btrfs_init_workers(fs_info-delayed_workers, delayed-meta, - fs_info-thread_pool_size, - fs_info-generic_worker); + fs_info-delayed_workers = + btrfs_alloc_workqueue(delayed-meta, flags, max_active, 0); fs_info-readahead_workers = btrfs_alloc_workqueue(readahead, flags, max_active, 2); btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, @@ -2528,7 +2527,6 @@ int open_ctree(struct super_block *sb, * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(fs_info-generic_worker); - ret |= btrfs_start_workers(fs_info-delayed_workers); ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; @@ -2541,7 +2539,7 @@ int open_ctree(struct super_block *sb, fs_info-endio_write_workers fs_info-endio_raid56_workers fs_info-endio_freespace_worker fs_info-rmw_workers fs_info-caching_workers fs_info-readahead_workers - fs_info-fixup_workers)) { + fs_info-fixup_workers fs_info-delayed_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f7fd00c..83d3477 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1255,7 +1255,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_write_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_freespace_worker, new_pool_size); - btrfs_set_max_workers(fs_info-delayed_workers, new_pool_size); + btrfs_workqueue_set_max(fs_info-delayed_workers, new_pool_size);
[PATCH v4 13/18] btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.
Replace the fs_info-fixup_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- Changelog: v1-v2: None v2-v3: - Use the btrfs_workqueue_struct to replace fixup_workers. v3-v4: - Use the simplified btrfs_alloc_workqueue API. --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 10 +- fs/btrfs/inode.c | 8 fs/btrfs/super.c | 1 - 4 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 302dc46..845615e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1508,7 +1508,7 @@ struct btrfs_fs_info { * the cow mechanism and make them safe to write. It happens * for the sys_munmap function call path */ - struct btrfs_workers fixup_workers; + struct btrfs_workqueue_struct *fixup_workers; struct btrfs_workers delayed_workers; struct task_struct *transaction_kthread; struct task_struct *cleaner_kthread; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4d49d87..e5dec5a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2006,7 +2006,7 @@ static noinline int next_root_backup(struct btrfs_fs_info *info, static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) { btrfs_stop_workers(fs_info-generic_worker); - btrfs_stop_workers(fs_info-fixup_workers); + btrfs_destroy_workqueue(fs_info-fixup_workers); btrfs_destroy_workqueue(fs_info-delalloc_workers); btrfs_destroy_workqueue(fs_info-workers); btrfs_destroy_workqueue(fs_info-endio_workers); @@ -2494,8 +2494,8 @@ int open_ctree(struct super_block *sb, min_t(u64, fs_devices-num_devices, max_active), 64); - btrfs_init_workers(fs_info-fixup_workers, fixup, 1, - fs_info-generic_worker); + fs_info-fixup_workers = + btrfs_alloc_workqueue(fixup, flags, 1, 0); /* * endios are largely parallel and should have a very @@ -2528,7 +2528,6 @@ int open_ctree(struct super_block *sb, * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(fs_info-generic_worker); - ret |= btrfs_start_workers(fs_info-fixup_workers); ret |= btrfs_start_workers(fs_info-delayed_workers); ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { @@ -2541,7 +2540,8 @@ int open_ctree(struct super_block *sb, fs_info-endio_meta_write_workers fs_info-endio_write_workers fs_info-endio_raid56_workers fs_info-endio_freespace_worker fs_info-rmw_workers - fs_info-caching_workers fs_info-readahead_workers)) { + fs_info-caching_workers fs_info-readahead_workers + fs_info-fixup_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d4f8dfb..62e4fc2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1727,10 +1727,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, /* see btrfs_writepage_start_hook for details on why this is required */ struct btrfs_writepage_fixup { struct page *page; - struct btrfs_work work; + struct btrfs_work_struct work; }; -static void btrfs_writepage_fixup_worker(struct btrfs_work *work) +static void btrfs_writepage_fixup_worker(struct btrfs_work_struct *work) { struct btrfs_writepage_fixup *fixup; struct btrfs_ordered_extent *ordered; @@ -1821,9 +1821,9 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end) SetPageChecked(page); page_cache_get(page); - fixup-work.func = btrfs_writepage_fixup_worker; + btrfs_init_work(fixup-work, btrfs_writepage_fixup_worker, NULL, NULL); fixup-page = page; - btrfs_queue_worker(root-fs_info-fixup_workers, fixup-work); + btrfs_queue_work(root-fs_info-fixup_workers, fixup-work); return -EBUSY; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 7a46e23..f7fd00c 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1249,7 +1249,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info-delalloc_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-submit_workers, new_pool_size); 
 btrfs_workqueue_set_max(fs_info-caching_workers, new_pool_size); - btrfs_set_max_workers(fs_info-fixup_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_meta_workers, new_pool_size); btrfs_workqueue_set_max(fs_info-endio_meta_write_workers, -- 1.8.5.1
[PATCH v4 15/18] btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue.
Replace the fs_info-qgroup_rescan_worker with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- Changelog: v1-v2: None v2-v3: - Use the btrfs_workqueue_struct to replace qgroup_rescan_workers. v3-v4: - Use the simplified btrfs_alloc_workqueue API. --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 10 +- fs/btrfs/qgroup.c | 17 + 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 698cebc..df51fa3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1630,9 +1630,9 @@ struct btrfs_fs_info { /* qgroup rescan items */ struct mutex qgroup_rescan_lock; /* protects the progress item */ struct btrfs_key qgroup_rescan_progress; - struct btrfs_workers qgroup_rescan_workers; + struct btrfs_workqueue_struct *qgroup_rescan_workers; struct completion qgroup_rescan_completion; - struct btrfs_work qgroup_rescan_work; + struct btrfs_work_struct qgroup_rescan_work; /* filesystem state */ unsigned long fs_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9053df8..fb94e94 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2021,7 +2021,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info-caching_workers); btrfs_destroy_workqueue(fs_info-readahead_workers); btrfs_destroy_workqueue(fs_info-flush_workers); - btrfs_stop_workers(fs_info-qgroup_rescan_workers); + btrfs_destroy_workqueue(fs_info-qgroup_rescan_workers); } static void free_root_extent_buffers(struct btrfs_root *root) @@ -2519,15 +2519,14 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue(delayed-meta, flags, max_active, 0); fs_info-readahead_workers = btrfs_alloc_workqueue(readahead, flags, max_active, 2); - btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, - fs_info-generic_worker); + fs_info-qgroup_rescan_workers = + btrfs_alloc_workqueue(qgroup-rescan, flags, 1, 0); /* * btrfs_start_workers can really only fail because of 
ENOMEM so just * return -ENOMEM if any of these fail. */ ret = btrfs_start_workers(fs_info-generic_worker); - ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; @@ -2539,7 +2538,8 @@ int open_ctree(struct super_block *sb, fs_info-endio_write_workers fs_info-endio_raid56_workers fs_info-endio_freespace_worker fs_info-rmw_workers fs_info-caching_workers fs_info-readahead_workers - fs_info-fixup_workers fs_info-delayed_workers)) { + fs_info-fixup_workers fs_info-delayed_workers + fs_info-qgroup_rescan_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 4e6ef49..521144e 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1516,8 +1516,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans, ret = qgroup_rescan_init(fs_info, 0, 1); if (!ret) { qgroup_rescan_zero_tracking(fs_info); - btrfs_queue_worker(fs_info-qgroup_rescan_workers, - fs_info-qgroup_rescan_work); + btrfs_queue_work(fs_info-qgroup_rescan_workers, +fs_info-qgroup_rescan_work); } ret = 0; } @@ -1981,7 +1981,7 @@ out: return ret; } -static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) +static void btrfs_qgroup_rescan_worker(struct btrfs_work_struct *work) { struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info, qgroup_rescan_work); @@ -2092,7 +2092,8 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, memset(fs_info-qgroup_rescan_work, 0, sizeof(fs_info-qgroup_rescan_work)); - fs_info-qgroup_rescan_work.func = btrfs_qgroup_rescan_worker; + btrfs_init_work(fs_info-qgroup_rescan_work, + btrfs_qgroup_rescan_worker, NULL, NULL); if (ret) { err: @@ -2155,8 +2156,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) qgroup_rescan_zero_tracking(fs_info); - btrfs_queue_worker(fs_info-qgroup_rescan_workers, - fs_info-qgroup_rescan_work); + btrfs_queue_work(fs_info-qgroup_rescan_workers, +fs_info-qgroup_rescan_work); return 0; } @@ -2187,6 
+2188,6 @@ void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info) { if
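As the open_ctree() hunks in this series show, each converted queue drops its separate btrfs_start_workers() call and is instead added to a single combined NULL check after allocation. A minimal userspace sketch of that shape follows. The demo_* names, the three-queue struct and the failure-injection hook are all hypothetical and exist only to exercise the error path; they are not part of btrfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_wq {
	int max_active;
};

/* Hypothetical, much reduced analogue of the workqueue pointers in fs_info. */
struct demo_fs_info {
	struct demo_wq *fixup_workers;
	struct demo_wq *delayed_workers;
	struct demo_wq *qgroup_rescan_workers;
};

/* Test hook (assumption): fail the N-th allocation, counted from 0; -1 never fails. */
static int demo_fail_at = -1;
static int demo_alloc_count;

static struct demo_wq *demo_alloc_wq(int max_active)
{
	struct demo_wq *wq;

	if (demo_fail_at >= 0 && demo_alloc_count++ == demo_fail_at)
		return NULL;
	wq = calloc(1, sizeof(*wq));
	if (wq)
		wq->max_active = max_active;
	return wq;
}

/* Safe on partially allocated state because free(NULL) is a no-op. */
static void demo_destroy_all(struct demo_fs_info *fs)
{
	free(fs->fixup_workers);
	free(fs->delayed_workers);
	free(fs->qgroup_rescan_workers);
	fs->fixup_workers = NULL;
	fs->delayed_workers = NULL;
	fs->qgroup_rescan_workers = NULL;
}

/* Allocate everything first, then test all pointers in one expression. */
static int demo_setup(struct demo_fs_info *fs, int max_active, int fail_at)
{
	demo_fail_at = fail_at;
	demo_alloc_count = 0;

	fs->fixup_workers = demo_alloc_wq(1);
	fs->delayed_workers = demo_alloc_wq(max_active);
	fs->qgroup_rescan_workers = demo_alloc_wq(1);

	if (!(fs->fixup_workers && fs->delayed_workers &&
	      fs->qgroup_rescan_workers)) {
		demo_destroy_all(fs);
		return -ENOMEM;
	}
	return 0;
}
```

One consequence of this design is that the cleanup path must tolerate NULL entries, since any subset of the allocations may have failed by the time the combined check runs. The sketch gets that for free from free(NULL); whether the kernel-side destroy helper has the same tolerance is not visible in the quoted hunks.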
[PATCH v4 10/18] btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.
Replace the fs_info->rmw_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace rmw_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12
 fs/btrfs/raid56.c  | 35 ---
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5096164..294b373 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1495,7 +1495,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue_struct *endio_workers;
 	struct btrfs_workqueue_struct *endio_meta_workers;
 	struct btrfs_workqueue_struct *endio_raid56_workers;
-	struct btrfs_workers rmw_workers;
+	struct btrfs_workqueue_struct *rmw_workers;
 	struct btrfs_workqueue_struct *endio_meta_write_workers;
 	struct btrfs_workqueue_struct *endio_write_workers;
 	struct btrfs_workqueue_struct *endio_freespace_worker;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4f8591a..8b2977b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2012,7 +2012,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_destroy_workqueue(fs_info->endio_workers);
 	btrfs_destroy_workqueue(fs_info->endio_meta_workers);
 	btrfs_destroy_workqueue(fs_info->endio_raid56_workers);
-	btrfs_stop_workers(&fs_info->rmw_workers);
+	btrfs_destroy_workqueue(fs_info->rmw_workers);
 	btrfs_destroy_workqueue(fs_info->endio_meta_write_workers);
 	btrfs_destroy_workqueue(fs_info->endio_write_workers);
 	btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
@@ -2509,9 +2509,8 @@ int open_ctree(struct super_block *sb,
 		btrfs_alloc_workqueue("endio-meta-write", flags, max_active, 2);
 	fs_info->endio_raid56_workers =
 		btrfs_alloc_workqueue("endio-raid56", flags, max_active, 4);
-	btrfs_init_workers(&fs_info->rmw_workers,
-			   "rmw", fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
+	fs_info->rmw_workers =
+		btrfs_alloc_workqueue("rmw", flags, max_active, 2);
 	fs_info->endio_write_workers =
 		btrfs_alloc_workqueue("endio-write", flags, max_active, 2);
 	fs_info->endio_freespace_worker =
@@ -2525,8 +2524,6 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
 			   &fs_info->generic_worker);

-	fs_info->rmw_workers.idle_thresh = 2;
-
 	fs_info->readahead_workers.idle_thresh = 2;

 	/*
@@ -2535,7 +2532,6 @@ int open_ctree(struct super_block *sb,
 	 */
 	ret = btrfs_start_workers(&fs_info->generic_worker);
 	ret |= btrfs_start_workers(&fs_info->fixup_workers);
-	ret |= btrfs_start_workers(&fs_info->rmw_workers);
 	ret |= btrfs_start_workers(&fs_info->delayed_workers);
 	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
@@ -2549,7 +2545,7 @@ int open_ctree(struct super_block *sb,
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->endio_meta_write_workers &&
 	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
-	      fs_info->endio_freespace_worker)) {
+	      fs_info->endio_freespace_worker && fs_info->rmw_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 24ac218..5afa564 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -87,7 +87,7 @@ struct btrfs_raid_bio {
 	/*
 	 * for scheduling work in the helper threads
 	 */
-	struct btrfs_work work;
+	struct btrfs_work_struct work;

 	/*
 	 * bio list and bio_list_lock are used
@@ -166,8 +166,8 @@ struct btrfs_raid_bio {

 static int __raid56_parity_recover(struct btrfs_raid_bio *rbio);
 static noinline void finish_rmw(struct btrfs_raid_bio *rbio);
-static void rmw_work(struct btrfs_work *work);
-static void read_rebuild_work(struct btrfs_work *work);
+static void rmw_work(struct btrfs_work_struct *work);
+static void read_rebuild_work(struct btrfs_work_struct *work);
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio);
 static void async_read_rebuild(struct btrfs_raid_bio *rbio);
 static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio);
@@ -1416,20 +1416,18 @@ cleanup:

 static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
 {
-	rbio->work.flags = 0;
-	rbio->work.func = rmw_work;
+	btrfs_init_work(&rbio->work, rmw_work, NULL, NULL);

-	btrfs_queue_worker(&rbio->fs_info->rmw_workers,
-			   &rbio->work);
+	btrfs_queue_work(rbio->fs_info->rmw_workers,
+			 &rbio->work);
[PATCH v4 06/18] btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue
Much like fs_info->workers, replace fs_info->delalloc_workers with the
same btrfs_workqueue.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12
 fs/btrfs/inode.c   | 18 --
 fs/btrfs/super.c   |  2 +-
 4 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b3093c3..a86c9a1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1490,7 +1490,7 @@ struct btrfs_fs_info {
 	 */
 	struct btrfs_workers generic_worker;
 	struct btrfs_workqueue_struct *workers;
-	struct btrfs_workers delalloc_workers;
+	struct btrfs_workqueue_struct *delalloc_workers;
 	struct btrfs_workers flush_workers;
 	struct btrfs_workers endio_workers;
 	struct btrfs_workers endio_meta_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 258c59a..1098435 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2008,7 +2008,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_stop_workers(&fs_info->generic_worker);
 	btrfs_stop_workers(&fs_info->fixup_workers);
-	btrfs_stop_workers(&fs_info->delalloc_workers);
+	btrfs_destroy_workqueue(fs_info->delalloc_workers);
 	btrfs_destroy_workqueue(fs_info->workers);
 	btrfs_stop_workers(&fs_info->endio_workers);
 	btrfs_stop_workers(&fs_info->endio_meta_workers);
@@ -2476,8 +2476,8 @@ int open_ctree(struct super_block *sb,
 		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
 				      max_active, 16);

-	btrfs_init_workers(&fs_info->delalloc_workers, "delalloc",
-			   fs_info->thread_pool_size, NULL);
+	fs_info->delalloc_workers =
+		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);

 	btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
 			   fs_info->thread_pool_size, NULL);
@@ -2495,9 +2495,6 @@ int open_ctree(struct super_block *sb,
 	 */
 	fs_info->submit_workers.idle_thresh = 64;

-	fs_info->delalloc_workers.idle_thresh = 2;
-	fs_info->delalloc_workers.ordered = 1;
-
 	btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
 			   &fs_info->generic_worker);
 	btrfs_init_workers(&fs_info->endio_workers, "endio",
@@ -2548,7 +2545,6 @@ int open_ctree(struct super_block *sb,
 	 */
 	ret = btrfs_start_workers(&fs_info->generic_worker);
 	ret |= btrfs_start_workers(&fs_info->submit_workers);
-	ret |= btrfs_start_workers(&fs_info->delalloc_workers);
 	ret |= btrfs_start_workers(&fs_info->fixup_workers);
 	ret |= btrfs_start_workers(&fs_info->endio_workers);
 	ret |= btrfs_start_workers(&fs_info->endio_meta_workers);
@@ -2566,7 +2562,7 @@ int open_ctree(struct super_block *sb,
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
-	if (!(fs_info->workers)) {
+	if (!(fs_info->workers && fs_info->delalloc_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f1a7744..220db71 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -305,7 +305,7 @@ struct async_cow {
 	u64 start;
 	u64 end;
 	struct list_head extents;
-	struct btrfs_work work;
+	struct btrfs_work_struct work;
 };

 static noinline int add_async_extent(struct async_cow *cow,
@@ -980,7 +980,7 @@ out_unlock:
 /*
  * work queue call back to started compression on a file and pages
  */
-static noinline void async_cow_start(struct btrfs_work *work)
+static noinline void async_cow_start(struct btrfs_work_struct *work)
 {
 	struct async_cow *async_cow;
 	int num_added = 0;
@@ -998,7 +998,7 @@ static noinline void async_cow_start(struct btrfs_work *work)
 /*
  * work queue call back to submit previously compressed pages
  */
-static noinline void async_cow_submit(struct btrfs_work *work)
+static noinline void async_cow_submit(struct btrfs_work_struct *work)
 {
 	struct async_cow *async_cow;
 	struct btrfs_root *root;
@@ -1019,7 +1019,7 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 		submit_compressed_extents(async_cow->inode, async_cow);
 }

-static noinline void async_cow_free(struct btrfs_work *work)
+static noinline void async_cow_free(struct btrfs_work_struct *work)
 {
 	struct async_cow *async_cow;
 	async_cow = container_of(work, struct async_cow, work);
@@ -1056,17 +1056,15 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		async_cow->end = cur_end;
 		INIT_LIST_HEAD(&async_cow->extents);

-		async_cow->work.func = async_cow_start;
-		async_cow->work.ordered_func = async_cow_submit;
-
[PATCH v4 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue.
Use the newly created btrfs_workqueue_struct to replace the original
fs_info->workers.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 41 +
 fs/btrfs/super.c   |  2 +-
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 54ab861..b3093c3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1489,7 +1489,7 @@ struct btrfs_fs_info {
 	 * two
 	 */
 	struct btrfs_workers generic_worker;
-	struct btrfs_workers workers;
+	struct btrfs_workqueue_struct *workers;
 	struct btrfs_workers delalloc_workers;
 	struct btrfs_workers flush_workers;
 	struct btrfs_workers endio_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8072cfa..258c59a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -107,7 +107,7 @@ struct async_submit_bio {
 	 * can't tell us where in the file the bio should go
 	 */
 	u64 bio_offset;
-	struct btrfs_work work;
+	struct btrfs_work_struct work;
 	int error;
 };

@@ -741,12 +741,12 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
 {
 	unsigned long limit = min_t(unsigned long,
-				    info->workers.max_workers,
+				    info->thread_pool_size,
 				    info->fs_devices->open_devices);
 	return 256 * limit;
 }

-static void run_one_async_start(struct btrfs_work *work)
+static void run_one_async_start(struct btrfs_work_struct *work)
 {
 	struct async_submit_bio *async;
 	int ret;
@@ -759,7 +759,7 @@ static void run_one_async_start(struct btrfs_work *work)
 		async->error = ret;
 }

-static void run_one_async_done(struct btrfs_work *work)
+static void run_one_async_done(struct btrfs_work_struct *work)
 {
 	struct btrfs_fs_info *fs_info;
 	struct async_submit_bio *async;
@@ -786,7 +786,7 @@ static void run_one_async_done(struct btrfs_work *work)
 			       async->bio_offset);
 }

-static void run_one_async_free(struct btrfs_work *work)
+static void run_one_async_free(struct btrfs_work_struct *work)
 {
 	struct async_submit_bio *async;

@@ -814,11 +814,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 	async->submit_bio_start = submit_bio_start;
 	async->submit_bio_done = submit_bio_done;

-	async->work.func = run_one_async_start;
-	async->work.ordered_func = run_one_async_done;
-	async->work.ordered_free = run_one_async_free;
+	btrfs_init_work(&async->work, run_one_async_start,
+			run_one_async_done, run_one_async_free);

-	async->work.flags = 0;
 	async->bio_flags = bio_flags;
 	async->bio_offset = bio_offset;

@@ -827,9 +825,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 	atomic_inc(&fs_info->nr_async_submits);

 	if (rw & REQ_SYNC)
-		btrfs_set_work_high_prio(&async->work);
+		btrfs_set_work_high_priority(&async->work);

-	btrfs_queue_worker(&fs_info->workers, &async->work);
+	btrfs_queue_work(fs_info->workers, &async->work);

 	while (atomic_read(&fs_info->async_submit_draining) &&
 	      atomic_read(&fs_info->nr_async_submits)) {
@@ -2011,7 +2009,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_stop_workers(&fs_info->generic_worker);
 	btrfs_stop_workers(&fs_info->fixup_workers);
 	btrfs_stop_workers(&fs_info->delalloc_workers);
-	btrfs_stop_workers(&fs_info->workers);
+	btrfs_destroy_workqueue(fs_info->workers);
 	btrfs_stop_workers(&fs_info->endio_workers);
 	btrfs_stop_workers(&fs_info->endio_meta_workers);
 	btrfs_stop_workers(&fs_info->endio_raid56_workers);
@@ -2109,6 +2107,8 @@ int open_ctree(struct super_block *sb,
 	int err = -EINVAL;
 	int num_backups_tried = 0;
 	int backup_index = 0;
+	int max_active;
+	int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
 	bool create_uuid_tree;
 	bool check_uuid_tree;

@@ -2468,12 +2468,13 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}

+	max_active = fs_info->thread_pool_size;
 	btrfs_init_workers(&fs_info->generic_worker,
 			   "genwork", 1, NULL);

-	btrfs_init_workers(&fs_info->workers, "worker",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
+	fs_info->workers =
+		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
+				      max_active, 16);
[PATCH v4 08/18] btrfs: Replace fs_info->flush_workers with btrfs_workqueue.
Replace the fs_info->flush_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace flush_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h        |  4 ++--
 fs/btrfs/disk-io.c      | 10 --
 fs/btrfs/inode.c        |  8
 fs/btrfs/ordered-data.c | 13 +++--
 fs/btrfs/ordered-data.h |  2 +-
 5 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4411a2b..097364d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1491,7 +1491,7 @@ struct btrfs_fs_info {
 	struct btrfs_workers generic_worker;
 	struct btrfs_workqueue_struct *workers;
 	struct btrfs_workqueue_struct *delalloc_workers;
-	struct btrfs_workers flush_workers;
+	struct btrfs_workqueue_struct *flush_workers;
 	struct btrfs_workers endio_workers;
 	struct btrfs_workers endio_meta_workers;
 	struct btrfs_workers endio_raid56_workers;
@@ -3622,7 +3622,7 @@ struct btrfs_delalloc_work {
 	int delay_iput;
 	struct completion completion;
 	struct list_head list;
-	struct btrfs_work work;
+	struct btrfs_work_struct work;
 };

 struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cda9766..139960f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2021,7 +2021,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_stop_workers(&fs_info->delayed_workers);
 	btrfs_stop_workers(&fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info->readahead_workers);
-	btrfs_stop_workers(&fs_info->flush_workers);
+	btrfs_destroy_workqueue(fs_info->flush_workers);
 	btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }

@@ -2479,9 +2479,8 @@ int open_ctree(struct super_block *sb,
 	fs_info->delalloc_workers =
 		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);

-	btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
-			   fs_info->thread_pool_size, NULL);
-
+	fs_info->flush_workers =
+		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);

 	btrfs_init_workers(&fs_info->caching_workers, "cache",
 			   fs_info->thread_pool_size, NULL);
@@ -2556,14 +2555,13 @@ int open_ctree(struct super_block *sb,
 	ret |= btrfs_start_workers(&fs_info->delayed_workers);
 	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
-	ret |= btrfs_start_workers(&fs_info->flush_workers);
 	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
 	if (ret) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
 	if (!(fs_info->workers && fs_info->delalloc_workers &&
-	      fs_info->submit_workers)) {
+	      fs_info->submit_workers && fs_info->flush_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 220db71..929f1ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8163,7 +8163,7 @@ out_notrans:
 	return ret;
 }

-static void btrfs_run_delalloc_work(struct btrfs_work *work)
+static void btrfs_run_delalloc_work(struct btrfs_work_struct *work)
 {
 	struct btrfs_delalloc_work *delalloc_work;
 	struct inode *inode;
@@ -8201,7 +8201,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
 	work->inode = inode;
 	work->wait = wait;
 	work->delay_iput = delay_iput;
-	work->work.func = btrfs_run_delalloc_work;
+	btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);

 	return work;
 }
@@ -8253,8 +8253,8 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput)
 			goto out;
 		}
 		list_add_tail(&work->list, &works);
-		btrfs_queue_worker(&root->fs_info->flush_workers,
-				   &work->work);
+		btrfs_queue_work(root->fs_info->flush_workers,
+				 &work->work);

 		cond_resched();
 		spin_lock(&root->delalloc_lock);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 69582d5..e0c3cf0 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -552,7 +552,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
 	wake_up(&entry->wait);
 }

-static void btrfs_run_ordered_extent_work(struct btrfs_work *work)
+static void btrfs_run_ordered_extent_work(struct btrfs_work_struct *work)
 {
 	struct btrfs_ordered_extent *ordered;

@@ -585,10 +585,11 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
 		atomic_inc(&ordered->refs);
[PATCH v4 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue
Use the kernel workqueue to implement a new btrfs_workqueue_struct, which
provides ordered execution like the old btrfs_worker.

The func is executed concurrently, and the ordered_func/ordered_free
callbacks are executed in the order in which the work items were queued,
after the corresponding func is done.

The new btrfs_workqueue works much like the original one: one workqueue
for normal work and a list for ordered work.
When a work item is queued, its ordered work is added to the list and a
helper function is queued into the workqueue.
The helper function executes the normal work and then checks and executes
as many ordered work items as possible, in the order they were queued.

In this patch, neither the high priority work queue nor thresholding is
added yet.
The high priority feature and thresholding will be added in the following
patches.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
Cc: Josef Bacik <jba...@fusionio.com>
---
Changelog:
v1->v2:
  None.
v2->v3:
  - Fix the potential deadlock discovered by kernel lockdep.
  - Reuse the async-thread.[ch] files.
  - Make the ordered_func optional, which makes it adaptable to
    all btrfs_workers.
v3->v4:
  - Use the old list method to implement the ordered workqueue.
    The previous 3-wq implementation needed extra time waiting for
    scheduling, which caused up to a 40% performance drop in compress
    tests. The old list method (after executing a normal work, check the
    order_list and execute) does not need the extra scheduling.
  - Simplify the btrfs_alloc_workqueue parameters.
    Now only one name is needed, and the ordered work mechanism is
    determined using work->ordered_func.
  - Fix memory leak in btrfs_destroy_workqueue.
---
 fs/btrfs/async-thread.c | 130
 fs/btrfs/async-thread.h |  27 ++
 2 files changed, 157 insertions(+)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index c1e0b0c..f05d57e 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2007 Oracle.  All rights reserved.
+ * Copyright (C) 2013 Fujitsu.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public
@@ -21,6 +22,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/freezer.h>
+#include <linux/workqueue.h>
 #include "async-thread.h"

 #define WORK_QUEUED_BIT 0
@@ -726,3 +728,131 @@ void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work)
 	wake_up_process(worker->task);
 	spin_unlock_irqrestore(&worker->lock, flags);
 }
+
+struct btrfs_workqueue_struct {
+	struct workqueue_struct *normal_wq;
+	/* List head pointing to ordered work list */
+	struct list_head ordered_list;
+
+	/* Spinlock for ordered_list */
+	spinlock_t list_lock;
+};
+
+struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
+						     int flags,
+						     int max_active)
+{
+	struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+
+	if (unlikely(!ret))
+		return NULL;
+
+	ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active,
+					 "btrfs", name);
+	if (unlikely(!ret->normal_wq)) {
+		kfree(ret);
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&ret->ordered_list);
+	spin_lock_init(&ret->list_lock);
+	return ret;
+}
+
+static void run_ordered_work(struct btrfs_workqueue_struct *wq)
+{
+	struct list_head *list = &wq->ordered_list;
+	struct btrfs_work_struct *work;
+	spinlock_t *lock = &wq->list_lock;
+	unsigned long flags;
+
+	while (1) {
+		spin_lock_irqsave(lock, flags);
+		if (list_empty(list))
+			break;
+		work = list_entry(list->next, struct btrfs_work_struct,
+				  ordered_list);
+		if (!test_bit(WORK_DONE_BIT, &work->flags))
+			break;
+
+		/*
+		 * we are going to call the ordered done function, but
+		 * we leave the work item on the list as a barrier so
+		 * that later work items that are done don't have their
+		 * functions called before this one returns
+		 */
+		if (test_and_set_bit(WORK_ORDER_DONE_BIT, &work->flags))
+			break;
+		spin_unlock_irqrestore(lock, flags);
+		work->ordered_func(work);
+
+		/* now take the lock again and drop our item from the list */
+		spin_lock_irqsave(lock, flags);
+		list_del(&work->ordered_list);
+		spin_unlock_irqrestore(lock, flags);
+
+		/*
+		 * we don't want to call the ordered free functions
+		 * with the lock held though
+
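The ordering rule described in the commit message (ordered callbacks run in queue order, and only once the corresponding func has finished) can be sketched in plain user-space C. This is a single-threaded illustration under stated assumptions, not the kernel code: `struct work`, `oq_enqueue`, `oq_run_ordered` and `oq_record` are names made up for this example, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for btrfs_work_struct; not the kernel type. */
struct work {
	int id;
	bool done;                           /* normal func has finished */
	void (*ordered_func)(struct work *); /* optional ordered callback */
	struct work *next;
};

struct ordered_queue {
	struct work *head;
	struct work *tail;
};

/* Ids in the order their ordered_func ran (for demonstration only). */
int oq_log[16];
int oq_log_len;

void oq_record(struct work *w)
{
	assert(oq_log_len < 16);
	oq_log[oq_log_len++] = w->id;
}

/* Append a work item; queue order defines the ordered-callback order. */
void oq_enqueue(struct ordered_queue *q, struct work *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/*
 * Run ordered callbacks for the leading run of finished items and stop
 * at the first unfinished one, so later items never overtake it.
 * Returns the number of items retired from the queue.
 */
int oq_run_ordered(struct ordered_queue *q)
{
	int retired = 0;

	while (q->head && q->head->done) {
		struct work *w = q->head;

		if (w->ordered_func)
			w->ordered_func(w);
		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		retired++;
	}
	return retired;
}
```

If items 1, 2 and 3 are queued and item 2 finishes first, `oq_run_ordered` retires nothing until item 1 is also done; the head of the list acts as the barrier that the comment in `run_ordered_work` describes.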
[PATCH v4 11/18] btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.
Replace the fs_info->caching_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace caching_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
---
 fs/btrfs/ctree.h       |  4 ++--
 fs/btrfs/disk-io.c     | 10 +-
 fs/btrfs/extent-tree.c |  6 +++---
 fs/btrfs/super.c       |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 294b373..8630986 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1205,7 +1205,7 @@ struct btrfs_caching_control {
 	struct list_head list;
 	struct mutex mutex;
 	wait_queue_head_t wait;
-	struct btrfs_work work;
+	struct btrfs_work_struct work;
 	struct btrfs_block_group_cache *block_group;
 	u64 progress;
 	atomic_t count;
@@ -1500,7 +1500,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue_struct *endio_write_workers;
 	struct btrfs_workqueue_struct *endio_freespace_worker;
 	struct btrfs_workqueue_struct *submit_workers;
-	struct btrfs_workers caching_workers;
+	struct btrfs_workqueue_struct *caching_workers;
 	struct btrfs_workers readahead_workers;

 	/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8b2977b..d8f42d2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2018,7 +2018,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
 	btrfs_destroy_workqueue(fs_info->submit_workers);
 	btrfs_stop_workers(&fs_info->delayed_workers);
-	btrfs_stop_workers(&fs_info->caching_workers);
+	btrfs_destroy_workqueue(fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info->readahead_workers);
 	btrfs_destroy_workqueue(fs_info->flush_workers);
 	btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
@@ -2481,8 +2481,8 @@ int open_ctree(struct super_block *sb,
 	fs_info->flush_workers =
 		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);

-	btrfs_init_workers(&fs_info->caching_workers, "cache",
-			   fs_info->thread_pool_size, NULL);
+	fs_info->caching_workers =
+		btrfs_alloc_workqueue("cache", flags, max_active, 0);

 	/*
 	 * a higher idle thresh on the submit workers makes it much more
@@ -2533,7 +2533,6 @@ int open_ctree(struct super_block *sb,
 	ret = btrfs_start_workers(&fs_info->generic_worker);
 	ret |= btrfs_start_workers(&fs_info->fixup_workers);
 	ret |= btrfs_start_workers(&fs_info->delayed_workers);
-	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
 	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
 	if (ret) {
@@ -2545,7 +2544,8 @@ int open_ctree(struct super_block *sb,
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->endio_meta_write_workers &&
 	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
-	      fs_info->endio_freespace_worker && fs_info->rmw_workers)) {
+	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
+	      fs_info->caching_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 45d98d0..80ecc14 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -377,7 +377,7 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 	return total_added;
 }

-static noinline void caching_thread(struct btrfs_work *work)
+static noinline void caching_thread(struct btrfs_work_struct *work)
 {
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_fs_info *fs_info;
@@ -547,7 +547,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	caching_ctl->block_group = cache;
 	caching_ctl->progress = cache->key.objectid;
 	atomic_set(&caching_ctl->count, 1);
-	caching_ctl->work.func = caching_thread;
+	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);

 	spin_lock(&cache->lock);
 	/*
@@ -638,7 +638,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,

 	btrfs_get_block_group(cache);

-	btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work);
+	btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);

 	return ret;
 }
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da3ec84..5bfe566 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1248,7 +1248,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
[PATCH v2 1/3] btrfs-progs: don't replicate the stripe_len defines
A cleanup patch: BTRFS_STRIPE_LEN is duplicated across btrfs-progs.
The kernel defines it in volumes.h, so do the same for progs.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
 v2: commit update

 btrfs-convert.c |   19 +--
 chunk-recover.c |    1 -
 cmds-chunk.c    |    1 -
 volumes.h       |    2 ++
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index ae10eed..65fe707 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -43,7 +43,6 @@
 #include <ext2fs/ext2_ext_attr.h>

 #define INO_OFFSET (BTRFS_FIRST_FREE_OBJECTID - EXT2_ROOT_INO)
-#define STRIPE_LEN (64 * 1024)
 #define EXT2_IMAGE_SUBVOL_OBJECTID BTRFS_FIRST_FREE_OBJECTID

 /*
@@ -134,11 +133,11 @@ static int cache_free_extents(struct btrfs_root *root, ext2_filsys ext2_fs)

 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
-		bytenr &= ~((u64)STRIPE_LEN - 1);
+		bytenr &= ~((u64)BTRFS_STRIPE_LEN - 1);
 		if (bytenr >= blocksize * ext2_fs->super->s_blocks_count)
 			break;
 		clear_extent_dirty(&root->fs_info->free_space_cache, bytenr,
-				   bytenr + STRIPE_LEN - 1, 0);
+				   bytenr + BTRFS_STRIPE_LEN - 1, 0);
 	}

 	clear_extent_dirty(&root->fs_info->free_space_cache,
@@ -207,9 +206,9 @@ static int intersect_with_sb(u64 bytenr, u64 num_bytes)

 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		offset = btrfs_sb_offset(i);
-		offset &= ~((u64)STRIPE_LEN - 1);
+		offset &= ~((u64)BTRFS_STRIPE_LEN - 1);

-		if (bytenr < offset + STRIPE_LEN &&
+		if (bytenr < offset + BTRFS_STRIPE_LEN &&
 		    bytenr + num_bytes > offset)
 			return 1;
 	}
@@ -450,8 +449,8 @@ static int block_iterate_proc(ext2_filsys ext2_fs,
 	}

 	if (sb_region) {
-		bytenr += STRIPE_LEN - 1;
-		bytenr &= ~((u64)STRIPE_LEN - 1);
+		bytenr += BTRFS_STRIPE_LEN - 1;
+		bytenr &= ~((u64)BTRFS_STRIPE_LEN - 1);
 	} else {
 		cache = btrfs_lookup_block_group(root->fs_info, bytenr);
 		BUG_ON(!cache);
@@ -1523,7 +1522,7 @@ static int create_chunk_mapping(struct btrfs_trans_handle *trans,
 		btrfs_set_stack_chunk_length(chunk, cache->key.offset);
 		btrfs_set_stack_chunk_owner(chunk,
 					    extent_root->root_key.objectid);
-		btrfs_set_stack_chunk_stripe_len(chunk, STRIPE_LEN);
+		btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
 		btrfs_set_stack_chunk_type(chunk, cache->flags);
 		btrfs_set_stack_chunk_io_align(chunk, device->io_align);
 		btrfs_set_stack_chunk_io_width(chunk, device->io_width);
@@ -2098,10 +2097,10 @@ static int cleanup_sys_chunk(struct btrfs_root *fs_root,
 	}
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		offset = btrfs_sb_offset(i);
-		offset &= ~((u64)STRIPE_LEN - 1);
+		offset &= ~((u64)BTRFS_STRIPE_LEN - 1);
 		ret = relocate_extents_range(fs_root, ext2_root,
-					     offset, offset + STRIPE_LEN);
+					     offset, offset + BTRFS_STRIPE_LEN);
 		if (ret)
 			goto fail;
 	}
diff --git a/chunk-recover.c b/chunk-recover.c
index bcde39e..b072ba6 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -41,7 +41,6 @@
 #include "btrfsck.h"
 #include "commands.h"

-#define BTRFS_STRIPE_LEN (64 * 1024)
 #define BTRFS_NUM_MIRRORS 2

 struct recover_control {
diff --git a/cmds-chunk.c b/cmds-chunk.c
index 4d7fce0..348229c 100644
--- a/cmds-chunk.c
+++ b/cmds-chunk.c
@@ -42,7 +42,6 @@
 #include "commands.h"

 #define BTRFS_CHUNK_TREE_REBUILD_ABORTED -7500
-#define BTRFS_STRIPE_LEN (64 * 1024)
 #define BTRFS_NUM_MIRRORS 2

 struct recover_control {
diff --git a/volumes.h b/volumes.h
index 2802cb0..b1ff3d0 100644
--- a/volumes.h
+++ b/volumes.h
@@ -19,6 +19,8 @@
 #ifndef __BTRFS_VOLUMES_
 #define __BTRFS_VOLUMES_

+#define BTRFS_STRIPE_LEN (64 * 1024)
+
 struct btrfs_device {
 	struct list_head dev_list;
 	struct btrfs_root *dev_root;
--
1.7.1
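The mask expression that recurs throughout this patch, `bytenr &= ~((u64)BTRFS_STRIPE_LEN - 1)`, rounds an offset down to a stripe boundary; adding `BTRFS_STRIPE_LEN - 1` first rounds it up instead. A minimal standalone sketch, assuming a power-of-two stripe length. The helper names below are made up for illustration and do not exist in btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN (64 * 1024) /* same value as BTRFS_STRIPE_LEN */

/* Round down to the start of the stripe that contains bytenr. */
uint64_t stripe_round_down(uint64_t bytenr)
{
	/* The mask trick is only valid for power-of-two lengths. */
	assert((STRIPE_LEN & (STRIPE_LEN - 1)) == 0);
	return bytenr & ~((uint64_t)STRIPE_LEN - 1);
}

/* Round up to the next stripe boundary (no-op if already aligned). */
uint64_t stripe_round_up(uint64_t bytenr)
{
	return stripe_round_down(bytenr + STRIPE_LEN - 1);
}

/*
 * Overlap test in the style of intersect_with_sb(): does the range
 * [bytenr, bytenr + num_bytes) touch the stripe holding sb_offset?
 */
int overlaps_stripe(uint64_t bytenr, uint64_t num_bytes, uint64_t sb_offset)
{
	uint64_t start = stripe_round_down(sb_offset);

	return bytenr < start + STRIPE_LEN && bytenr + num_bytes > start;
}
```

Keeping the constant in one header matters here because every one of these masks silently assumes the same stripe length; a second, diverging definition would shift the boundaries in only some of the callers.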
[PATCH v3 3/3] btrfs-progs: handle error in the btrfs_prepare_device
This patch reports errors with strerror() instead of printing the raw
errno, and replaces the BUG_ON with proper error handling.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
 v3: fix per Stefan review, update error message
 v2: commit update

 cmds-device.c  |    7 +++
 cmds-replace.c |    9 -
 mkfs.c         |    9 -
 utils.c        |   30 +++---
 4 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index bc4a8dc..ada0bcd 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -111,13 +111,11 @@ static int cmd_add_dev(int argc, char **argv)

 		res = btrfs_prepare_device(devfd, argv[i], 1, &dev_block_count,
 					   0, &mixed, discard);
+		close(devfd);
 		if (res) {
-			fprintf(stderr, "ERROR: Unable to init '%s'\n", argv[i]);
-			close(devfd);
 			ret++;
-			continue;
+			goto error_out;
 		}
-		close(devfd);

 		strncpy_null(ioctl_args.name, argv[i]);
 		res = ioctl(fdmnt, BTRFS_IOC_ADD_DEV, &ioctl_args);
@@ -130,6 +128,7 @@ static int cmd_add_dev(int argc, char **argv)

 	}

+error_out:
 	close_file_or_dir(fdmnt, dirstream);
 	return !!ret;
 }
diff --git a/cmds-replace.c b/cmds-replace.c
index d9b0940..c683d6c 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -276,12 +276,11 @@ static int cmd_start_replace(int argc, char **argv)
 	}
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
-	if (btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
-				 &mixed, 0)) {
-		fprintf(stderr, "Error: Failed to prepare device '%s'\n",
-			dstdev);
+	ret = btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
+				 &mixed, 0);
+	if (ret)
 		goto leave_with_error;
-	}
+
 	close(fddstdev);
 	fddstdev = -1;

diff --git a/mkfs.c b/mkfs.c
index 33369f9..18df087 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1446,6 +1446,10 @@ int main(int ac, char **av)
 		first_file = file;
 		ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count,
 					   block_count, &mixed, discard);
+		if (ret) {
+			close(fd);
+			exit(1);
+		}
 		if (block_count && block_count > dev_block_count) {
 			fprintf(stderr, "%s is smaller than requested size\n", file);
 			exit(1);
@@ -1553,8 +1557,11 @@ int main(int ac, char **av)
 		}
 		ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count,
 					   block_count, &mixed, discard);
+		if (ret) {
+			close(fd);
+			exit(1);
+		}
 		mixed = old_mixed;
-		BUG_ON(ret);

 		ret = btrfs_add_to_fsid(trans, root, fd, file, dev_block_count,
 					sectorsize, sectorsize, sectorsize);
diff --git a/utils.c b/utils.c
index f499023..03947bd 100644
--- a/utils.c
+++ b/utils.c
@@ -581,13 +581,13 @@ int btrfs_prepare_device(int fd, char *file, int zero_end, u64 *block_count_ret,
 	ret = fstat(fd, &st);
 	if (ret < 0) {
 		fprintf(stderr, "unable to stat %s\n", file);
-		exit(1);
+		return 1;
 	}

 	block_count = btrfs_device_size(fd, &st);
 	if (block_count == 0) {
 		fprintf(stderr, "unable to find %s size\n", file);
-		exit(1);
+		return 1;
 	}
 	if (max_block_count)
 		block_count = min(block_count, max_block_count);
@@ -612,26 +612,34 @@ int btrfs_prepare_device(int fd, char *file, int zero_end, u64 *block_count_ret,
 	}

 	ret = zero_dev_start(fd);
-	if (ret) {
-		fprintf(stderr, "failed to zero device start %d\n", ret);
-		exit(1);
-	}
+	if (ret)
+		goto zero_dev_error;

 	for (i = 0 ; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr >= block_count)
 			break;
-		zero_blocks(fd, bytenr, BTRFS_SUPER_INFO_SIZE);
+		ret = zero_blocks(fd, bytenr, BTRFS_SUPER_INFO_SIZE);
+		if (ret)
+			goto zero_dev_error;
 	}

 	if (zero_end) {
 		ret = zero_dev_end(fd, block_count);
-		if (ret) {
-			fprintf(stderr, "failed to zero device end %d\n", ret);
-			exit(1);
-		}
+
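The patch routes every zeroing failure to a single `zero_dev_error` label, whose body is cut off in this archive. One way such a label could report the error is a small formatting helper that uses strerror() when the code is a negative errno and falls back to the raw number otherwise. This is only a sketch under that assumed convention; `format_dev_error` is a hypothetical name and not part of btrfs-progs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an error message for a failed device operation.
 * Assumption: negative values are -errno, anything else is an
 * opaque non-zero code that is printed as a number.
 * Returns what snprintf() returns.
 */
int format_dev_error(char *buf, size_t len, const char *what,
		     const char *file, int ret)
{
	assert(buf != NULL && len > 0);

	if (ret < 0)
		return snprintf(buf, len, "ERROR: failed to %s '%s' - %s",
				what, file, strerror(-ret));
	return snprintf(buf, len, "ERROR: failed to %s '%s' - %d",
			what, file, ret);
}
```

A caller would pass the result to `fprintf(stderr, "%s\n", buf)` and return the code to its own caller, which keeps the exit decision in `mkfs.c` and the command handlers rather than inside `btrfs_prepare_device()`.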
[PATCH v3 2/3] btrfs-progs: use stripe_len define here
Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
 v3: volumes.c needs BTRFS_STRIPE_LEN as well
 v2: commit update

 btrfs-convert.c |    2 +-
 btrfs-image.c   |    2 +-
 volumes.c       |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 65fe707..df20c15 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -1715,7 +1715,7 @@ static int prepare_system_chunk_sb(struct btrfs_super_block *super)

 	btrfs_set_stack_chunk_length(chunk, btrfs_super_total_bytes(super));
 	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-	btrfs_set_stack_chunk_stripe_len(chunk, 64 * 1024);
+	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_stack_chunk_type(chunk, BTRFS_BLOCK_GROUP_SYSTEM);
 	btrfs_set_stack_chunk_io_align(chunk, sectorsize);
 	btrfs_set_stack_chunk_io_width(chunk, sectorsize);
diff --git a/btrfs-image.c b/btrfs-image.c
index 7bcfc06..1b2831a 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -1350,7 +1350,7 @@ static void update_super_old(u8 *buffer)

 	btrfs_set_stack_chunk_length(chunk, (u64)-1);
 	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-	btrfs_set_stack_chunk_stripe_len(chunk, 64 * 1024);
+	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_stack_chunk_type(chunk, BTRFS_BLOCK_GROUP_SYSTEM);
 	btrfs_set_stack_chunk_io_align(chunk, sectorsize);
 	btrfs_set_stack_chunk_io_width(chunk, sectorsize);
diff --git a/volumes.c b/volumes.c
index c38da6c..7a9c955 100644
--- a/volumes.c
+++ b/volumes.c
@@ -773,7 +773,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int looped = 0;
 	int ret;
 	int index;
-	int stripe_len = 64 * 1024;
+	int stripe_len = BTRFS_STRIPE_LEN;
 	struct btrfs_key key;
 	u64 offset;

--
1.7.1
Re: btrfs send in 3.12 : can't find snapshot?
Hello Michael,

I sent a patch to fix the issue (you are already on Cc). Can you try it and see whether it fixes your problem?

Thanks,
Wang

On 12/17/2013 09:28 AM, Michael Welsh Duggan wrote:
> Wang Shilong wangshilong1...@gmail.com writes:
>
> > Hello Michael,
> >
> > > I built the new btrfs-progs 3.12 recently. I note that the version information doesn't seem to match this:
> > >
> > > # ./btrfs --version
> > > Btrfs v0.20-rc1-358-g194aa4a
> > >
> > > Regardless, I was trying to use btrfs send (which worked in the older btrfs), and failed. Here's an example:
> > >
> > > # ./btrfs send -vvv -p /snapshots/bo /snapshots/bp > /dev/null
> > > At subvol /snapshots/bp
> > > ERROR: open @/snapshots/bp failed. No such file or directory
> > >
> > > Any idea what might be going on here? Here's the volume information:
> > >
> > > # ./btrfs sub show /
> > > /
> > >     Name:              @
> > >     uuid:              e5e505d6-1309-8447-b51e-73f08c9401d1
> > >     Parent uuid:       156f93b9-1175-dc42-a1ee-65c00c5dcc2a
> > >     Creation time:     2013-07-17 20:44:46
> > >     Object ID:         259
> > >     Generation (Gen):  296321
> > >     Gen at creation:   20
> > >     Parent:            5
> > >     Top Level:         5
> > >     Flags:             -
> > >     Snapshot(s):
> > >         snapshots/bo
> > >         snapshots/bp
> > >
> > > Kernel information:
> >
> > Here it seems that you changed your default subvolume (259, not 5). I sent a patch before to fix this problem; it has not been pushed into Chris's master branch yet. The patch URL is:
> >
> > https://patchwork.kernel.org/patch/3258971/
> >
> > But it has been pushed into David's latest integration branch, which you can pull from:
> >
> > git pull http://github.com/kdave/btrfs-progs.git integration-20131211
>
> After compiling this version the above tests works.
>
> Now, however, the receive fails:
>
> # ./btrfs send -p /snapshots/bo /snapshots/bp | ./btrfs receive /backup/snapshots/root/
> At subvol /snapshots/bp
> At snapshot bp
> ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
> ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
> ERROR: could not find parent subvolume
>
> More volume information:
>
> # ./btrfs sub show /backup/snapshots/root/bo
> /backup/snapshots/root/bo
>     Name:              bo
>     uuid:              5e15ef24-f2d0-194f-886d-3f7afc7413a4
>     Parent uuid:       9a226af3-8497-744b-90f7-d7e54d58946d
>     Creation time:     2013-12-13 17:51:57
>     Object ID:         1030
>     Generation (Gen):  1046
>     Gen at creation:   1042
>     Parent:            5
>     Top Level:         5
>     Flags:             readonly
>     Snapshot(s):
>
> # ./btrfs sub show /snapshots/bo
> /snapshots/bo
>     Name:              bo
>     uuid:              f132fd83-aeeb-f048-abea-af00829d1390
>     Parent uuid:       e5e505d6-1309-8447-b51e-73f08c9401d1
>     Creation time:     2013-12-13 17:50:15
>     Object ID:         404
>     Generation (Gen):  296977
>     Gen at creation:   291623
>     Parent:            259
>     Top Level:         259
>     Flags:             readonly
>     Snapshot(s):
>
> # ./btrfs sub show /snapshots/bp
> /snapshots/bp
>     Name:              bp
>     uuid:              6f73d3f2-5f9b-4944-b2d2-3003331b2d10
>     Parent uuid:       e5e505d6-1309-8447-b51e-73f08c9401d1
>     Creation time:     2013-12-15 22:24:57
>     Object ID:         405
>     Generation (Gen):  296977
>     Gen at creation:   296301
>     Parent:            259
>     Top Level:         259
>     Flags:             readonly
>     Snapshot(s):
Re: [PATCH 3/3] btrfs: Check read-only status of roots during send
Hello David,

Nice work. Before this patch, btrfs send had to join a transaction to keep the commit root from changing. I have sent a follow-up patch that removes the transaction protection from btrfs send, and I have one small comment below.

[...]

On 12/17/2013 12:34 AM, David Sterba wrote:
> All the subvolumes that are involved in send must be read-only during the s via SUBVOL_SETFLAGS
> +	 */
> +	int send_in_progress;

Why not use u32 here?

Thanks,
Wang
[PATCH] Btrfs: remove transaction from btrfs send
Since David did the work that forces us to use read-only snapshots, we can safely remove the transaction protection from btrfs send.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/send.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 945d1db..9e832f2 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4522,7 +4522,6 @@ out:
 static int full_send_tree(struct send_ctx *sctx)
 {
 	int ret;
-	struct btrfs_trans_handle *trans = NULL;
 	struct btrfs_root *send_root = sctx->send_root;
 	struct btrfs_key key;
 	struct btrfs_key found_key;
@@ -4544,19 +4543,6 @@ static int full_send_tree(struct send_ctx *sctx)
 	key.type = BTRFS_INODE_ITEM_KEY;
 	key.offset = 0;
 
-join_trans:
-	/*
-	 * We need to make sure the transaction does not get committed
-	 * while we do anything on commit roots. Join a transaction to prevent
-	 * this.
-	 */
-	trans = btrfs_join_transaction(send_root);
-	if (IS_ERR(trans)) {
-		ret = PTR_ERR(trans);
-		trans = NULL;
-		goto out;
-	}
-
 	/*
 	 * Make sure the tree has not changed after re-joining. We detect this
 	 * by comparing start_ctransid and ctransid. They should always match.
@@ -4580,19 +4566,6 @@ join_trans:
 		goto out_finish;
 
 	while (1) {
-		/*
-		 * When someone want to commit while we iterate, end the
-		 * joined transaction and rejoin.
-		 */
-		if (btrfs_should_end_transaction(trans, send_root)) {
-			ret = btrfs_end_transaction(trans, send_root);
-			trans = NULL;
-			if (ret < 0)
-				goto out;
-			btrfs_release_path(path);
-			goto join_trans;
-		}
-
 		eb = path->nodes[0];
 		slot = path->slots[0];
 		btrfs_item_key_to_cpu(eb, &found_key, slot);
@@ -4620,12 +4593,6 @@ out_finish:
 
 out:
 	btrfs_free_path(path);
-	if (trans) {
-		if (!ret)
-			ret = btrfs_end_transaction(trans, send_root);
-		else
-			btrfs_end_transaction(trans, send_root);
-	}
 	return ret;
 }
 
-- 
1.8.3.1
Re: [PATCH 3/3] btrfs: Check read-only status of roots during send
On Tue, Dec 17, 2013 at 07:58:24PM +0800, Wang Shilong wrote:
> Nice work, Before this patch for btrfs send. we have to join a transaction to avoid commit root changed.

That sounds like a good improvement.

> I send a plus patch that remove transaction protection from btrfs send. and a little comment below.
>
> [...]
>
> On 12/17/2013 12:34 AM, David Sterba wrote:
> > All the subvolumes that are involved in send must be read-only during the s via SUBVOL_SETFLAGS
> > +	 */
> > +	int send_in_progress;
>
> Why not use u32 here?

The int type should be enough to hold refs for all running sends, if this is your concern. I think of it as a refcount: it should never go below 0, but if it does, that should be caught. I'll update the patch to check that send_in_progress is not negative after the decrements.

thanks,
david
Re: [PATCH] Btrfs: move the extent buffer radix tree into the fs_info
On Tue, Dec 17, 2013 at 01:19:39AM +, Chris Mason wrote:
> On Tue, 2013-12-17 at 02:06 +0100, David Sterba wrote:
> > On Mon, Dec 16, 2013 at 01:26:26PM -0500, Josef Bacik wrote:
> > > I need to create a fake tree to test qgroups and I don't want to have to setup a fake btree_inode. The fact is we only use the radix tree for the fs_info, so everybody else who allocates an extent_io_tree is just wasting the space anyway. This patch moves the radix tree and its lock into btrfs_fs_info so there is less stuff I have to fake to do qgroup sanity tests. Thanks,
> >
> > This would make the fs_info::buffer_lock a global hotspot if alloc_extent_buffer and release_extent_buffer are called frequently.
>
> But since the only place that was really using it was the metadata btree, the lock shouldn't be hotter than before right?

Right. What confused me first is that the number of trees that are initialized by extent_io_tree_init is higher, but the only user is the metadata btree.
[no subject]
Subject: [PATCH 4/3] btrfs: check balance of send_in_progress

Warn if the balance goes below zero, which appears to be unlikely though. Otherwise cleans up the code a bit.

Signed-off-by: David Sterba dste...@suse.cz
---

A followup to 3/3 that adds the check if send_in_progress is not going below zero. It's a separate patch rather than folded into 3/3 so the change is clearly visible. I'm not convinced that it's necessary to be that cautious because it looks almost impossible to happen, but on the other hand we'd never know that it happened.

 fs/btrfs/send.c |   38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 468eba26ad8c..46ea0cdfb88b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4618,6 +4618,21 @@ out:
 	return ret;
 }
 
+static void btrfs_root_dec_send_in_progress(struct btrfs_root* root)
+{
+	spin_lock(&root->root_item_lock);
+	root->send_in_progress--;
+	/*
+	 * Not much left to do, we don't know why it's unbalanced and
+	 * can't blindly reset it to 0.
+	 */
+	if (root->send_in_progress < 0)
+		btrfs_err(root->fs_info,
+			"send_in_progres unbalanced %d root %llu\n",
+			root->send_in_progress, root->root_key.objectid);
+	spin_unlock(&root->root_item_lock);
+}
+
 long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 {
 	int ret = 0;
@@ -4835,24 +4850,11 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_)
 	}
 
 out:
-	for (i = 0; i < clone_sources_to_rollback; i++) {
-		struct btrfs_root *r = sctx->clone_roots[i].root;
-
-		spin_lock(&r->root_item_lock);
-		r->send_in_progress--;
-		spin_unlock(&r->root_item_lock);
-	}
-	if (!IS_ERR(sctx->parent_root)) {
-		struct btrfs_root *r = sctx->parent_root;
-
-		spin_lock(&r->root_item_lock);
-		r->send_in_progress--;
-		spin_unlock(&r->root_item_lock);
-	}
-
-	spin_lock(&send_root->root_item_lock);
-	send_root->send_in_progress--;
-	spin_unlock(&send_root->root_item_lock);
+	for (i = 0; i < clone_sources_to_rollback; i++)
+		btrfs_root_dec_send_in_progress(sctx->clone_roots[i].root);
+	if (!IS_ERR(sctx->parent_root))
+		btrfs_root_dec_send_in_progress(sctx->parent_root);
+	btrfs_root_dec_send_in_progress(send_root);
 
 	kfree(arg);
 	vfree(clone_sources_tmp);
-- 
1.7.9
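As an editorial illustration of the balance check in the patch above, here is a minimal userspace sketch. It is not the kernel code: struct root_sketch and root_dec_send_in_progress() are hypothetical names, the locking under root_item_lock is left out, and the function returns a status (the kernel helper returns void) only so that the result can be checked. It also shows why a signed counter is convenient here: with an unsigned type the "< 0" test could never fire.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the send_in_progress field of struct btrfs_root. */
struct root_sketch {
	unsigned long long objectid;
	int send_in_progress;	/* signed, so an extra decrement shows up as a negative value */
};

/*
 * Drop one reference and report an unbalanced counter.
 * Returns 0 while the counter stays non-negative, -1 once it goes below zero.
 * The kernel patch performs the same check while holding root_item_lock;
 * locking is omitted in this sketch.
 */
static int root_dec_send_in_progress(struct root_sketch *root)
{
	root->send_in_progress--;
	if (root->send_in_progress < 0) {
		/* Do not reset to 0: the cause of the imbalance is unknown. */
		fprintf(stderr, "send_in_progress unbalanced %d root %llu\n",
			root->send_in_progress, root->objectid);
		return -1;
	}
	return 0;
}
```

A caller would invoke this once per root whose counter was incremented earlier, as the cleanup path in the patch does.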
Re: [PATCH v2 3/3] btrfs-progs: handle error in the btrfs_prepare_device
On Tue, Dec 17, 2013 at 04:37:35PM +0800, Anand Jain wrote:
> > > +zero_dev_error:
> > > +	if (ret) {
> > > +		ret < 0 ?
> > > +			fprintf(stderr, "ERROR: failed to zero device start '%s' - %s\n",
> > > +				file, strerror(-ret)) :
> > > +			fprintf(stderr, "ERROR: failed to zero device start '%s' - %d\n",
> > > +				file, ret);
> >
> > This is not funny.
>
> hmm. I am not sure what you mean ?

It's rather obfuscated, though it is valid C. The goal is not to minimize the number of source lines but the time needed to understand them.
Re: [PATCH] btrfs-progs: fix btrfstune silence on failure
On Fri, Dec 13, 2013 at 05:59:46PM +0800, Gui Hecheng wrote:
> Originally, btrfstune will fail without any options and just exit with no failure prompt.

Works for me:

$ ./btrfstune
usage: btrfstune [options] device
	-S value	enable/disable seeding
	-r		enable extended inode refs
	-x		enable skinny metadata extent refs

> Now, the number of arguments are checked before parse options and error msg will show up upon failure.

No, the arguments should be parsed first. The btrfstune utility does not use the same parser helpers like check_argc_exact and actually the bug you see could be caused by missing optind = 1 before the while () loop.

Can you please test if this helps?

--- a/btrfstune.c
+++ b/btrfstune.c
@@ -115,6 +115,7 @@ int main(int argc, char *argv[])
 	int skinny_flag = 0;
 	int ret;
 
+	optind = 1;
 	while(1) {
 		int c = getopt(argc, argv, "S:rx");
 		if (c < 0)
Re: [PATCH] Btrfs: move the extent buffer radix tree into the fs_info
On 12/16/2013 08:06 PM, David Sterba wrote:
> On Mon, Dec 16, 2013 at 01:26:26PM -0500, Josef Bacik wrote:
> > I need to create a fake tree to test qgroups and I don't want to have to setup a fake btree_inode. The fact is we only use the radix tree for the fs_info, so everybody else who allocates an extent_io_tree is just wasting the space anyway. This patch moves the radix tree and its lock into btrfs_fs_info so there is less stuff I have to fake to do qgroup sanity tests. Thanks,
>
> This would make the fs_info::buffer_lock a global hotspot if alloc_extent_buffer and release_extent_buffer are called frequently.
>
> But, you can get rid of the buffer_lock completely, because the radix tree can be safely protected by rcu_read_lock/_unlock:
>
> * alloc_extent_buffer uses radix_preload that turns off preemption by itself, so the lock here would be pointless

Except you still need a lock for other inserts.

> * release_extent_buffer locks around radix_tree_delete, here a rcu locking will be ok as well

No it won't. RCU just makes sure readers don't get screwed, you still need to have real locking around the insertions/deletions. Look at pagecache: we have mapping->tree_lock for this even though it uses rcu for the lookups.

Thanks,
Josef
Re: [PATCH] btrfs-progs: fix btrfstune silence on failure
Hi dave,

> On Fri, Dec 13, 2013 at 05:59:46PM +0800, Gui Hecheng wrote:
> > Originally, btrfstune will fail without any options and just exit with no failure prompt.
>
> Works for me:
>
> $ ./btrfstune
> usage: btrfstune [options] device
> 	-S value	enable/disable seeding
> 	-r		enable extended inode refs
> 	-x		enable skinny metadata extent refs

This is not the problem that this patch addresses. You can try this:

# btrfstune /dev/sdb

This does not print anything, although it returns 1.

> > Now, the number of arguments are checked before parse options and error msg will show up upon failure.
>
> No, the arguments should be parsed first. The btrfstune utility does not use the same parser helpers like check_argc_exact and actually the bug you see could be caused by missing optind = 1 before the while () loop.
>
> Can you please test if this helps?
>
> --- a/btrfstune.c
> +++ b/btrfstune.c
> @@ -115,6 +115,7 @@ int main(int argc, char *argv[])
>  	int skinny_flag = 0;
>  	int ret;
>
> +	optind = 1;

The default value of optind is 1, though we had better assign it explicitly. I think Gui Hecheng's patch is the right way to fix the problem, but maybe we can add a check after argument parsing, something like:

	if (!(seeding_flag + exrefs_flag + skinny_flag))
		fprintf(stderr, "You should assign at least one option for btrfstune");

What is your idea? ^_^

Thanks,
Wang

>  	while(1) {
>  		int c = getopt(argc, argv, "S:rx");
>  		if (c < 0)
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
> If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, i have two ideas.
>
> 1.make btrfs snapshot ioctl support passing source subvolume's objectid
> 2.when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.

3. Tell the user to mount the toplevel subvol by himself and run receive again

> We'd better use the second approach because it won't bother kernel change.

I don't think that the silent mount is the right way to fix it: that way the btrfs tool takes on the responsibility not to break anything, like the unhandled umount failure below. I think admins and power users do not like to see some random tool mess with the system like this.

> @@ -199,6 +200,10 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
>  	char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
>  	struct btrfs_ioctl_vol_args_v2 args_v2;
>  	struct subvol_info *parent_subvol = NULL;
> +	char *dev = NULL;
> +	char tmp_name[15] = "btrfs-XX";
> +	char tmp_dir[30] = "/tmp";

Mounting valuable data under /tmp is dangerous: what if some /tmp cleaner starts to remove old files? I've seen that happen in practice.

> @@ -269,10 +308,14 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
>  		fprintf(stderr, "ERROR: creating snapshot %s -> %s failed. %s\n",
>  			parent_subvol->path, path, strerror(-ret));
> -		goto out;
>  	}
> +out_umount:
> +	umount(tmp_dir);

umount fails for whatever reason,

> +	rmdir(tmp_dir);

at least this does not delete the files recursively.
Re: [PATCH] Btrfs: move the extent buffer radix tree into the fs_info
On Tue, Dec 17, 2013 at 09:56:07AM -0500, Josef Bacik wrote:
> > * alloc_extent_buffer uses radix_preload that turns off preemption by itself, so the lock here would be pointless
>
> Except you still need a lock for other inserts.
>
> > * release_extent_buffer locks around radix_tree_delete, here a rcu locking will be ok as well
>
> No it won't. RCU just makes sure readers don't get screwed, you still need to have real locking around the insertions/deletions, look at pagecache, we have mapping->tree_lock for this even though it uses rcu for the lookups.

Oh, my bad, sorry; that would be too easy.
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
Hi dave,

> On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
> > If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, i have two ideas.
> >
> > 1.make btrfs snapshot ioctl support passing source subvolume's objectid
> > 2.when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.
>
> 3. Tell the user to mount the toplevel subvol by himself and run receive again

If we really don't want to bother with a kernel change, I think we can add an option to btrfs receive (for example -f) that forces the tool to resolve such an ENOENT, and at the same time we output something like:

	fprintf(stderr, "Default subvolume is changed, ...");

If -f is not given, we fail here.

> > We'd better use the second approach because it won't bother kernel change.
>
> I don't think that the silent mount is the right way to fix it, that way the btrfs tool tooks responsibility not to break anything. Like the unhandled umount failure below. I think admins and power users do not like to see some random tool mess with the system like this.
>
> > @@ -199,6 +200,10 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
> >  	char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
> >  	struct btrfs_ioctl_vol_args_v2 args_v2;
> >  	struct subvol_info *parent_subvol = NULL;
> > +	char *dev = NULL;
> > +	char tmp_name[15] = "btrfs-XX";
> > +	char tmp_dir[30] = "/tmp";
>
> Mounting valuable data under /tmp is dangerous, what if some /tmp cleaner starts to remove old files. I've seen that happen in practice.

Agree with this.

> > @@ -269,10 +308,14 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
> >  		fprintf(stderr, "ERROR: creating snapshot %s -> %s failed. %s\n",
> >  			parent_subvol->path, path, strerror(-ret));
> > -		goto out;
> >  	}
> > +out_umount:
> > +	umount(tmp_dir);
>
> umount fails for whatever reason,

Will fix it.

> > +	rmdir(tmp_dir);
>
> at least this does not delete the files recursively.

Why would we need to delete the files recursively here? I only create a directory, something like /tmp/btrfs-X, and I only want to delete the temp dir btrfs- here...

Thanks,
Wang
Re: [PATCH] Btrfs: fix use of uninitialized err variable
On Tue, Dec 17, 2013 at 3:27 PM, David Sterba dste...@suse.cz wrote:
> On Mon, Dec 16, 2013 at 05:03:25PM +, Filipe David Manana wrote:
> > On Mon, Dec 16, 2013 at 2:34 PM, David Sterba dste...@suse.cz wrote:
> > > On Fri, Dec 13, 2013 at 07:39:34PM +, Filipe David Borba Manana wrote:
> > > > From the compiler:
> > > >
> > > > fs/btrfs/file.c: In function ‘prepare_pages.isra.18’:
> > > > fs/btrfs/file.c:1265:6: warning: ‘err’ may be used uninitialized in this function [-Wuninitialized]
> > >
> > > My gcc 4.8.1 does not see this warning, nor do I while inspecting the sources in current next-master.
> >
> > Here it's gcc 4.6.3.
>
> I've seen that some versions of gcc produce bogus warnings of that sort and manual review is needed, but I haven't found a code path that would lead to uninitialized use of err.
>
> The warning points to
>
> 1259		if (i == 0)
> 1260			err = prepare_uptodate_page(pages[i], pos,
> 1261						    force_uptodate);
> 1262		if (i == num_pages - 1)
> 1263			err = prepare_uptodate_page(pages[i],
> 1264						    pos + write_bytes, false);
> 1265		if (err) {
> 1266			page_cache_release(pages[i]);
> 1267			faili = i - 1;
> 1268			goto fail;
> 1269		}
>
> But the loop starts from i = 0 and the variable is initialized before the check. So it's gcc that does not see that, not a real error.

Right, my intention was to silence a compiler warning. I should have made that more explicit in the commit message title.

-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
Re: [PATCH 1/7] btrfs: subpagesize-blocksize: Define extent_buffer_head
On Mon, Dec 16, 2013 at 10:17:18AM -0600, Chandra Seetharaman wrote:
> On Mon, 2013-12-16 at 14:32 +0200, saeed bishara wrote:
> > On Thu, Dec 12, 2013 at 1:38 AM, Chandra Seetharaman sekha...@us.ibm.com wrote:
> > > In order to handle multiple extent buffers per page, first we need to create a way to handle all the extent buffers that are attached to a page.
> > >
> > > This patch creates a new data structure eb_head, and moves fields that are common to all extent buffers in a page from extent buffer to eb_head.
> > >
> > > This also adds changes that are needed to handle multiple extent buffers per page case.
> > >
> > > Signed-off-by: Chandra Seetharaman sekha...@us.ibm.com
> > > ---
> >
> > snip
> >
> > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > index 54ab861..02de448 100644
> > > --- a/fs/btrfs/ctree.h
> > > +++ b/fs/btrfs/ctree.h
> > > @@ -2106,14 +2106,16 @@ static inline void btrfs_set_token_##name(struct extent_buffer *eb,	\
> > >  #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
> > >  static inline u##bits btrfs_##name(struct extent_buffer *eb)		\
> > >  {									\
> > > -	type *p = page_address(eb->pages[0]);				\
> > > +	type *p = page_address(eb_head(eb)->pages[0]) +			\
> > > +				(eb->start & (PAGE_CACHE_SIZE -1));	\
> >
> > you can use PAGE_CACHE_MASK instead of PAGE_CACHE_SIZE - 1
>
> PAGE_CACHE_MASK get the page part of the value, not the offset in the page, i.e it is defined as
>
> #define PAGE_MASK	(~(PAGE_SIZE-1))

Use ~PAGE_CACHE_MASK to get the offset. It's common, though not obvious at first.
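The relationship between the mask and the in-page offset can be checked with a small userspace sketch. The SKETCH_ names and the 4 KiB page size are assumptions made for this example only; the kernel takes PAGE_SIZE and PAGE_MASK from its architecture headers.

```c
#include <stdint.h>

/* Assumed for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL
/* Same shape as the kernel definition quoted above: the high (page) bits are set. */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Address of the start of the page containing addr. */
static uint64_t sketch_page_base(uint64_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}

/* Offset of addr inside its page; equivalent to addr & (SKETCH_PAGE_SIZE - 1). */
static uint64_t sketch_page_offset(uint64_t addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}
```

Masking with the mask itself keeps the page part, while masking with its complement keeps the offset, which is what the patched accessor needs for eb->start.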
Re: [PATCH] Btrfs: fix use of uninitialized err variable
On Mon, Dec 16, 2013 at 05:03:25PM +, Filipe David Manana wrote:
> On Mon, Dec 16, 2013 at 2:34 PM, David Sterba dste...@suse.cz wrote:
> > On Fri, Dec 13, 2013 at 07:39:34PM +, Filipe David Borba Manana wrote:
> > > From the compiler:
> > >
> > > fs/btrfs/file.c: In function ‘prepare_pages.isra.18’:
> > > fs/btrfs/file.c:1265:6: warning: ‘err’ may be used uninitialized in this function [-Wuninitialized]
> >
> > My gcc 4.8.1 does not see this warning, nor do I while inspecting the sources in current next-master.
>
> Here it's gcc 4.6.3.

I've seen that some versions of gcc produce bogus warnings of that sort and manual review is needed, but I haven't found a code path that would lead to uninitialized use of err.

The warning points to

1259		if (i == 0)
1260			err = prepare_uptodate_page(pages[i], pos,
1261						    force_uptodate);
1262		if (i == num_pages - 1)
1263			err = prepare_uptodate_page(pages[i],
1264						    pos + write_bytes, false);
1265		if (err) {
1266			page_cache_release(pages[i]);
1267			faili = i - 1;
1268			goto fail;
1269		}

But the loop starts from i = 0 and the variable is initialized before the check. So it's gcc that does not see that, not a real error.
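To make the control flow discussed above concrete, here is a reduced userspace sketch of the same loop shape. prepare_one() and prepare_all() are hypothetical placeholders (prepare_one() stands in for prepare_uptodate_page() and fails for index 2 only so the error path can be exercised). Initializing err explicitly is one common way to silence such a warning; the thread does not show the exact change that was applied.

```c
/* Hypothetical placeholder for prepare_uptodate_page(): fails only for index 2. */
static int prepare_one(int index)
{
	return index == 2 ? -5 : 0;
}

/*
 * Same shape as the quoted loop: err is assigned for the first and the last
 * page only. On the first iteration (i == 0) err is always assigned before it
 * is read, and a non-zero value leaves the loop at once, so err is 0 on every
 * later iteration that does not assign it. The explicit "= 0" therefore does
 * not change behaviour; it only spells out what older compilers fail to prove.
 */
static int prepare_all(int num_pages)
{
	int err = 0;
	int i;

	for (i = 0; i < num_pages; i++) {
		if (i == 0)
			err = prepare_one(i);
		if (i == num_pages - 1)
			err = prepare_one(i);
		if (err)
			return err;
	}
	return 0;
}
```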
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
David Sterba dste...@suse.cz writes:

> On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
> > If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, i have two ideas.
> >
> > 1.make btrfs snapshot ioctl support passing source subvolume's objectid
> > 2.when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.
>
> 3. Tell the user to mount the toplevel subvol by himself and run receive again

Ugh. I hope that would be considered a short-term hack waiting for a better solution, perhaps requiring a kernel upgrade. From a user's perspective there is no reason this should be necessary, and requiring this would be extraordinarily surprising. Why is btrfs unable to find my snapshot? It's right there! Moreover, this used to work just fine in previous versions of btrfs-progs.

> > We'd better use the second approach because it won't bother kernel change.
>
> I don't think that the silent mount is the right way to fix it, that way the btrfs tool tooks responsibility not to break anything. Like the unhandled umount failure below. I think admins and power users do not like to see some random tool mess with the system like this.
>
> > @@ -199,6 +200,10 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
> >  	char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
> >  	struct btrfs_ioctl_vol_args_v2 args_v2;
> >  	struct subvol_info *parent_subvol = NULL;
> > +	char *dev = NULL;
> > +	char tmp_name[15] = "btrfs-XX";
> > +	char tmp_dir[30] = "/tmp";
>
> Mounting valuable data under /tmp is dangerous, what if some /tmp cleaner starts to remove old files. I've seen that happen in practice.

Agreed. If you _were_ to continue to implement it like this, you should include code to respect the TMPDIR envvar at the very least.

-- 
Michael Welsh Duggan (m...@cert.org)
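A minimal sketch of what respecting TMPDIR could look like is given below. build_tmp_template() is a hypothetical helper, not code from the patch under review; it only builds a template string suitable for mkdtemp() and leaves the mount logic aside. It does not address the concern raised earlier in the thread about cleaners removing files under a temporary directory.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build a mkdtemp() template under $TMPDIR, falling back
 * to /tmp when the variable is unset or empty.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_tmp_template(char *buf, size_t len)
{
	const char *base = getenv("TMPDIR");
	int n;

	if (!base || !*base)
		base = "/tmp";

	n = snprintf(buf, len, "%s/btrfs-XXXXXX", base);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* truncated: refuse rather than use a partial path */

	return 0;
}
```

The caller would pass the resulting buffer to mkdtemp() and check its return value before mounting anything there.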
What is needed to build an AFS fileserver on top of BTRFS?
It has occurred to me and others that something like BTRFS could be a good fit to build an AFS fileserver directly on top of. The question is what facilities would be needed from BTRFS to make this work? So I thought I'd kick off a shopping list;-)

 (1) 64-bit data version numbers that increase monotonically with each write.

     Yes, this is likely to cause some performance degradation as it introduces an ordering over data writes and metadata writes to a file. Maybe writes can be batched to improve performance?

 (2) Storage for ACLs and AFS UIDs. Having shareable ACLs might also be useful. Xattrs would likely do for this.

 (3) The ability to snapshot a filesystem to make backups and for pushing to read-only volume servers.

 (4) A 32-bit vnode number and 32-bit vnode uniquifier/generation number.

     These don't necessarily have to be stored by BTRFS directly but could instead be in a separate database file that gets snapshotted also.

 (5) The ability to set the vnode number, vnode uniquifier and data version number to specific values. Necessary to clone volumes and restore volume dumps.

David
Re: What is needed to build an AFS fileserver on top of BTRFS?
On Tue, 2013-12-17 at 16:53 +, David Howells wrote:
> It has occurred to me and others that something like BTRFS could be a good fit to build an AFS fileserver directly on top of. The question is what facilities would be needed from BTRFS to make this work? So I thought I'd kick off a shopping list;-)
>
>  (1) 64-bit data version numbers that increase monotonically with each write.
>
>      Yes, this is likely to cause some performance degradation as it introduces an ordering over data writes and metadata writes to a file. Maybe writes can be batched to improve performance?
>
>  (2) Storage for ACLs and AFS UIDs. Having shareable ACLs might also be useful. Xattrs would likely do for this.
>
>  (3) The ability to snapshot a filesystem to make backups and for pushing to read-only volume servers.
>
>  (4) A 32-bit vnode number and 32-bit vnode uniquifier/generation number.
>
>      These don't necessarily have to be stored by BTRFS directly but could instead be in a separate database file that gets snapshotted also.
>
>  (5) The ability to set the vnode number, vnode uniquifier and data version number to specific values. Necessary to clone volumes and restore volume dumps.

Hmmm, what exactly are vnodes? Could we put them in xattrs?

-chris
Re: What is needed to build an AFS fileserver on top of BTRFS?
On Tue, Dec 17, 2013 at 04:53:16PM +, David Howells wrote:
> It has occurred to me and others that something like BTRFS could be a good fit to build an AFS fileserver directly on top of. The question is what facilities would be needed from BTRFS to make this work? So I thought I'd kick off a shopping list;-)
>
>  (1) 64-bit data version numbers that increase monotonically with each write.
>
>      Yes, this is likely to cause some performance degradation as it introduces an ordering over data writes and metadata writes to a file. Maybe writes can be batched to improve performance?

Do these have to be per-file? If not, then you might be able to get away with using the transid, which is a filesystem-global monotonically-increasing number.

btrfs batches disk writes already, and uses the transid to differentiate these -- the writes come at 30 second intervals (by default, although there's an option to change the period). There may be multiple distinct changes to a single file within that transaction (although obviously, only the state of the file after the last one gets written to disk).

I don't know exactly what you need it for, so this may or may not be appropriate here. Ceph uses transids for [something, mumble, wavy-hand] -- I don't know if the use-case for Ceph is equivalent to the use-case for AFS.

>  (2) Storage for ACLs and AFS UIDs. Having shareable ACLs might also be useful. Xattrs would likely do for this.

This would seem like a reasonable place to put them, given that that's what POSIX ACLs do, and we have POSIX ACL support already.

>  (3) The ability to snapshot a filesystem to make backups and for pushing to read-only volume servers.

We have snapshots of subvolumes, but not the filesystem as a whole.

>  (4) A 32-bit vnode number and 32-bit vnode uniquifier/generation number.
>
>      These don't necessarily have to be stored by BTRFS directly but could instead be in a separate database file that gets snapshotted also.
>
>  (5) The ability to set the vnode number, vnode uniquifier and data version number to specific values. Necessary to clone volumes and restore volume dumps.

What's a vnode meant to represent? I'm not familiar with the terminology.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Are you the man who rules the Universe? Well, I ---
try not to.
Re: What is needed to build an AFS fileserver on top of BTRFS?
Chris Mason c...@fb.com wrote:
> Hmmm, what exactly are vnodes? Could we put them in xattrs?

vnode numbers are AFS's equivalent of inode numbers. Since they're one per file, they could be the object filename.

Probably there would have to be a table of {vnode,latest_uniquifier} as the uniquifier must still go up even if the vnode is unused for a while, so there could also be a table of {vnode,btrfs_file}.

David
Re: What is needed to build an AFS fileserver on top of BTRFS?
Hugo Mills h...@carfax.org.uk wrote:
>> (1) 64-bit data version numbers that increase monotonically with each write. Yes, this is likely to cause some performance degradation as it introduces an ordering over data writes and metadata writes to a file. Maybe writes can be batched to improve performance?
>
> Do these have to be per-file? If not, then you might be able to get away with using the transid, which is a filesystem-global monotonically-increasing number.

Yes. If you send a write RPC op to the server, you get back the new version number. If the new version number is not the old version number + 1 you know there was a collision with a write from another client and you have to flush your cache for that file and request a new callback (i.e. a promise to notify you if someone else changes the file).

>> (3) The ability to snapshot a filesystem to make backups and for pushing to read-only volume servers.
>
> We have snapshots of subvolumes, but not the filesystem as a whole.

By filesystem I meant the current state of an AFS volume. Very likely this would be represented by a BTRFS subvolume, if I understand it correctly. You might have several AFS volumes represented within a BTRFS filesystem. They would be manipulated independently.

>> (5) The ability to set the vnode number, vnode uniquifier and data version number to specific values. Necessary to clone volumes and restore volume dumps.
>
> What's a vnode meant to represent? I'm not familiar with the terminology.

AFS's equivalent of an inode with a 32-bit number representing it. See my reply to Chris's question about the same thing.

David
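For readers who want the "old version number + 1" rule spelled out, here is a minimal sketch of the client-side check described above. The struct and function names are hypothetical and are not taken from OpenAFS or the kernel AFS client; the sketch only assumes that the client remembers the data version its cache is labelled with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side record for one cached file. */
struct cached_file {
	uint64_t data_version;	/* data version the cached data is labelled with */
	bool cache_valid;	/* cleared when the cache has to be flushed */
};

/*
 * Handle the data version returned by a write RPC.
 * Returns true if our write was the only one since the cached version,
 * false if a write from another client got in between.
 */
static bool note_write_reply(struct cached_file *f, uint64_t new_dv)
{
	bool sole_writer = (new_dv == f->data_version + 1);

	if (!sole_writer)
		f->cache_valid = false;	/* caller flushes and requests a new callback */
	f->data_version = new_dv;
	return sole_writer;
}
```

A filesystem-global counter such as the transid could not satisfy this check, since unrelated writes elsewhere would also advance it.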
Re: Feature Req: mkfs.btrfs -d dup option on single device
On 12/12/13, Chris Mason c...@fb.com wrote:
> For me anyway, data=dup in mixed mode is definitely an accident ;)
>
> I personally think data dup is a false sense of security, but drives have gotten so huge that it may actually make sense in a few configurations.

Sure, it's not about any security regarding the device.

It's about the capability of recovering from any bit-rot which can creep into your backups and may only be detected when you need the file after 20-30 generations of backups, which is too late. (Who keeps that many incremental archives and regularly reads backup logs for millions of files?)

> Someone asks for it roughly once a year, so it probably isn't a horrible idea.
>
> -chris

Today, I brought up an old 2 GB Seagate from the basement. It has literally rusted, so it deserves the title of Spinning Rust for real. I had little hope that it would work, but out of curiosity I plugged it into a USB-IDE box. It spun up and, wow, it showed up among the devices. It had two swap partitions and an ext2 partition. I remembered that it was one of the disks used for Linux installations more than 10 years ago. I mounted it. Most of the files date back to 2001-07. They are more than 12 years old and they seem to be intact, with just one inode size mismatch. (See fsck output below.)

If there had been BTRFS (and -d dup :) ) at the time, I would now perform a scrub and report the outcome here. Hence, 'Digital Archeology' can surely benefit from Btrfs. :)

PS: Regarding the SSD data retention debate, this can be an interesting benchmark for a device which was kept in an unfavorable environment.

Regards,
Imran

FSCK output:

fsck from util-linux 2.20.1
e2fsck 1.42.8 (20-Jun-2013)
/dev/sdb3 has gone 4209 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 82669 has non-zero size. Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb3: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb3: 41930/226688 files (1.0% non-contiguous), 200558/453096 blocks
Re: [OpenAFS-devel] Re: What is needed to build an AFS fileserver on top of BTRFS?
On Tue, 2013-12-17 at 17:47 +, David Howells wrote:
> Hugo Mills h...@carfax.org.uk wrote:
>>> (1) 64-bit data version numbers that increase monotonically with each write. Yes, this is likely to cause some performance degradation as it introduces an ordering over data writes and metadata writes to a file. Maybe writes can be batched to improve performance?
>>
>> Do these have to be per-file? If not, then you might be able to get away with using the transid, which is a filesystem-global monotonically-increasing number.
>
> Yes. If you send a write RPC op to the server, you get back the new version number. If the new version number is not the old version number + 1 you know there was a collision with a write from another client and you have to flush your cache for that file and request a new callback (i.e. a promise to notify you if someone else changes the file).

Right. So, the DV must increment by exactly one for each successful StoreData (and not for other changes). This is important because clients cache data and metadata independently, and cached data is labeled with the file's DV. This means that even if metadata for a file has to be refetched for some reason (for example, an expired callback), the _data_ doesn't have to be refetched unless it has actually changed, or been evicted from the client's cache due to cache pressure.

-- Jeff
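The two halves of that rule can be sketched as follows: the server bumps the DV only when data is stored, and the client reuses cached data whenever the refetched DV still matches the label on its cache. All names here are illustrative assumptions, not OpenAFS structures or RPC handlers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical server-side per-file state. */
struct vnode_state {
	uint64_t data_version;	/* DV: advances only when data is stored */
	uint32_t mode;		/* stands in for metadata in general */
};

/* A successful data store advances the DV by exactly one. */
static uint64_t store_data(struct vnode_state *v)
{
	return ++v->data_version;
}

/* A metadata-only change leaves the DV untouched. */
static void change_metadata(struct vnode_state *v, uint32_t mode)
{
	v->mode = mode;
}

/* Hypothetical client-side cache entry, labelled with the DV it was fetched at. */
struct data_cache_entry {
	uint64_t data_version;
	bool present;		/* false once evicted under cache pressure */
};

/* After refetching metadata, decide whether the cached data can be reused. */
static bool cached_data_usable(const struct data_cache_entry *e, uint64_t server_dv)
{
	return e->present && e->data_version == server_dv;
}
```

With this split, an expired callback costs the client only a metadata fetch as long as nobody has stored new data in the meantime.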
Re: [OpenAFS-devel] Re: What is needed to build an AFS fileserver on top of BTRFS?
On Tue, 2013-12-17 at 17:40 +, David Howells wrote:
> Chris Mason c...@fb.com wrote:
>> Hmmm, what exactly are vnodes? Could we put them in xattrs?
>
> vnode numbers are AFS's equivalent of inode numbers. Since they're one per file, they could be the object filename.

Yes, in fact, the volume, vnode number, uniquifier, and DV are effectively the name the fileserver uses for the underlying inode.

Note that if the fileserver is maintaining the vnode indices, then you don't actually _need_ to store a uniquifier for normal operation, because at any given time, a volume can contain at most one vnode with a particular vnode number, and that vnode's uniquifier is stored in the index. The uniquifier is used on-the-wire to distinguish different files that existed at different points in time with the same vnode number.

> Probably there would have to be a table of {vnode,latest_uniquifier} as the uniquifier must still go up even if the vnode is unused for a while, so there could also be a table of {vnode,btrfs_file}.

No, you don't actually have to do this. The OpenAFS fileserver maintains a single uniquifier for an entire volume, and simply increments it every time a vnode is created.

-- Jeff
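A rough sketch of the volume-wide counter described above may help. The types and names are made up for illustration and do not mirror OpenAFS's actual data structures; counter wrap-around is ignored.

```c
#include <stdint.h>

/* Hypothetical per-volume state: one uniquifier counter for the whole volume. */
struct volume {
	uint32_t next_uniquifier;
};

/* The {vnode, uniquifier} pair that identifies a file on the wire. */
struct vnode_id {
	uint32_t vnode;
	uint32_t uniquifier;
};

/*
 * Create a vnode in the given slot. The uniquifier comes from the volume-wide
 * counter, so a reused vnode number never gets an old uniquifier back, and no
 * per-vnode {vnode, latest_uniquifier} table is needed.
 */
static struct vnode_id volume_create_vnode(struct volume *vol, uint32_t vnode_number)
{
	struct vnode_id id = { vnode_number, vol->next_uniquifier++ };

	return id;
}
```

Creating, deleting and re-creating a file in the same vnode slot then yields two different identifiers, which is what lets a client tell the old file from the new one.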
Btrfs RAID1 File System Grew Something Extra
I have been using btrfs for my /home partition on my home machine for a few years now. I created the file system RAID1 using two disk partitions.

Recently I noticed btrfs fi df shows extra Data, System, and Metadata allocations. And btrfs fi show indicates extra allocations on one of my disk drives accounting for the 20 MiB allocation in the df display.

I'm confused. What does this mean?

garry@vfr$ sudo btrfs subvolume list /home
garry@vfr$ sudo btrfs filesystem df /home
Data, RAID1: total=32.00GiB, used=21.01GiB
--> Data, single: total=8.00MiB, used=0.00
System, RAID1: total=8.00MiB, used=12.00KiB
--> System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=15.00GiB, used=424.60MiB
--> Metadata, single: total=8.00MiB, used=0.00
garry@vfr$ sudo btrfs filesystem show /home
Label: none uuid: 6c3aeff6-9a50-4481-a175-7b98980eb638
Total devices 2 FS bytes used 21.43GiB
--> devid 1 size 373.76GiB used 47.03GiB path /dev/sda4
devid 2 size 373.76GiB used 47.01GiB path /dev/sdb4

Btrfs v3.12
garry@vfr$

If it matters, I create a snapshot each night and run a rsync backup to another drive and then delete the snapshot.

garry@vfr$ uname -r
3.11.10-200.fc19.x86_64
garry@vfr$ rpm -q btrfs-progs
btrfs-progs-3.12-1.fc19.x86_64

--
Garry T. Williams
Re: Btrfs RAID1 File System Grew Something Extra
Garry,

This is a known bug in mkfs.btrfs. The workaround for now is to run balance on the FS once it has some data, so that the unused group profile goes away.

HTH, Anand

On 12/18/2013 10:03 AM, Garry T. Williams wrote:
> I have been using btrfs for my /home partition on my home machine for a few years now. I created the file system RAID1 using two disk partitions.
>
> Recently I noticed btrfs fi df shows extra Data, System, and Metadata allocations. And btrfs fi show indicates extra allocations on one of my disk drives accounting for the 20 MiB allocation in the df display.
>
> I'm confused. What does this mean?
>
> garry@vfr$ sudo btrfs subvolume list /home
> garry@vfr$ sudo btrfs filesystem df /home
> Data, RAID1: total=32.00GiB, used=21.01GiB
> --> Data, single: total=8.00MiB, used=0.00
> System, RAID1: total=8.00MiB, used=12.00KiB
> --> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=15.00GiB, used=424.60MiB
> --> Metadata, single: total=8.00MiB, used=0.00
> garry@vfr$ sudo btrfs filesystem show /home
> Label: none uuid: 6c3aeff6-9a50-4481-a175-7b98980eb638
> Total devices 2 FS bytes used 21.43GiB
> --> devid 1 size 373.76GiB used 47.03GiB path /dev/sda4
> devid 2 size 373.76GiB used 47.01GiB path /dev/sdb4
>
> Btrfs v3.12
> garry@vfr$
>
> If it matters, I create a snapshot each night and run a rsync backup to another drive and then delete the snapshot.
>
> garry@vfr$ uname -r
> 3.11.10-200.fc19.x86_64
> garry@vfr$ rpm -q btrfs-progs
> btrfs-progs-3.12-1.fc19.x86_64
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
On Tue, 17 Dec 2013 10:40:41 -0500, Michael Welsh Duggan wrote:
> David Sterba dste...@suse.cz writes:
>> On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
>>> If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, I have two ideas.
>>>
>>> 1. make btrfs snapshot ioctl support passing source subvolume's objectid
>>> 2. when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.
>>
>> 3. Tell the user to mount the toplevel subvol by himself and run receive again
>
> Ugh. I hope that would be considered a short-term hack waiting for a better solution, perhaps requiring a kernel upgrade. From a user's perspective there is no reason this should be necessary, and requiring this would be extraordinarily surprising. Why is btrfs unable to find my snapshot? It's right there! Moreover, this used to work just fine in previous versions of btrfs-progs.

Though the snapshot is still in the fs, it is inaccessible because you mounted some subvolume as the root, and you can not find the path to the snapshot. For example: there are two subvolumes in the fs, and they are in the root directory of the fs, just like

real root directory
 |-subv0
 |-subv1

Then if you mount subv1 as the root directory, the real root directory of the fs and subv0 will be shielded,

+-------------------+
|real root directory|
| |-subv0           |
+-------------------+
  |-subv1

and you can only access the files, directories, subvolumes... in subv1. So the tool will report can not find

BTW, it is impossible that the previous version of btrfs-progs can work well in this case.

>>> We'd better use the second approach because it won't bother kernel change.
>>
>> I don't think that the silent mount is the right way to fix it, that way the btrfs tool takes responsibility not to break anything. Like the unhandled umount failure below. I think admins and power users do not like to see some random tool mess with the system like this.
>>
>>> @@ -199,6 +200,10 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
>>>  	char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
>>>  	struct btrfs_ioctl_vol_args_v2 args_v2;
>>>  	struct subvol_info *parent_subvol = NULL;
>>> +	char *dev = NULL;
>>> +	char tmp_name[15] = "btrfs-XX";
>>> +	char tmp_dir[30] = "/tmp";
>>
>> Mounting valuable data under /tmp is dangerous, what if some /tmp cleaner starts to remove old files. I've seen that happen in practice.
>
> Agreed. If you _were_ to continue to implement it like this, you should include code to respect the TMPDIR envvar at the very least.

Since the TMPDIR is not safe, I think the approach that David suggested is better. Let's tell the users why we can not find the subvolume, and ask the users to make the final decision.

Thanks
Miao
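For reference, respecting TMPDIR as suggested above could look roughly like the sketch below. It is not part of the posted patch, and the thread's conclusion is that a temporary mount is not safe either way. The helper name is hypothetical, and the six-X suffix is an assumption based on what mkdtemp(3) requires rather than on the patch's tmp_name value.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a mkdtemp(3)-style template under the user's temporary directory.
 * tmpdir_env is the value of getenv("TMPDIR") and may be NULL or empty,
 * in which case /tmp is used. Returns 0 on success, -1 if the template
 * does not fit in buf.
 */
static int build_tmp_template(const char *tmpdir_env, char *buf, size_t len)
{
	const char *base = (tmpdir_env && tmpdir_env[0]) ? tmpdir_env : "/tmp";
	int n = snprintf(buf, len, "%s/btrfs-XXXXXX", base);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass getenv("TMPDIR") and hand the resulting buffer to mkdtemp(3). Checking the snprintf return value matters here because fixed-size buffers like the patch's tmp_dir[30] can easily be too small for a user-supplied directory.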
[PATCH] btrfs: Cleanup the unused btrfs_check_super_valid.
Since David's commit 1104a8855, nothing actually checks the super block any more, so the btrfs_check_super_valid function can be removed if no one else needs it.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Cc: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8072cfa..3bda365 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -56,8 +56,6 @@ static struct extent_io_ops btree_extent_io_ops;

 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
-				    int read_only);
 static void btrfs_destroy_ordered_operations(struct btrfs_transaction *t,
 					     struct btrfs_root *root);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
@@ -2354,13 +2352,6 @@ int open_ctree(struct super_block *sb,

 	memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);

-	ret = btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
-	if (ret) {
-		printk(KERN_ERR "btrfs: superblock contains fatal errors\n");
-		err = -EINVAL;
-		goto fail_alloc;
-	}
-
 	disk_super = fs_info->super_copy;
 	if (!btrfs_super_root(disk_super))
 		goto fail_alloc;
@@ -3705,15 +3696,6 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid)
 	return btree_read_extent_buffer_pages(root, buf, 0, parent_transid);
 }

-static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
-			      int read_only)
-{
-	/*
-	 * Placeholder for checks
-	 */
-	return 0;
-}
-
 static void btrfs_error_commit_super(struct btrfs_root *root)
 {
 	mutex_lock(&root->fs_info->cleaner_mutex);
--
1.8.5.1
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
Miao Xie mi...@cn.fujitsu.com writes:
> On Tue, 17 Dec 2013 10:40:41 -0500, Michael Welsh Duggan wrote:
>> David Sterba dste...@suse.cz writes:
>>> On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
>>>> If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, I have two ideas.
>>>>
>>>> 1. make btrfs snapshot ioctl support passing source subvolume's objectid
>>>> 2. when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.
>>>
>>> 3. Tell the user to mount the toplevel subvol by himself and run receive again
>>
>> Ugh. I hope that would be considered a short-term hack waiting for a better solution, perhaps requiring a kernel upgrade. From a user's perspective there is no reason this should be necessary, and requiring this would be extraordinarily surprising. Why is btrfs unable to find my snapshot? It's right there! Moreover, this used to work just fine in previous versions of btrfs-progs.
>
> Though the snapshot is still in the fs, it is inaccessible because you mounted some subvolume as the root, and you can not find the path to the snapshot. For example: there are two subvolumes in the fs, and they are in the root directory of the fs, just like
>
> real root directory
>  |-subv0
>  |-subv1
>
> Then if you mount subv1 as the root directory, the real root directory of the fs and subv0 will be shielded,
>
> +-------------------+
> |real root directory|
> | |-subv0           |
> +-------------------+
>   |-subv1
>
> and you can only access the files, directories, subvolumes... in subv1. So the tool will report can not find
>
> BTW, it is impossible that the previous version of btrfs-progs can work well in this case.

In that case I either misunderstand completely, or my problem is almost decidedly different.

To recap, this is the command that failed:

# ./btrfs send -p /snapshots/bo /snapshots/bp | ./btrfs receive /backup/snapshots/root/
At subvol /snapshots/bp
At snapshot bp
ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
ERROR: could not find parent subvolume

Now, I believe you are saying that this means that it can't find the bo snapshot in the backup volume. But it is mounted in the expected location:

# ls -ld /backup/snapshots/root/bo/
drwxr-xr-x 1 root root 280 Dec 13 17:54 /backup/snapshots/root/bo/

and

# ./btrfs sub list -p /backup/ | grep root/bo
ID 1030 gen 1046 parent 5 top level 5 path snapshots/root/bo
# btrfs sub show /backup/snapshots/root/bo/
/backup/snapshots/root/bo
	Name: bo
	uuid: 5e15ef24-f2d0-194f-886d-3f7afc7413a4
	Parent uuid: 9a226af3-8497-744b-90f7-d7e54d58946d
	Creation time: 2013-12-13 17:51:57
	Object ID: 1030
	Generation (Gen): 1046
	Gen at creation: 1042
	Parent: 5
	Top Level: 5
	Flags: readonly
	Snapshot(s):

Maybe I am missing some terminology here? Is there some output I can send to make the problem clearer?

--
Michael Welsh Duggan (m...@md5i.com)
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
Hello Michael,

On 12/18/2013 11:29 AM, Michael Welsh Duggan wrote:
> Miao Xie mi...@cn.fujitsu.com writes:
>> On Tue, 17 Dec 2013 10:40:41 -0500, Michael Welsh Duggan wrote:
>>> David Sterba dste...@suse.cz writes:
>>>> On Tue, Dec 17, 2013 at 05:13:49PM +0800, Wang Shilong wrote:
>>>>> If we change our default subvolume, btrfs receive will fail to find subvolume. To fix this problem, I have two ideas.
>>>>>
>>>>> 1. make btrfs snapshot ioctl support passing source subvolume's objectid
>>>>> 2. when we want to using interval subvolume path, we mount it other place that use subvolume 5 as its default subvolume.
>>>>
>>>> 3. Tell the user to mount the toplevel subvol by himself and run receive again
>>>
>>> Ugh. I hope that would be considered a short-term hack waiting for a better solution, perhaps requiring a kernel upgrade. From a user's perspective there is no reason this should be necessary, and requiring this would be extraordinarily surprising. Why is btrfs unable to find my snapshot? It's right there! Moreover, this used to work just fine in previous versions of btrfs-progs.
>>
>> Though the snapshot is still in the fs, it is inaccessible because you mounted some subvolume as the root, and you can not find the path to the snapshot. For example: there are two subvolumes in the fs, and they are in the root directory of the fs, just like
>>
>> real root directory
>>  |-subv0
>>  |-subv1
>>
>> Then if you mount subv1 as the root directory, the real root directory of the fs and subv0 will be shielded,
>>
>> +-------------------+
>> |real root directory|
>> | |-subv0           |
>> +-------------------+
>>   |-subv1
>>
>> and you can only access the files, directories, subvolumes... in subv1. So the tool will report can not find
>>
>> BTW, it is impossible that the previous version of btrfs-progs can work well in this case.
>
> In that case I either misunderstand completely, or my problem is almost decidedly different.
>
> To recap, this is the command that failed:
>
> # ./btrfs send -p /snapshots/bo /snapshots/bp | ./btrfs receive /backup/snapshots/root/
> At subvol /snapshots/bp
> At snapshot bp
> ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
> ioctl(BTRFS_IOC_TREE_SEARCH, uuid, key 48f0ebae83fd32f1, UUID_KEY, 90139d8200afeaab) ret=-1, error: No such file or directory
> ERROR: could not find parent subvolume

It seems that you are using an older kernel version with the latest btrfs-progs. The new btrfs-progs uses the uuid tree to search, but this tree does not exist yet. Can you try to upgrade your kernel?

Thanks,
Wang
[PATCH V2] btrfs-progs: fix btrfstune silence on failure
Originally, btrfstune fails silently when run without any options, like this:

	# btrfstune /dev/sdb

A usage error message should be shown in this case.

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
V1 -> V2: add optind assignment to make reviewers happy; print error msg if no options provided
---
 btrfstune.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/btrfstune.c b/btrfstune.c
index 50724ba..da82f36 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -115,6 +115,7 @@ int main(int argc, char *argv[])
 	int skinny_flag = 0;
 	int ret;

+	optind = 1;
 	while(1) {
 		int c = getopt(argc, argv, "S:rx");
 		if (c < 0)
@@ -143,6 +144,13 @@ int main(int argc, char *argv[])
 		return 1;
 	}

+	if (!(seeding_flag + extrefs_flag + skinny_flag)) {
+		fprintf(stderr,
+			"ERROR: At least one option should be assigned.\n");
+		print_usage();
+		return 1;
+	}
+
 	if (check_mounted(device)) {
 		fprintf(stderr, "%s is mounted\n", device);
 		return 1;
@@ -176,6 +184,7 @@ int main(int argc, char *argv[])
 	} else {
 		root->fs_info->readonly = 1;
 		ret = 1;
+		fprintf(stderr, "btrfstune failed\n");
 	}
 	close_ctree(root);

--
1.8.0.1
[PATCH v4 3/3] btrfs-progs: handle error in the btrfs_prepare_device
this patch will handle the strerror reporting of the error instead of printing errno, and also replaced the BUG_ON with the error handling

Signed-off-by: Anand Jain anand.j...@oracle.com
---
v4: replaced ? statement with proper if statement
v3: fix per Stefan review, update error message
v2: commit update

 cmds-device.c  |  7 +++
 cmds-replace.c |  9 -
 mkfs.c         |  9 -
 utils.c        | 31 ---
 4 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/cmds-device.c b/cmds-device.c
index bc4a8dc..ada0bcd 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -111,13 +111,11 @@ static int cmd_add_dev(int argc, char **argv)

 		res = btrfs_prepare_device(devfd, argv[i], 1, &dev_block_count,
 					   0, &mixed, discard);
+		close(devfd);
 		if (res) {
-			fprintf(stderr, "ERROR: Unable to init '%s'\n", argv[i]);
-			close(devfd);
 			ret++;
-			continue;
+			goto error_out;
 		}
-		close(devfd);

 		strncpy_null(ioctl_args.name, argv[i]);
 		res = ioctl(fdmnt, BTRFS_IOC_ADD_DEV, &ioctl_args);
@@ -130,6 +128,7 @@ static int cmd_add_dev(int argc, char **argv)

 	}

+error_out:
 	close_file_or_dir(fdmnt, dirstream);
 	return !!ret;
 }
diff --git a/cmds-replace.c b/cmds-replace.c
index d9b0940..c683d6c 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -276,12 +276,11 @@ static int cmd_start_replace(int argc, char **argv)
 	}
 	strncpy((char *)start_args.start.tgtdev_name, dstdev,
 		BTRFS_DEVICE_PATH_NAME_MAX);
-	if (btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
-				 &mixed, 0)) {
-		fprintf(stderr, "Error: Failed to prepare device '%s'\n",
-			dstdev);
+	ret = btrfs_prepare_device(fddstdev, dstdev, 1, &dstdev_block_count, 0,
+				 &mixed, 0);
+	if (ret)
 		goto leave_with_error;
-	}
+
 	close(fddstdev);
 	fddstdev = -1;

diff --git a/mkfs.c b/mkfs.c
index 33369f9..18df087 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1446,6 +1446,10 @@ int main(int ac, char **av)
 	first_file = file;
 	ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count,
 				   block_count, &mixed, discard);
+	if (ret) {
+		close(fd);
+		exit(1);
+	}
 	if (block_count && block_count > dev_block_count) {
 		fprintf(stderr, "%s is smaller than requested size\n", file);
 		exit(1);
@@ -1553,8 +1557,11 @@ int main(int ac, char **av)
 		}
 		ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count,
 					   block_count, &mixed, discard);
+		if (ret) {
+			close(fd);
+			exit(1);
+		}
 		mixed = old_mixed;
-		BUG_ON(ret);

 		ret = btrfs_add_to_fsid(trans, root, fd, file, dev_block_count,
 					sectorsize, sectorsize, sectorsize);
diff --git a/utils.c b/utils.c
index f499023..f37083a 100644
--- a/utils.c
+++ b/utils.c
@@ -581,13 +581,13 @@ int btrfs_prepare_device(int fd, char *file, int zero_end, u64 *block_count_ret,
 	ret = fstat(fd, &st);
 	if (ret < 0) {
 		fprintf(stderr, "unable to stat %s\n", file);
-		exit(1);
+		return 1;
 	}

 	block_count = btrfs_device_size(fd, &st);
 	if (block_count == 0) {
 		fprintf(stderr, "unable to find %s size\n", file);
-		exit(1);
+		return 1;
 	}
 	if (max_block_count)
 		block_count = min(block_count, max_block_count);
@@ -612,26 +612,35 @@ int btrfs_prepare_device(int fd, char *file, int zero_end, u64 *block_count_ret,
 	}

 	ret = zero_dev_start(fd);
-	if (ret) {
-		fprintf(stderr, "failed to zero device start %d\n", ret);
-		exit(1);
-	}
+	if (ret)
+		goto zero_dev_error;

 	for (i = 0 ; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
 		if (bytenr >= block_count)
 			break;
-		zero_blocks(fd, bytenr, BTRFS_SUPER_INFO_SIZE);
+		ret = zero_blocks(fd, bytenr, BTRFS_SUPER_INFO_SIZE);
+		if (ret)
+			goto zero_dev_error;
 	}

 	if (zero_end) {
 		ret = zero_dev_end(fd, block_count);
-		if (ret) {
-			fprintf(stderr, "failed to zero device end %d\n", ret);
-
Re: Btrfs RAID1 File System Grew Something Extra
On 12-18-13 10:46:29 Anand Jain wrote:
> On 12/18/2013 10:03 AM, Garry T. Williams wrote:
>> I have been using btrfs for my /home partition on my home machine for a few years now. I created the file system RAID1 using two disk partitions.
>>
>> Recently I noticed btrfs fi df shows extra Data, System, and Metadata allocations. And btrfs fi show indicates extra allocations on one of my disk drives accounting for the 20 MiB allocation in the df display.
>
> this is a known bug in mkfs.btrfs, the workaround for now is to run balance on FS having some data. so that unused group- profile will go away.

Thanks.

garry@vfr$ sudo btrfs balance start /home
Done, had to relocate 50 out of 50 chunks
garry@vfr$ sudo btrfs filesystem df /home
Data, RAID1: total=22.00GiB, used=21.02GiB
System, RAID1: total=32.00MiB, used=12.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=1.00GiB, used=419.60MiB

Hmmm. Well, it's better, but the extra allocation for System is baffling. I believe that this happened sometime after creating the file system.

Also balance on a RAID1 file system with exactly two drives doesn't make much sense to me. Why would any chunks have to be relocated? I'm clearly missing something here.

--
Garry T. Williams
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
Wang Shilong wangsl.f...@cn.fujitsu.com writes:
> It seems that you use older kernel version but use the latest btrfs-progs, new btrfs-progs use uuid tree to search but this tree did not exist yet. Can you try to upgrade your kernel?

What version is necessary? (I am currently on 3.11.10.)

--
Michael Welsh Duggan (m...@md5i.com)
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
On 12/18/2013 12:06 PM, Michael Welsh Duggan wrote:
> Wang Shilong wangsl.f...@cn.fujitsu.com writes:
>> It seems that you use older kernel version but use the latest btrfs-progs, new btrfs-progs use uuid tree to search but this tree did not exist yet. Can you try to upgrade your kernel?
>
> What version is necessary? (I am currently on 3.11.10.)

3.12 is ok. By the way, can you run dmesg on 3.11.10?

# dmesg

Let's see if it outputs something like:

btrfs: can not found root: 9

Thanks,
Wang