[PATCH] btrfs-progs: ins: fix arg order in print_inode_item()
In the print_inode_item(), the argument order of sequence and flags are reversed: printf("... sequence %llu flags 0x%llx(%s)\n", ... (unsigned long long)btrfs_inode_flags(eb,ii), (unsigned long long)btrfs_inode_sequence(eb, ii), ...) So, just fix it. Signed-off-by: Tomohiro Misono --- print-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/print-tree.c b/print-tree.c index 3c585e3..8abd760 100644 --- a/print-tree.c +++ b/print-tree.c @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb, btrfs_inode_uid(eb, ii), btrfs_inode_gid(eb, ii), (unsigned long long)btrfs_inode_rdev(eb,ii), - (unsigned long long)btrfs_inode_flags(eb,ii), (unsigned long long)btrfs_inode_sequence(eb, ii), + (unsigned long long)btrfs_inode_flags(eb,ii), flags_str); print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n"); print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n"); -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
Currently, the top-level subvolume lacks the UUID. As a result, both non-snapshot subvolume and snapshot of top-level subvolume do not have Parent UUID and cannot be distinguisued. Therefore "fi show" of top-level lists all the subvolumes which lacks the UUID in "Snapshot(s)". Also, it lacks the otime information. Fix this by adding the UUID and otime at the mkfs time. As a consequence, snapshots of top-level subvolume now have a Parent UUID and UUID tree will create an entry for top-level subvolume at mount time. Signed-off-by: Tomohiro Misono --- ctree.h | 5 + mkfs/common.c | 14 ++ mkfs/main.c | 3 +++ 3 files changed, 22 insertions(+) diff --git a/ctree.h b/ctree.h index 2280659..5737978 100644 --- a/ctree.h +++ b/ctree.h @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct btrfs_root_item, stransid, 64); BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, rtransid, 64); +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item, +u8 uuid[BTRFS_UUID_SIZE]) +{ + memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE); +} /* struct btrfs_root_backup */ BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, diff --git a/mkfs/common.c b/mkfs/common.c index c9ce10d..808d343 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg, u32 itemoff; int ret = 0; int blk; + u8 uuid[BTRFS_UUID_SIZE]; memset(buf->data + sizeof(struct btrfs_header), 0, cfg->nodesize - sizeof(struct btrfs_header)); @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg, btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff); btrfs_set_item_size(buf, btrfs_item_nr(nritems), sizeof(root_item)); + if (blk == MKFS_FS_TREE) { + time_t now = time(NULL); + + uuid_generate(uuid); + btrfs_set_root_uuid(&root_item, uuid); + btrfs_set_stack_timespec_sec(&root_item.otime, now); + btrfs_set_stack_timespec_sec(&root_item.ctime, now); + } else { + memset(uuid, 0, BTRFS_UUID_SIZE); + btrfs_set_root_uuid(&root_item, uuid); + btrfs_set_stack_timespec_sec(&root_item.otime, 0); + btrfs_set_stack_timespec_sec(&root_item.ctime, 0); + } write_extent_buffer(buf, &root_item, btrfs_item_ptr_offset(buf, nritems), sizeof(root_item)); diff --git a/mkfs/main.c b/mkfs/main.c index 1b4cabc..4184a2d 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans, struct btrfs_key location; struct btrfs_root_item root_item; struct extent_buffer *tmp; + u8 uuid[BTRFS_UUID_SIZE] = {0}; int ret; ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid); @@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans, btrfs_set_root_bytenr(&root_item, tmp->start); btrfs_set_root_level(&root_item, btrfs_header_level(tmp)); btrfs_set_root_generation(&root_item, trans->transid); + /* clear uuid of source tree */ + btrfs_set_root_uuid(&root_item, uuid); free_extent_buffer(tmp); location.objectid = objectid; -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs-progs: show: allow fi show -u for FS_TREE
Allow "fi show -u" for uuid of top-level subvolume too for consistency. Signed-off-by: Tomohiro Misono --- btrfs-list.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/btrfs-list.c b/btrfs-list.c index b6d7658..91fdab8 100644 --- a/btrfs-list.c +++ b/btrfs-list.c @@ -1594,9 +1594,15 @@ int btrfs_get_subvol(int fd, struct root_info *the_ri) ri = rb_entry(rbn, struct root_info, rb_node); rr = resolve_root(&rl, ri, root_id); if (rr == -ENOENT) { - ret = -ENOENT; - rbn = rb_next(rbn); - continue; + if (ri->root_id == BTRFS_FS_TREE_OBJECTID) { + ri->path = strdup("/"); + ri->name = strdup(""); + ri->full_path = strdup("/"); + } else { + ret = -ENOENT; + rbn = rb_next(rbn); + continue; + } } if (!comp_entry_with_rootid(the_ri, ri, 0) || -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()
On 2017年10月30日 16:10, Misono, Tomohiro wrote: > In the print_inode_item(), the argument order of sequence and flags are > reversed: > > printf("... sequence %llu flags 0x%llx(%s)\n", > ... > (unsigned long long)btrfs_inode_flags(eb,ii), > (unsigned long long)btrfs_inode_sequence(eb, ii), > ...) > > So, just fix it. > > Signed-off-by: Tomohiro Misono Reviewed-by: Qu Wenruo Thanks, Qu > --- > print-tree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/print-tree.c b/print-tree.c > index 3c585e3..8abd760 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb, > btrfs_inode_uid(eb, ii), > btrfs_inode_gid(eb, ii), > (unsigned long long)btrfs_inode_rdev(eb,ii), > -(unsigned long long)btrfs_inode_flags(eb,ii), > (unsigned long long)btrfs_inode_sequence(eb, ii), > +(unsigned long long)btrfs_inode_flags(eb,ii), > flags_str); > print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n"); > print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n"); > signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
On 2017年10月30日 16:14, Misono, Tomohiro wrote: > Currently, the top-level subvolume lacks the UUID. As a result, both > non-snapshot subvolume and snapshot of top-level subvolume do not have > Parent UUID and cannot be distinguisued. Therefore "fi show" of > top-level lists all the subvolumes which lacks the UUID in > "Snapshot(s)". Also, it lacks the otime information. What about creating a wiki page for ROOT_ITEM and detailed restriction for its members? > > Fix this by adding the UUID and otime at the mkfs time. As a > consequence, snapshots of top-level subvolume now have a Parent UUID and > UUID tree will create an entry for top-level subvolume at mount time. This patch also inspires me about the btrfs_create_tree() function I'm introducing should also populate its UUID and timespec. > > Signed-off-by: Tomohiro Misono > --- > ctree.h | 5 + > mkfs/common.c | 14 ++ > mkfs/main.c | 3 +++ > 3 files changed, 22 insertions(+) > > diff --git a/ctree.h b/ctree.h > index 2280659..5737978 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct > btrfs_root_item, >stransid, 64); > BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, >rtransid, 64); > +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item, > + u8 uuid[BTRFS_UUID_SIZE]) > +{ > + memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE); > +} This is the stack version. However there is no extent buffer version to set uuid as we just call write_extent_buffer(), so it seems unnecessary to me. > > /* struct btrfs_root_backup */ > BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, > diff --git a/mkfs/common.c b/mkfs/common.c > index c9ce10d..808d343 100644 > --- a/mkfs/common.c > +++ b/mkfs/common.c > @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct > btrfs_mkfs_config *cfg, > u32 itemoff; > int ret = 0; > int blk; > + u8 uuid[BTRFS_UUID_SIZE]; > > memset(buf->data + sizeof(struct btrfs_header), 0, > cfg->nodesize - sizeof(struct btrfs_header)); > @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct > btrfs_mkfs_config *cfg, > btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff); > btrfs_set_item_size(buf, btrfs_item_nr(nritems), > sizeof(root_item)); > + if (blk == MKFS_FS_TREE) { > + time_t now = time(NULL); > + > + uuid_generate(uuid); > + btrfs_set_root_uuid(&root_item, uuid); > + btrfs_set_stack_timespec_sec(&root_item.otime, now); > + btrfs_set_stack_timespec_sec(&root_item.ctime, now); > + } else { > + memset(uuid, 0, BTRFS_UUID_SIZE); This leads to the question about the UUID meaning of different trees. AFAIK every tree created by kernel has its own UUID, no one uses all zero. In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate a new UUID. So I'm wonder if such branch is really needed. And maybe we fix all tree creation to generate UUID. Despite the question about the meaning of UUID for root_item, I think the patch really makes sense. Thanks, Qu > + btrfs_set_root_uuid(&root_item, uuid); > + btrfs_set_stack_timespec_sec(&root_item.otime, 0); > + btrfs_set_stack_timespec_sec(&root_item.ctime, 0); > + } > write_extent_buffer(buf, &root_item, > btrfs_item_ptr_offset(buf, nritems), > sizeof(root_item)); > diff --git a/mkfs/main.c b/mkfs/main.c > index 1b4cabc..4184a2d 100644 > --- a/mkfs/main.c > +++ b/mkfs/main.c > @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans, > struct btrfs_key location; > struct btrfs_root_item root_item; > struct extent_buffer *tmp; > + u8 uuid[BTRFS_UUID_SIZE] = {0}; > int ret; > > ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid); > @@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans, > btrfs_set_root_bytenr(&root_item, tmp->start); > btrfs_set_root_level(&root_item, btrfs_header_level(tmp)); > btrfs_set_root_generation(&root_item, trans->transid); > + /* clear uuid of source tree */ > + btrfs_set_root_uuid(&root_item, uuid); > free_extent_buffer(tmp); > > location.objectid = objectid; > signature.asc Description: OpenPGP digital signature
Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()
On 2017年10月30日 16:10, Misono, Tomohiro wrote: > In the print_inode_item(), the argument order of sequence and flags are > reversed: > > printf("... sequence %llu flags 0x%llx(%s)\n", > ... > (unsigned long long)btrfs_inode_flags(eb,ii), > (unsigned long long)btrfs_inode_sequence(eb, ii), > ...) > > So, just fix it. Not related to this patch, but considering you're going to enhance root_item creation, it's also a good idea to print such info like c/o/s/r time and parent/receive UUID. More patches will never be a bad thing. Thanks, Qu > > Signed-off-by: Tomohiro Misono > --- > print-tree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/print-tree.c b/print-tree.c > index 3c585e3..8abd760 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb, > btrfs_inode_uid(eb, ii), > btrfs_inode_gid(eb, ii), > (unsigned long long)btrfs_inode_rdev(eb,ii), > -(unsigned long long)btrfs_inode_flags(eb,ii), > (unsigned long long)btrfs_inode_sequence(eb, ii), > +(unsigned long long)btrfs_inode_flags(eb,ii), > flags_str); > print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n"); > print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n"); > signature.asc Description: OpenPGP digital signature
Re: USB upgrade fun
On Sun, 29 Oct 2017, at 06:02 PM, Qu Wenruo wrote: > If so, mount it, do minimal write like creating an empty file, to update > both superblock copies, and then try fix-device-size. Tried that, and it didn't work. Made a recording: https://youtu.be/SFd3QscNT6w -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: clean up btrfs_dev_stat_inc() usage
On Sat, Oct 21, 2017 at 01:45:33AM +0800, Anand Jain wrote: > btrfs_end_bio() is using btrfs_dev_stat_inc() and then > btrfs_dev_stat_print_on_error() separately instead use > btrfs_dev_stat_inc_and_print() directly. > > As of now there isn't any bio in btrfs which is - a non-empty write and > also the REQ_PREFLUSH flag is set. So in actual the condition >if (bio->bi_opf & REQ_PREFLUSH) > is never true in btrfs_end_bio(), and so there won't be any redundant error > log by using btrfs_dev_stat_inc_and_print() separately one for write and > another for flush. > > This consolidation will help to add the device critical error handles in the > function btrfs_dev_stat_inc_and_print() and which can be renamed as needed. > > Signed-off-by: Anand Jain Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] btrfs: Remove redundant memory barrier
On Fri, Oct 20, 2017 at 06:10:58PM +0300, Nikolay Borisov wrote: > As per atomic_t.txt documentation : > - RMW operations that have a return value are fully ordered; > > atomic_xchg is one such operation so it already includes everything it needs > w.r.t memory ordering and add a comment ot be more explicit about that. > > Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: Fix bug for misused dev_t when lookup in dev state hash table.
On Thu, Oct 19, 2017 at 09:49:27AM +0800, Gu Jinxiang wrote: > From: Gu JinXiang > > Fix bug of commit 74d46992e0d9 > ("block: replace bi_bdev with a gendisk pointer and partitions index"). > > In this modify, use bio_dev(bio) to find dev state in function > __btrfsic_submit_bio. But when dev_state added to hashtable, it is using > dev_t of block_device. > > bio_dev(bio) returns a dev_t of part0 which is different from dev_t in > block_device(bd_dev). bd_dev in block_device represents the exact > partition. > block_device.bd_dev = > bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio). > > When add a dev_state into hashtable it is using the exact partition's dev_t. > So when lookup it, it should also use the exact partition's dev_t. > > Reproduce of this bug: > Use MOUNT_OPTIONS="-o check_int" when run btrfs/001 in xfstest. > Then there will be WARNING like below. > WARNING: > btrfs: attempt to write superblock which references block M @29523968 (sda7 > /654400/2) which is never written! > > changelog: > v1->v2: Add explanation that bio_dev(bio) is different with > block_device(bd_dev). > > Signed-off-by: Gu JinXiang Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
On Fri, 2017-05-12 at 12:21 -0400, J. Bruce Fields wrote: > On Fri, May 12, 2017 at 08:22:23AM +1000, NeilBrown wrote: > > On Thu, May 11 2017, J. Bruce Fields wrote: > > > +static inline u64 nfsd4_change_attribute(struct inode *inode) > > > +{ > > > + u64 chattr; > > > + > > > + chattr = inode->i_ctime.tv_sec << 30; > > > + chattr += inode->i_ctime.tv_nsec; > > > + chattr += inode->i_version; > > > + return chattr; > > > > So if I chmod a file, all clients will need to flush the content from their > > cache? > > Maybe they already do? Maybe it is a boring corner case? > > Yeah, that's the assumption, maybe it's wrong. I can't recall > complaints about anyone bitten by that case. > I'm pretty sure that's required by the RFC. The change attribute changes with both data and metadata changes, and there is no way to tell what sort of change it was. You have to dump everything out of the cache when it changes. > > > /* > > > * Fill in the pre_op attr for the wcc data > > > */ > > > @@ -253,7 +263,7 @@ fill_pre_wcc(struct svc_fh *fhp) > > > fhp->fh_pre_mtime = inode->i_mtime; > > > fhp->fh_pre_ctime = inode->i_ctime; > > > fhp->fh_pre_size = inode->i_size; > > > - fhp->fh_pre_change = inode->i_version; > > > + fhp->fh_pre_change = nfsd4_change_attribute(inode); > > > fhp->fh_pre_saved = true; > > > } > > > } > > > --- a/fs/nfsd/nfs3xdr.c > > > +++ b/fs/nfsd/nfs3xdr.c > > > @@ -260,7 +260,7 @@ void fill_post_wcc(struct svc_fh *fhp) > > > printk("nfsd: inode locked twice during operation.\n"); > > > > > > err = fh_getattr(fhp, &fhp->fh_post_attr); > > > - fhp->fh_post_change = d_inode(fhp->fh_dentry)->i_version; > > > + fhp->fh_post_change = nfsd4_change_attribute(d_inode(fhp->fh_dentry)); > > > if (err) { > > > fhp->fh_post_saved = false; > > > /* Grab the ctime anyway - set_change_info might use it */ > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > > index 26780d53a6f9..a09532d4a383 100644 > > > --- a/fs/nfsd/nfs4xdr.c > > > +++ b/fs/nfsd/nfs4xdr.c > > > @@ -1973,7 +1973,7 @@ static __be32 *encode_change(__be32 *p, struct > > > kstat *stat, struct inode *inode, > > > *p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time)); > > > *p++ = 0; > > > } else if (IS_I_VERSION(inode)) { > > > - p = xdr_encode_hyper(p, inode->i_version); > > > + p = xdr_encode_hyper(p, nfsd4_change_attribute(inode)); > > > } else { > > > *p++ = cpu_to_be32(stat->ctime.tv_sec); > > > *p++ = cpu_to_be32(stat->ctime.tv_nsec); > > > > It is *really* confusing to find that fh_post_change is only set in nfs3 > > code, and only used in nfs4 code. > > Yup. > > > It is probably time to get a 'version' field in 'struct kstat'. > > The pre/post_wcc code doesn't seem to be doing an explicit stat, I > wonder if that matters? > Probably not for now. We only use this for namespace altering operations anyway (create, link, unlink, and rename). The post code does do a fh_getattr. It's only the pre-op i_version that comes out of the cache. Only btrfs, xfs, and ext4 have a real i_version counter today, and they just scrape that info out of the in-core inode. So while not completely atomic, you should see a difference in the change_info4 during any of those operations. FWIW, userland cephfs now supports a cluster-coherent change attribute, though the kernel client still needs some work before we can implement it there. Eventually we'll add that, and at that point we might need to have nfsd do a getattr in the pre part as well. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What is the purpose of EXTENT_PAGE_MAPPED
On 27.10.2017 07:17, Liu Bo wrote: > On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote: >> >> EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see >> it being cross checked anytime. What is the purpose of setting it? > > Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was > used to differentiate page for metadata and for data, but I think > currently it's just a piece of legacy code. Be that as it may - is there any reason why we are keeping this and can it be killed off? > > -liubo > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Remove redundant mirror_num arg
On Tue, Oct 24, 2017 at 11:50:39AM +0300, Nikolay Borisov wrote: > The following callpath is always invoked with mirror_num set to 0, so let's > remove it as an argument and directly pass 0 to __do_redpage. No functional > change > > extent_readpages > __extent_readpages > __do_contiguous_readpages > __do_readpage > > Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: add write_flags for compression bio
On Mon, Oct 23, 2017 at 11:18:16PM -0600, Liu Bo wrote: > Compression code path has only flaged bios with REQ_OP_WRITE no matter > where the bios come from, but it could be a sync write if fsync starts > this writeback or a normal writeback write if wb kthread starts a > periodic writeback. > > It breaks the rule that sync writes and writeback writes need to be > differentiated from each other, because from the POV of block layer, > all bios need to be recognized by these flags in order to do some > management, e.g. throttlling. > > This passes writeback_control to compression write path so that it can > send bios with proper flags to block layer. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What is the purpose of EXTENT_PAGE_MAPPED
On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote: > > > On 27.10.2017 07:17, Liu Bo wrote: > > On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote: > >> > >> EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see > >> it being cross checked anytime. What is the purpose of setting it? > > > > Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was > > used to differentiate page for metadata and for data, but I think > > currently it's just a piece of legacy code. > > Be that as it may - is there any reason why we are keeping this and can > it be killed off? Are we're talking about EXTENT_PAGE_PRIVATE? There's no EXTENT_PAGE_MAPPED. There's some control dependency on the page private bit and the value, so we should be careful and replace the function with an assert (or a BUG_ON if it's a must-not-happen state). The page->private points to an extent buffer, and if it's always an eb, then the EXTENT_PAGE_PRIVATE is unused. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What is the purpose of EXTENT_PAGE_MAPPED
On 30.10.2017 15:55, David Sterba wrote: > On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote: >> >> >> On 27.10.2017 07:17, Liu Bo wrote: >>> On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote: EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see it being cross checked anytime. What is the purpose of setting it? >>> >>> Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was >>> used to differentiate page for metadata and for data, but I think >>> currently it's just a piece of legacy code. >> >> Be that as it may - is there any reason why we are keeping this and can >> it be killed off? > > Are we're talking about EXTENT_PAGE_PRIVATE? There's no > EXTENT_PAGE_MAPPED. There's some control dependency on the page private > bit and the value, so we should be careful and replace the function with > an assert (or a BUG_ON if it's a must-not-happen state). The page->private > points to an extent buffer, and if it's always an eb, then the >EXTENT_PAGE_PRIVATE is unused. I guess I meant do we actually need: set_page_extent_mapped and all the jazz happening in it or is it a leftover (which I believe it is) ? > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What is the purpose of EXTENT_PAGE_MAPPED
On 10/30/2017 09:01 AM, Nikolay Borisov wrote: > > > On 30.10.2017 15:55, David Sterba wrote: >> On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote: >>> >>> >>> On 27.10.2017 07:17, Liu Bo wrote: On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote: > > EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see > it being cross checked anytime. What is the purpose of setting it? Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was used to differentiate page for metadata and for data, but I think currently it's just a piece of legacy code. >>> >>> Be that as it may - is there any reason why we are keeping this and can >>> it be killed off? >> >> Are we're talking about EXTENT_PAGE_PRIVATE? There's no >> EXTENT_PAGE_MAPPED. There's some control dependency on the page private >> bit and the value, so we should be careful and replace the function with >> an assert (or a BUG_ON if it's a must-not-happen state). The page->private >> points to an extent buffer, and if it's always an eb, then the >> EXTENT_PAGE_PRIVATE is unused. > > I guess I meant do we actually need: set_page_extent_mapped and all the > jazz happening in it or is it a leftover (which I believe it is) ? > We definitely need set_page_extent_mapped() to set the page private for the lowers layers to handle I/O, primarily writebacks. However, we may not need EXTENT_PAGE_PRIVATE (yes, I got it wrong the first time). I am still reading on this though. -- Goldwyn -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
Lakshmipathi.G posted on Mon, Oct 30, 2017 at 12:07 AM as excerpted: > Can you share these steps as simple bash script? Currently I'm running > few tests, I can check your script and share the results. sure, but it is pretty trivial: device1=/dev/disk/by-id/XX1 device2=/dev/disk/by-id/XX2 device3=/dev/disk/by-id/XX3 mkfs.btrfs -L OfflineJ $device1 $device2 $device3 mount -t btrfs $device1 /mnt btrfs subvolume create /mnt/dataroot mkdir -p /media/OfflineJ umount /mnt mount -o subvol=/dataroot $device1 /media/OfflineJ > > The only thing I could think of is that the btrfs version that I used to > > mkfs > > was not up to date. Is there a way to determine which version was used to > > create the filesystem? > > > If I'm not wrong, i don't think we can get mkfs version from existing file > system layout. what was your kernel version? Not quite sure. Oldest: archlinux linux-lts as of 2017.06.08 Most likely: archlinux linux-lts as of 2017.10.18 It may be possible that I updated and created the fs without rebooting or reloading the kernel modules. But normally in that case any interaction, even mounting, just fails loudly. Duncan posted on Mon, Oct 30, 2017 at 12:09 AM as excerpted: > and I downclocked it a notch (from its rated pc3200 to 3000, IIRC). At Hmm, I did have a mild overclock on this system when I used it as a desktop but I thought I reverted it when I made it into a server. I'll double check this. > sometimes, but not always, trigger as well, but the most frequent symptom > really was checksum verification failures untarring tbz2s. I'll have to try and recreate this using my EXT4 system disk. Zak Kohler posted on Sun, Oct 29, 2017 at 9:57 PM as excerpted: > I'm running '$ sudo btrfs scrub start --offline --progress > /dev/disk/by-id/WD-XX3' one more time to just to make sure that I > wasn't reading something wrong. Well even stranger news on the fourth run: $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX3 Csum error: 175 # first run Csum error: 112 # second run... Csum error: 285 # third run... Csum error: 0 # fourth run Also can anyone confirm that the above --offline command checks the whole filesystem regardless which device is specified on the cli? > Only other think I can think to try is transferring those drives to > another system. I still need to try this also. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] btrfs: match btrfs_device->mode same as it used for open
On Fri, Oct 20, 2017 at 10:07:15PM +0800, Anand Jain wrote: > We aren't setting the FMODE_WRITE when initializing btrfs_device > structure and when calling blkdev_put, however we are setting it > only when calling blkdev_get_by_path(). But this still does not say why this is a problem worth fixing. Nikolay asked for it, and I would do the same, but why do we even have to ask for that? https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: add support for fallocate's zero range operation
On Wed, Oct 25, 2017 at 03:59:08PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > This implements support the zero range operation of fallocate. For now > at least it's as simple as possible while reusing most of the existing > fallocate and hole punching infrastructure. > > Signed-off-by: Filipe Manana > --- I'll add this to for-next so we can more testing coverage, review is open. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
On Mon, Oct 23, 2017 at 10:29:27AM -0600, Edmund Nadolski wrote: > > --- a/fs/btrfs/volumes.c > > +++ b/fs/btrfs/volumes.c > > @@ -1765,20 +1765,24 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info > > *fs_info, > > key.offset = device->devid; > > > > ret = btrfs_search_slot(trans, root, &key, path, -1, 1); > > - if (ret < 0) > > - goto out; > > - > > - if (ret > 0) { > > - ret = -ENOENT; > > + if (ret) { > > + if (ret > 0) > > + ret = -ENOENT; > > + btrfs_abort_transaction(trans, ret); > > + btrfs_end_transaction(trans); > > goto out; > > } > > > > ret = btrfs_del_item(trans, root, path); > > - if (ret) > > - goto out; > > + if (ret) { > > + btrfs_abort_transaction(trans, ret); > > + btrfs_end_transaction(trans); > > + } > > + > > out: > > btrfs_free_path(path); > > - btrfs_commit_transaction(trans); > > + if (!ret) > > + ret = btrfs_commit_transaction(trans); > > return ret; > > } > > > > > > Perhaps slightly simpler (and the 'out:' label maybe goes away): > > . > ret = btrfs_search_slot(trans, root, &key, path, -1, 1); > if (ret > 0) > ret = -ENOENT; > else if (!ret) > ret = btrfs_del_item(trans, root, path); > > if (ret) { > btrfs_abort_transaction(trans, ret); > btrfs_end_transaction(trans); This would merge 2 abort sites into one, btrfs_search_slot and btrfs_del_item, so it wouldn't be obvious which one failed just from the stack trace and line number. V4 is ok, as it only joins return values from btrfs_search_slot, where the positive value means "search slot was not able to find the key, but this is where you should insert it". This really translates to ENOENT so we're not losing any information. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels
On Mon, Oct 23, 2017 at 07:43:02PM +0800, Qu Wenruo wrote: > > > On 2017年10月23日 18:51, Nikolay Borisov wrote: > > Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels > > to > > mimic the functionality of a simple do {} while loop. Refactor the function > > to use a do {} while construct, making intention clear and code easier to > > follow. No functional changes > > > > Signed-off-by: Nikolay Borisov > > Looks good to me. > > Reviewed-by: Qu Wenruo 1-2 added to next. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: remove rcu_barrier in btrfs_close_devices
On Wed, Oct 11, 2017 at 11:02:49AM -0600, Liu Bo wrote: > On Wed, Oct 11, 2017 at 03:41:23PM +0800, Anand Jain wrote: > > > > > > On 10/11/2017 02:11 PM, Anand Jain wrote: > > > > > > > > > On 10/11/2017 05:51 AM, Liu Bo wrote: > > > > It was introduced because btrfs used to do blkdev_put in a deferred > > > > work, now that btrfs has put blkdev in place, this rcu_barrier can be > > > > removed. > > > > On the 2nd thought, modprobe -r btrfs would still need rcu_barrier(), some > > where else outside of umount context ? > > Thanks a lot for the comments. > > modprobe -r btrfs will do btrfs_cleanup_fs_uuids(), where it cleanup > every %fs_devices on the list, but when we do btrfs_close_devices(), we > have replaced the devices on the list with dummy ones which only have > the same name and uuid, so modprobe -r btrfs will free those instead > of what we were using, this change won't cause a problem for it. Added to the changelog. Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: bail out gracefully rather than BUG_ON
If a file's DIR_ITEM key is invalid (due to memory errors) and gets written to disk, a future lookup_path can end up with kernel panic due to BUG_ON(). This gets rid of the BUG_ON(), meanwhile output the corrupted key and return ENOENT if it's invalid. Signed-off-by: Liu Bo --- The diff doesn't show the logic well, 'goto out_err' will return with assigning 0 to location->objectid, and the caller already has a check for (location->objectid == 0) to return -ENOENT. fs/btrfs/inode.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d94e3f6..916cdc9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5500,6 +5500,14 @@ static int btrfs_inode_by_name(struct inode *dir, struct dentry *dentry, goto out_err; btrfs_dir_item_key_to_cpu(path->nodes[0], di, location); + if (location->type != BTRFS_INODE_ITEM_KEY && + location->type != BTRFS_ROOT_ITEM_KEY) { + btrfs_warn(root->fs_info, + "%s gets something invalid in DIR_ITEM (name %s, directory ino %llu, location(%llu %u %llu))", + __func__, name, btrfs_ino(BTRFS_I(dir)), + location->objectid, location->type, location->offset); + goto out_err; + } out: btrfs_free_path(path); return ret; @@ -5816,8 +5824,6 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry) return inode; } - BUG_ON(location.type != BTRFS_ROOT_ITEM_KEY); - index = srcu_read_lock(&fs_info->subvol_srcu); ret = fixup_tree_root_location(fs_info, dir, dentry, &location, &sub_root); -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC] Btrfs: remove max_active and thresh
First and foremost, here are the problems we have right now, a) %thread_pool is configurable via mount option, however for those 'workers' who really needs concurrency, there are at most "min(num_cpus+2, 8)" threads running to process works, and the value can't be tuned after mount because they're tagged with NO_THRESHOLD. b) For THRESHOLD workers, btrfs is adjusting how concurrency will happen by starting with the minimum max_active and calling btrfs_workqueue_set_max() to grow it on demand, but it also has a upper limit, "min(num_cpus+2, 8)", which means at most we can have such many threads running at the same time. In fact, kernel workqueue can be created on demand and destroyed once no work needs to be processed. The small max_active limitation (8) from btrfs has prevented us from utilizing all available cpus, and that is against why we choose workqueue. What this patch does: - resizing %thread_pool is totally removed as they are no long needed, while keeping its mount option for compatibility. - The default "0" is passed when allocating workqueue, so the maximum number of running works is typically 256. And all fields for limiting max_active are removed, including current_active, limit_active, thresh etc. Signed-off-by: Liu Bo --- fs/btrfs/async-thread.c | 155 +++ fs/btrfs/async-thread.h | 27 +++-- fs/btrfs/delayed-inode.c | 7 ++- fs/btrfs/disk-io.c | 61 ++- fs/btrfs/scrub.c | 11 ++-- fs/btrfs/super.c | 32 -- 6 files changed, 58 insertions(+), 235 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index e00c8a9..dd627ad 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -29,41 +29,6 @@ #define WORK_ORDER_DONE_BIT 1 #define WORK_HIGH_PRIO_BIT 2 -#define NO_THRESHOLD (-1) -#define DFT_THRESHOLD (32) - -struct __btrfs_workqueue { - struct workqueue_struct *normal_wq; - - /* File system this workqueue services */ - struct btrfs_fs_info *fs_info; - - /* List head pointing to ordered work list */ - struct list_head ordered_list; - - /* Spinlock for ordered_list */ - spinlock_t list_lock; - - /* Thresholding related variants */ - atomic_t pending; - - /* Up limit of concurrency workers */ - int limit_active; - - /* Current number of concurrency workers */ - int current_active; - - /* Threshold to change current_active */ - int thresh; - unsigned int count; - spinlock_t thres_lock; -}; - -struct btrfs_workqueue { - struct __btrfs_workqueue *normal; - struct __btrfs_workqueue *high; -}; - static void normal_work_helper(struct btrfs_work *work); #define BTRFS_WORK_HELPER(name)\ @@ -86,20 +51,6 @@ btrfs_work_owner(const struct btrfs_work *work) return work->wq->fs_info; } -bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq) -{ - /* -* We could compare wq->normal->pending with num_online_cpus() -* to support "thresh == NO_THRESHOLD" case, but it requires -* moving up atomic_inc/dec in thresh_queue/exec_hook. Let's -* postpone it until someone needs the support of that case. -*/ - if (wq->normal->thresh == NO_THRESHOLD) - return false; - - return atomic_read(&wq->normal->pending) > wq->normal->thresh * 2; -} - BTRFS_WORK_HELPER(worker_helper); BTRFS_WORK_HELPER(delalloc_helper); BTRFS_WORK_HELPER(flush_delalloc_helper); @@ -125,7 +76,7 @@ BTRFS_WORK_HELPER(scrubparity_helper); static struct __btrfs_workqueue * __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name, - unsigned int flags, int limit_active, int thresh) + unsigned int flags) { struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_KERNEL); @@ -133,31 +84,15 @@ __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name, return NULL; ret->fs_info = fs_info; - ret->limit_active = limit_active; atomic_set(&ret->pending, 0); - if (thresh == 0) - thresh = DFT_THRESHOLD; - /* For low threshold, disabling threshold is a better choice */ - if (thresh < DFT_THRESHOLD) { - ret->current_active = limit_active; - ret->thresh = NO_THRESHOLD; - } else { - /* -* For threshold-able wq, let its concurrency grow on demand. -* Use minimal max_active at alloc time to reduce resource -* usage. -*/ - ret->current_active = 1; - ret->thresh = thresh; - } if (flags & WQ_HIGHPRI) ret->normal_wq = alloc_workqueue("%s-%s-high", flags, -ret->curren
[PATCH] Btrfs: kill threshold for submit_workers
"submit_workers" is a workqueue that serves to collect and dispatch IOs on each device in btrfs, thus the work that is queued on it is per-device, which means at most there're as many works as the number of devices owned by a btrfs. Now we've set threshhold (=64) for "submit_workers" and the 'max_active' work of this workqueue is set to 1 and will be updated accrodingly. However, as the threshold (64) is the highest one and the 'max_active' gets updated only when there're more works than its threshold at the same time, the end result is that it's almostly unlikely to update 'max_active' because you'll need >64 devices to have a chance to do that. Given the above fact, in most cases, works on each device which process IOs is run in order by only one kthread of 'submit_workers'. It's OK for DUP and raid0 since at the same time IOs are always submitted to one device, while for raid1 and raid10, where our primary bio only completes when all cloned bios submitted by each raid1/10's device complete[1], it's suboptimal. This changes the threshold to 1 so that __btrfs_alloc_workqueue() can use NO_THRESHOLD for 'submit_workers' and the 'max_active' will be min(num_devices, thread_pool_size). [1]: raid1 example, primary bio /\ bio1 bio2 | | dev1 dev2 | | endio1 endio2 \/ endio Signed-off-by: Liu Bo --- fs/btrfs/disk-io.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dfdab84..373d3f6f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2368,15 +2368,10 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info, fs_info->caching_workers = btrfs_alloc_workqueue(fs_info, "cache", flags, max_active, 0); - /* -* a higher idle thresh on the submit workers makes it much more -* likely that bios will be send down in a sane order to the -* devices -*/ fs_info->submit_workers = btrfs_alloc_workqueue(fs_info, "submit", flags, min_t(u64, fs_devices->num_devices, - max_active), 64); + max_active), 1); fs_info->fixup_workers = btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0); -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: bail out gracefully rather than BUG_ON
On Mon, Oct 30, 2017 at 11:14:38AM -0600, Liu Bo wrote: > If a file's DIR_ITEM key is invalid (due to memory errors) and gets > written to disk, a future lookup_path can end up with kernel panic due > to BUG_ON(). > > This gets rid of the BUG_ON(), meanwhile output the corrupted key and > return ENOENT if it's invalid. > The kernel panic is originally Reported-by: Guillaume Bouchard Thanks, -liubo > Signed-off-by: Liu Bo > --- > The diff doesn't show the logic well, 'goto out_err' will return with > assigning 0 to location->objectid, and the caller already has a check > for (location->objectid == 0) to return -ENOENT. > > fs/btrfs/inode.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d94e3f6..916cdc9 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -5500,6 +5500,14 @@ static int btrfs_inode_by_name(struct inode *dir, > struct dentry *dentry, > goto out_err; > > btrfs_dir_item_key_to_cpu(path->nodes[0], di, location); > + if (location->type != BTRFS_INODE_ITEM_KEY && > + location->type != BTRFS_ROOT_ITEM_KEY) { > + btrfs_warn(root->fs_info, > + "%s gets something invalid in DIR_ITEM (name %s, directory ino %llu, > location(%llu %u %llu))", > +__func__, name, btrfs_ino(BTRFS_I(dir)), > +location->objectid, location->type, > location->offset); > + goto out_err; > + } > out: > btrfs_free_path(path); > return ret; > @@ -5816,8 +5824,6 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, > struct dentry *dentry) > return inode; > } > > - BUG_ON(location.type != BTRFS_ROOT_ITEM_KEY); > - > index = srcu_read_lock(&fs_info->subvol_srcu); > ret = fixup_tree_root_location(fs_info, dir, dentry, > &location, &sub_root); > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: avoid losing data raid profile when deleting a device
On Mon, Oct 16, 2017 at 11:53:08AM +0300, Nikolay Borisov wrote: > > > On 13.10.2017 23:51, Liu Bo wrote: > > On Wed, Oct 11, 2017 at 10:38:50AM +0300, Nikolay Borisov wrote: > >> > >> > >> On 10.10.2017 20:53, Liu Bo wrote: > >>> We've avoided data losing raid profile when doing balance, but it > >>> turns out that deleting a device could also result in the same > >>> problem > >>> > >>> This fixes the problem by creating an empty data chunk before > >>> relocating the data chunk. > >> > >> Why is this needed - copy the metadata of the to-be-relocated chunk into > >> the newly created empty chunk? I don't entirely understand that code but > >> doesn't this seem a bit like a hack in order to stash some information? > >> Perhaps you could elaborate the logic a bit more in the changelog? > >> > >>> > >>> Metadata/System chunk are supposed to have non-zero bytes all the time > >>> so their raid profile is persistent. > >> > >> I think this changelog is a bit scarce on detail as to the culprit of > >> the problem. Could you perhaps put a sentence or two what the underlying > >> logic which deletes the raid profile if a chunk is empty ? > >> > > > > Fair enough. > > > > The problem is as same as what commit 2c9fe8355258 ("btrfs: Fix > > lost-data-profile caused by balance bg") had fixed. > > > > Similar to doing balance, deleting a device can also move all chunks > > on this disk to other available disks, after 'move' successfully, > > it'll remove those chunks. > > > > If our last data chunk is empty and part of it happens to be on this > ^ > "Last data chunk" of which disk - the "to-be-removed" one or the disk to > which chunks have been relocated? > It refers to the 'to-be-removed' disk. > > disk, then there is no data chunk in this btrfs after deleting the >^ > Which disk are you referring to - the one being removed? Yes, the 'to-be-removed' disk. Thanks, -liubo > > device successfully, any following write will try to create a new data > > chunk which ends up with a single data chunk because the only > > available data raid profile is 'single'. > > > > thanks, > > -liubo > > > >>> > >>> Reported-by: James Alandt > >>> Signed-off-by: Liu Bo > >>> --- > >>> > >>> v2: - return the correct error. > >>> - move helper ahead of __btrfs_balance(). > >>> > >>> fs/btrfs/volumes.c | 84 > >>> ++ > >>> 1 file changed, 65 insertions(+), 19 deletions(-) > >>> > >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > >>> index 4a72c45..a74396d 100644 > >>> --- a/fs/btrfs/volumes.c > >>> +++ b/fs/btrfs/volumes.c > >>> @@ -3018,6 +3018,48 @@ static int btrfs_relocate_sys_chunks(struct > >>> btrfs_fs_info *fs_info) > >>> return ret; > >>> } > >>> > >>> +/* > >>> + * return 1 : allocate a data chunk successfully, > >>> + * return <0: errors during allocating a data chunk, > >>> + * return 0 : no need to allocate a data chunk. > >>> + */ > >>> +static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info, > >>> + u64 chunk_offset) > >>> +{ > >>> + struct btrfs_block_group_cache *cache; > >>> + u64 bytes_used; > >>> + u64 chunk_type; > >>> + > >>> + cache = btrfs_lookup_block_group(fs_info, chunk_offset); > >>> + ASSERT(cache); > >>> + chunk_type = cache->flags; > >>> + btrfs_put_block_group(cache); > >>> + > >>> + if (chunk_type & BTRFS_BLOCK_GROUP_DATA) { > >>> + spin_lock(&fs_info->data_sinfo->lock); > >>> + bytes_used = fs_info->data_sinfo->bytes_used; > >>> + spin_unlock(&fs_info->data_sinfo->lock); > >>> + > >>> + if (!bytes_used) { > >>> + struct btrfs_trans_handle *trans; > >>> + int ret; > >>> + > >>> + trans = btrfs_join_transaction(fs_info->tree_root); > >>> + if (IS_ERR(trans)) > >>> + return PTR_ERR(trans); > >>> + > >>> + ret = btrfs_force_chunk_alloc(trans, fs_info, > >>> + BTRFS_BLOCK_GROUP_DATA); > >>> + btrfs_end_transaction(trans); > >>> + if (ret < 0) > >>> + return ret; > >>> + > >>> + return 1; > >>> + } > >>> + } > >>> + return 0; > >>> +} > >>> + > >>> static int insert_balance_item(struct btrfs_fs_info *fs_info, > >>> struct btrfs_balance_control *bctl) > >>> { > >>> @@ -3476,7 +3518,6 @@ static int __btrfs_balance(struct btrfs_fs_info > >>> *fs_info) > >>> u32 count_meta = 0; > >>> u32 count_sys = 0; > >>> int chunk_reserved = 0; > >>> - u64 bytes_used = 0; > >>> > >>> /* step one make some room on all the devices */ > >>> devices = &fs_info->fs_devices->devices; > >>> @@ -3635,28 +3676,21 @@ static int __btrfs_balance(struct btrfs_fs_info > >>> *fs_info) > >>> goto loop; > >>> } > >>> > >>> - ASSERT(fs_info->data_sinfo); > >>> -
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
On Mon, Oct 30, 2017 at 2:57 AM, Zak Kohler wrote: > $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX1 > Scrub result: > Tree bytes scrubbed: 5234425856 > Tree extents scrubbed: 638968 > Data bytes scrubbed: 4353723670528 > Data extents scrubbed: 374300 > Data bytes without csum: 533200896 > Read error: 0 > Verify error: 0 > Csum error: 150 > > $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX2 > Scrub result: > Tree bytes scrubbed: 5234425856 > Tree extents scrubbed: 638967 > Data bytes scrubbed: 4353723314176 > Data extents scrubbed: 374300 > Data bytes without csum: 533200896 > Read error: 0 > Verify error: 0 > Csum error: 238 > > $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX3 > Scrub result: > Tree bytes scrubbed: 5234491392 > Tree extents scrubbed: 638975 > Data bytes scrubbed: 4353723572224 > Data extents scrubbed: 374300 > Data bytes without csum: 533200896 > Read error: 0 > Verify error: 0 > Csum error: 175 #first run > Csum error: 112 #second run... > Csum error: 285 #third run... > > But I ran the /dev/disk/by-id/WD-XX3 device three times and you can > see the result... I expect these commands are the same, and involve all three drives in the offline scrub each time. So you have five different results, but all five involve csum errors. So the errors have a certain transience to them, hence inconsistent results. But the online scrub consistently reports zero errors. That to me sounds like a bug in the offline scrub code. Maybe it's confused, and reports data without csums (nodatacow) as csum errors? That does not explain the inconsistency though. And then you're getting an consistent failure, but at an inconsistent location, with Btrfs send, ostensibly due to IO error, which sounds like it's hitting a bad csum check. It is entirely possible to get transient errors like this somewhere in a storage stack that's otherwise not reported by the error detection code in that layer. The thing I really don't understand is how you're getting zero errors with conventional online scrub, every time. On my tiny 23G installation I'm traveling with, I get the same results with all three scrub methods on an NVMe drive. Zero errors. The slighly larger spinning rust drives are not with me so I can't check them for a while. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with file system
On Mon, Oct 30, 2017 at 4:31 AM, Dave wrote: > This is a very helpful thread. I want to share an interesting related story. > > We have a machine with 4 btrfs volumes and 4 Snapper configs. I > recently discovered that Snapper timeline cleanup been turned off for > 3 of those volumes. In the Snapper configs I found this setting: > > TIMELINE_CLEANUP="no" > > Normally that would be set to "yes". So I corrected the issue and set > it to "yes" for the 3 volumes where it had not been set correctly. > > I suppose it was turned off temporarily and then somebody forgot to > turn it back on. > > What I did not know, and what I did not realize was a critical piece > of information, was how long timeline cleanup had been turned off and > how many snapshots had accumulated on each volume in that time. > > I naively re-enabled Snapper timeline cleanup. The instant I started > the snapper-cleanup.service the system was hosed. The ssh session > became unresponsive, no other ssh sessions could be established and it > was impossible to log into the system at the console. > > My subsequent investigation showed that the root filesystem volume > accumulated more than 3000 btrfs snapshots. The two other affected > volumes also had very large numbers of snapshots. > > Deleting a single snapshot in that situation would likely require > hours. (I set up a test, but I ran out of patience before I was able > to delete even a single snapshot.) My guess is that if we had been > patient enough to wait for all the snapshots to be deleted, the > process would have finished in some number of months (or maybe a > year). > > We did not know most of this at the time, so we did what we usually do > when a system becomes totally unresponsive -- we did a hard reset. Of > course, we could never get the system to boot up again. > > Since we had backups, the easiest option became to replace that system > -- not unlike what the OP decided to do. In our case, the hardware was > not old, so we simply reformatted the drives and reinstalled Linux. > > That's a drastic consequence of changing TIMELINE_CLEANUP="no" to > TIMELINE_CLEANUP="yes" in the snapper config. Without a complete autopsy on the file system, it's unclear whether it was fixable with available tools, and why it wouldn't mount normally, or if necessary do its own autorecovery with one of the available backup roots. But off hand it sounds like hardware was sabotaging the expected write ordering. How to test a given hardware setup for that, I think, is really overdue. It affects literally every file system, and Linux storage technology. It kinda sounds like to me something other than supers is being overwritten too soon, and that's why it's possible for none of the backup roots to find a valid root tree, because all four possible root trees either haven't actually been written yet (still) or they've been overwritten, even though the super is updated. But again, it's speculation, we don't actually know why your system was no longer mountable. > > It's all part of the process of gaining critical experience with > BTRFS. Whether or not BTRFS is ready for production use is (it seems > to me) mostly a question of how knowledgeable and experienced are the > people administering it. "Btrfs is a copy on write filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration." That is the current descriptive text at Documentation/filesystems/btrfs.txt for some time now. > In the various online discussions on this topic, all the focus is on > whether or not BTRFS itself is production-ready. At the current > maturity level of BTRFS, I think that's the wrong focus. The right > focus is on how production-ready is the admin person or team (with > respect to their BTRFS knowledge and experience). When a filesystem > has been around for decades, most of the critical admin issues become > fairly common knowledge, fairly widely known and easy to find. When a > filesystem is newer, far fewer people understand the gotchas. Also, in > older or widely used filesystems, when someone hits a gotcha, the > response isn't "that filesystem is not ready for production". Instead > the response is, "you should have known not to do that." That is not a general purpose file system. It's a file system for admins who understand where the bodies are buried. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Fix quota reservation leak on preallocated files
Commit c6887cd11149 (Btrfs: don't do nocow check unless we have to) changed the behavior of __btrfs_buffered_write() so that it first tries to get a data space reservation, and then skips the relatively expensive nocow check if the reservation succeeded. If we have quotas enabled, the data space reservation also includes a quota reservation. But in the rewrite case, the space has already been accounted for in qgroups. So btrfs_check_data_free_space() increases the quota reservation, but it never gets decreased when the data actually gets written and overwrites the pre-existing data. So we're left with both the qgroup and qgroup reservation accounting for the same space. This commit adds the missing btrfs_qgroup_free_data() call in the case of BTRFS_ORDERED_PREALLOC extents. Signed-off-by: Justin Maggard --- fs/btrfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d94e3f68b9b1..d7881ccf5310 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3006,6 +3006,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) compress_type = ordered_extent->compress_type; if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) { BUG_ON(compress_type); + btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset, + ordered_extent->len); ret = btrfs_mark_extent_written(trans, BTRFS_I(inode), ordered_extent->file_offset, ordered_extent->file_offset + -- 2.14.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: test if receive with qgroups corrupts metadata
This test case does some concurrent send/receives with qgroups enabled. Currently (4.14-rc7) this usually results in btrfs check errors, and often also results in a WARN_ON in record_root_in_trans(). Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting when creating snapshot) as the culprit. Signed-off-by: Justin Maggard --- tests/btrfs/152 | 102 tests/btrfs/152.out | 13 +++ tests/btrfs/group | 1 + 3 files changed, 116 insertions(+) create mode 100755 tests/btrfs/152 create mode 100644 tests/btrfs/152.out diff --git a/tests/btrfs/152 b/tests/btrfs/152 new file mode 100755 index 000..ebb88ed --- /dev/null +++ b/tests/btrfs/152 @@ -0,0 +1,102 @@ +#! /bin/bash +# FS QA Test No. btrfs/152 +# +# Test that incremental send/receive operations don't corrupt metadata when +# qgroups are enabled. +# +#--- +# +# Copyright (c) 2017 NETGEAR, Inc. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch + +rm -f $seqres.full + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +# Enable quotas +$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT + +# Create 2 source and 4 destination subvolumes +for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do + $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | _filter_scratch +done +mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots +touch $SCRATCH_MNT/subvol{1,2}/foo + +# Create base snapshots and send them +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ + $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ + $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch +for recv in recv1_1 recv1_2 recv2_1 recv2_2; do + $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | \ + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch +done + +# Now do 10 loops of concurrent incremental send/receives +for i in `seq 1 10`; do + prev=$i + curr=$((i+1)) + + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ + $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null & + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null & + + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ + $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \ + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null & + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \ + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_2) > /dev/null & + wait +done + +_scratch_unmount + +status=0 +exit diff --git a/tests/btrfs/152.out b/tests/btrfs/152.out new file mode 100644 index 000..a95bb57 --- /dev/null +++ b/tests/btrfs/152.out @@ -0,0 +1,13 @@ +QA output created by 152 +Create subvolume 'SCRATCH_MNT/subvol1' +Create subvolume 'SCRATCH_MNT/subvol2' +Create subvolume 'SCRATCH_MNT/recv1_1' +Create subvolume 'SCRATCH_MNT/recv1_2' +Create subvolume 'SCRATCH_MNT/recv2_1' +Create subvolume 'SCRATCH_MNT/recv2_2' +Create a readonly snapshot of 'SCRATCH_MNT/subvol1' in 'SCRATCH_MNT/subvol1/.snapshots/1' +Create a readonly snapshot of 'SCRA
[PATCH] btrfs: test for qgroup reservation leaks with prealloc
This test case writes into pre-allocated space, then tries to fallocate some more within the defined quota limit. Currently (4.14-rc7) this fails with EDQUOT due to quota reservation leakage when writing into pre- allocated space. A possible fix has been sent to the ML as "btrfs: Fix quota reservation leak on preallocated files" Signed-off-by: Justin Maggard --- tests/btrfs/153 | 72 + tests/btrfs/153.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 75 insertions(+) create mode 100755 tests/btrfs/153 create mode 100644 tests/btrfs/153.out diff --git a/tests/btrfs/153 b/tests/btrfs/153 new file mode 100755 index 000..95a8095 --- /dev/null +++ b/tests/btrfs/153 @@ -0,0 +1,72 @@ +#! /bin/bash +# FS QA Test 153 +# +# Test for leaking quota reservations on preallocated files. +# +#--- +# +# Copyright (c) 2017 NETGEAR, Inc. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_btrfs_qgroup_report +_require_xfs_io_command "falloc" + +_scratch_mkfs >/dev/null +_scratch_mount + +_run_btrfs_util_prog quota enable $SCRATCH_MNT +_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT +_run_btrfs_util_prog qgroup limit 100M 0/5 $SCRATCH_MNT + +testfile1=$SCRATCH_MNT/testfile1 +testfile2=$SCRATCH_MNT/testfile2 +$XFS_IO_PROG -fc "falloc 0 80M" $testfile1 +$XFS_IO_PROG -fc "pwrite 0 80M" $testfile1 > /dev/null +$XFS_IO_PROG -fc "falloc 0 19M" $testfile2 +$XFS_IO_PROG -fc "pwrite 0 19M" $testfile2 > /dev/null + +echo "Silence is golden" +status=0 +exit diff --git a/tests/btrfs/153.out b/tests/btrfs/153.out new file mode 100644 index 000..cd9fe8a --- /dev/null +++ b/tests/btrfs/153.out @@ -0,0 +1,2 @@ +QA output created by 153 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index fb94461..15c2ecb 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -155,3 +155,4 @@ 150 auto quick dangerous 151 auto quick 152 auto quick metadata qgroup send +153 auto quick qgroup -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Fix quota reservation leak on preallocated files
On 2017年10月31日 06:29, Justin Maggard wrote: > Commit c6887cd11149 (Btrfs: don't do nocow check unless we have to) > changed the behavior of __btrfs_buffered_write() so that it first tries > to get a data space reservation, and then skips the relatively expensive > nocow check if the reservation succeeded. > > If we have quotas enabled, the data space reservation also includes a > quota reservation. But in the rewrite case, the space has already been > accounted for in qgroups. So btrfs_check_data_free_space() increases the > quota reservation, but it never gets decreased when the data actually > gets written and overwrites the pre-existing data. So we're left with > both the qgroup and qgroup reservation accounting for the same space. > > This commit adds the missing btrfs_qgroup_free_data() call in the case of > BTRFS_ORDERED_PREALLOC extents. > > Signed-off-by: Justin Maggard Reviewed-by: Qu Wenruo Thanks, Qu > --- > fs/btrfs/inode.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d94e3f68b9b1..d7881ccf5310 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3006,6 +3006,8 @@ static int btrfs_finish_ordered_io(struct > btrfs_ordered_extent *ordered_extent) > compress_type = ordered_extent->compress_type; > if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) { > BUG_ON(compress_type); > + btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset, > +ordered_extent->len); > ret = btrfs_mark_extent_written(trans, BTRFS_I(inode), > ordered_extent->file_offset, > ordered_extent->file_offset + > signature.asc Description: OpenPGP digital signature
Re: [PATCH RFC] Btrfs: remove max_active and thresh
On 2017年10月31日 01:14, Liu Bo wrote: > First and foremost, here are the problems we have right now, > > a) %thread_pool is configurable via mount option, however for those >'workers' who really needs concurrency, there are at most >"min(num_cpus+2, 8)" threads running to process works, and the >value can't be tuned after mount because they're tagged with >NO_THRESHOLD. > > b) For THRESHOLD workers, btrfs is adjusting how concurrency will >happen by starting with the minimum max_active and calling >btrfs_workqueue_set_max() to grow it on demand, but it also has a >upper limit, "min(num_cpus+2, 8)", which means at most we can have >such many threads running at the same time. I was also wondering this when using kernel workqueue to replace btrfs workqueue. However at that time, oct-core CPU is even hard to find in servers. > > In fact, kernel workqueue can be created on demand and destroyed once > no work needs to be processed. The small max_active limitation (8) > from btrfs has prevented us from utilizing all available cpus, and > that is against why we choose workqueue. Yep, I'm also wondering should we keep the old 8 threads up limits. Especially nowadays oct-core CPU is getting cheaper and cheaper (thanks AMD and its Ryzen). > > What this patch does: > > - resizing %thread_pool is totally removed as they are no long needed, > while keeping its mount option for compatibility. > > - The default "0" is passed when allocating workqueue, so the maximum > number of running works is typically 256. And all fields for > limiting max_active are removed, including current_active, > limit_active, thresh etc. Any benchmark will make this more persuasive. Especially using low frequency but high thread count CPU. The idea and the patch looks good to me. Looking forward to some benchmark result. Thanks, Qu > > Signed-off-by: Liu Bo > --- > fs/btrfs/async-thread.c | 155 > +++ > fs/btrfs/async-thread.h | 27 +++-- > fs/btrfs/delayed-inode.c | 7 ++- > fs/btrfs/disk-io.c | 61 ++- > fs/btrfs/scrub.c | 11 ++-- > fs/btrfs/super.c | 32 -- > 6 files changed, 58 insertions(+), 235 deletions(-) > > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c > index e00c8a9..dd627ad 100644 > --- a/fs/btrfs/async-thread.c > +++ b/fs/btrfs/async-thread.c > @@ -29,41 +29,6 @@ > #define WORK_ORDER_DONE_BIT 1 > #define WORK_HIGH_PRIO_BIT 2 > > -#define NO_THRESHOLD (-1) > -#define DFT_THRESHOLD (32) > - > -struct __btrfs_workqueue { > - struct workqueue_struct *normal_wq; > - > - /* File system this workqueue services */ > - struct btrfs_fs_info *fs_info; > - > - /* List head pointing to ordered work list */ > - struct list_head ordered_list; > - > - /* Spinlock for ordered_list */ > - spinlock_t list_lock; > - > - /* Thresholding related variants */ > - atomic_t pending; > - > - /* Up limit of concurrency workers */ > - int limit_active; > - > - /* Current number of concurrency workers */ > - int current_active; > - > - /* Threshold to change current_active */ > - int thresh; > - unsigned int count; > - spinlock_t thres_lock; > -}; > - > -struct btrfs_workqueue { > - struct __btrfs_workqueue *normal; > - struct __btrfs_workqueue *high; > -}; > - > static void normal_work_helper(struct btrfs_work *work); > > #define BTRFS_WORK_HELPER(name) \ > @@ -86,20 +51,6 @@ btrfs_work_owner(const struct btrfs_work *work) > return work->wq->fs_info; > } > > -bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq) > -{ > - /* > - * We could compare wq->normal->pending with num_online_cpus() > - * to support "thresh == NO_THRESHOLD" case, but it requires > - * moving up atomic_inc/dec in thresh_queue/exec_hook. Let's > - * postpone it until someone needs the support of that case. > - */ > - if (wq->normal->thresh == NO_THRESHOLD) > - return false; > - > - return atomic_read(&wq->normal->pending) > wq->normal->thresh * 2; > -} > - > BTRFS_WORK_HELPER(worker_helper); > BTRFS_WORK_HELPER(delalloc_helper); > BTRFS_WORK_HELPER(flush_delalloc_helper); > @@ -125,7 +76,7 @@ BTRFS_WORK_HELPER(scrubparity_helper); > > static struct __btrfs_workqueue * > __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name, > - unsigned int flags, int limit_active, int thresh) > + unsigned int flags) > { > struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_KERNEL); > > @@ -133,31 +84,15 @@ __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, > const char *name, > return NULL; > > ret->fs_info = fs_info; > - ret->limit_active = limit_active; > atomic_set(&ret->pending, 0); > - if (thresh == 0) > -
Re: [PATCH] btrfs: test if receive with qgroups corrupts metadata
On 2017年10月31日 06:32, Justin Maggard wrote: > This test case does some concurrent send/receives with qgroups enabled. > Currently (4.14-rc7) this usually results in btrfs check errors, and > often also results in a WARN_ON in record_root_in_trans(). > > Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting > when creating snapshot) as the culprit. Thanks for the report, I'll look into it. BTW can this only be reproduced by concurrent run? Will single thread also cause the problem? Thanks, Qu > > Signed-off-by: Justin Maggard > --- > tests/btrfs/152 | 102 > > tests/btrfs/152.out | 13 +++ > tests/btrfs/group | 1 + > 3 files changed, 116 insertions(+) > create mode 100755 tests/btrfs/152 > create mode 100644 tests/btrfs/152.out > > diff --git a/tests/btrfs/152 b/tests/btrfs/152 > new file mode 100755 > index 000..ebb88ed > --- /dev/null > +++ b/tests/btrfs/152 > @@ -0,0 +1,102 @@ > +#! /bin/bash > +# FS QA Test No. btrfs/152 > +# > +# Test that incremental send/receive operations don't corrupt metadata when > +# qgroups are enabled. > +# > +#--- > +# > +# Copyright (c) 2017 NETGEAR, Inc. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > + > +rm -f $seqres.full > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > + > +# Enable quotas > +$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT > + > +# Create 2 source and 4 destination subvolumes > +for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do > + $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | _filter_scratch > +done > +mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots > +touch $SCRATCH_MNT/subvol{1,2}/foo > + > +# Create base snapshots and send them > +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ > + $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch > +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ > + $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch > +for recv in recv1_1 recv1_2 recv2_1 recv2_2; do > + $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | \ > + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch > +done > + > +# Now do 10 loops of concurrent incremental send/receives > +for i in `seq 1 10`; do > + prev=$i > + curr=$((i+1)) > + > + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ > + $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null > + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ > + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ > + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null & > + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ > + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ > + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null & > + > + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ > + $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null > + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ > + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \ > + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null & > + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ > + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \ > + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_2) > /dev/null & > + wait > +done > + > +_scratch_unmount > + > +status=0 > +exit > diff --git a/tests/btrfs/152.out b/tests/btrfs/152.out > new file mode 100644 > index 000..a95bb57 > --- /dev/null > +++ b/tests/btrfs/152.out > @@ -0,
Re: [PATCH] btrfs: test if receive with qgroups corrupts metadata
On Mon, Oct 30, 2017 at 5:21 PM, Qu Wenruo wrote: > > > On 2017年10月31日 06:32, Justin Maggard wrote: >> This test case does some concurrent send/receives with qgroups enabled. >> Currently (4.14-rc7) this usually results in btrfs check errors, and >> often also results in a WARN_ON in record_root_in_trans(). >> >> Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting >> when creating snapshot) as the culprit. > > Thanks for the report, I'll look into it. > > BTW can this only be reproduced by concurrent run? > Will single thread also cause the problem? > > Thanks, > Qu I ran 1000 single-threaded passes with no failures, so I'm pretty sure there must be multiple concurrent receives running to reproduce it. -Justin >> >> Signed-off-by: Justin Maggard >> --- >> tests/btrfs/152 | 102 >> >> tests/btrfs/152.out | 13 +++ >> tests/btrfs/group | 1 + >> 3 files changed, 116 insertions(+) >> create mode 100755 tests/btrfs/152 >> create mode 100644 tests/btrfs/152.out >> >> diff --git a/tests/btrfs/152 b/tests/btrfs/152 >> new file mode 100755 >> index 000..ebb88ed >> --- /dev/null >> +++ b/tests/btrfs/152 >> @@ -0,0 +1,102 @@ >> +#! /bin/bash >> +# FS QA Test No. btrfs/152 >> +# >> +# Test that incremental send/receive operations don't corrupt metadata when >> +# qgroups are enabled. >> +# >> +#--- >> +# >> +# Copyright (c) 2017 NETGEAR, Inc. All Rights Reserved. >> +# >> +# This program is free software; you can redistribute it and/or >> +# modify it under the terms of the GNU General Public License as >> +# published by the Free Software Foundation. >> +# >> +# This program is distributed in the hope that it would be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program; if not, write the Free Software Foundation, >> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >> +#--- >> +# >> + >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> + >> +# real QA test starts here >> +_supported_fs btrfs >> +_supported_os Linux >> +_require_scratch >> + >> +rm -f $seqres.full >> + >> +_scratch_mkfs >>$seqres.full 2>&1 >> +_scratch_mount >> + >> +# Enable quotas >> +$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT >> + >> +# Create 2 source and 4 destination subvolumes >> +for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do >> + $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | >> _filter_scratch >> +done >> +mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots >> +touch $SCRATCH_MNT/subvol{1,2}/foo >> + >> +# Create base snapshots and send them >> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ >> + $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch >> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ >> + $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch >> +for recv in recv1_1 recv1_2 recv2_1 recv2_2; do >> + $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | >> \ >> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch >> +done >> + >> +# Now do 10 loops of concurrent incremental send/receives >> +for i in `seq 1 10`; do >> + prev=$i >> + curr=$((i+1)) >> + >> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \ >> + $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null >> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ >> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ >> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null & >> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \ >> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \ >> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null & >> + >> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \ >> + $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null >> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ >> + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \ >> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null & >> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \ >> + $SCRATCH_MNT/subvol2
Re: Problem with file system
Dave posted on Sun, 29 Oct 2017 23:31:57 -0400 as excerpted: > It's all part of the process of gaining critical experience with BTRFS. > Whether or not BTRFS is ready for production use is (it seems to me) > mostly a question of how knowledgeable and experienced are the people > administering it. > > In the various online discussions on this topic, all the focus is on > whether or not BTRFS itself is production-ready. At the current maturity > level of BTRFS, I think that's the wrong focus. The right focus is on > how production-ready is the admin person or team (with respect to their > BTRFS knowledge and experience). When a filesystem has been around for > decades, most of the critical admin issues become fairly common > knowledge, fairly widely known and easy to find. When a filesystem is > newer, far fewer people understand the gotchas. Also, in older or widely > used filesystems, when someone hits a gotcha, the response isn't "that > filesystem is not ready for production". Instead the response is, "you > should have known not to do that." That's a view I hadn't seen before, but it seems reasonable and I like it. Indeed, there were/are a few reasonably widely known caveats with both ext3 and reiserfs, for instance, and certainly some that apply to fat/ vfat/fat32, the three filesystems other than btrfs I know most about, and if anything they're past their prime, /not/ "still maturing", as btrfs is typically described. For example, setting either of the two to writeback journaling and then losing data results in something akin to "you should have known not to do that unless you were prepared for the risk, as it's definitely a well known one." Which of course was about my own reaction when Linus and the other powers that be decided to set ext3 to writeback journaling by default for a few kernel cycles. Having lived thru that on reiserfs, I /knew/ where /that/ was headed, and sure enough... Similarly, ext3's performance problems with fsync, because it effectively forces a full filesystem sync not just a file sync, are well known, as are the risks of storing a reiserfs in a loopback file on reiserfs and then trying to run a tree restore on the host, since it's known to mix up the two filesystems in that case. It's thus a reasonable viewpoint to consider some of the btrfs quirks to be in the same category. Of course btrfs being the first COW-based-fs most will have had experience with, and the first filesystem most will have experienced that handles raid, snapshotting, etc, it's definitely rather different and more complex than the filesystems most people are familiar with, and thus can only be expected to have rather different and more complex caveats than the filesystems most are familiar with, as well. OTOH, there's definitely some known low-hanging-fruit in terms of ease of use, remaining to be implemented, tho I'd argue that we've reached the point where general stability is such that it has allowed the focus to gradually tilt toward implementing some of this, over the last year or so, and we're beginning to see the loose ends tied up in the documentation, for instance. I'd say we are getting close, and your viewpoint is a definite argument in support of that. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] btrfs: match btrfs_device->mode same as it used for open
On 10/30/2017 10:39 PM, David Sterba wrote: On Fri, Oct 20, 2017 at 10:07:15PM +0800, Anand Jain wrote: We aren't setting the FMODE_WRITE when initializing btrfs_device structure and when calling blkdev_put, however we are setting it only when calling blkdev_get_by_path(). But this still does not say why this is a problem worth fixing. Nikolay asked for it, and I would do the same, but why do we even have to ask for that? Here its just a cleanup of miss match of open mode and close modes. And there isn't any problem that I noticed. Thanks, Anand https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
Duncan posted on Mon, 30 Oct 2017 04:09:58 + as excerpted: > Zak Kohler posted on Sun, 29 Oct 2017 21:57:00 -0400 as excerpted: > >> So I ran memtest86+ 5.01 for >4 days: >> Pass: 39 Errors: 0 > > Be aware that memtest86+ will detect some types of errors but not > others. > > In particular, some years ago I had some memory (DDR1/3-digit-Opteron > era), actually registered as required by Opterons and ECC, that passed > that sort of memory test because what the test /tests/ is memory cell > retention (if you put a value in does it verify on read-back?), that was > none-the-less bad memory in that at its rated speed it was unreliable at > memory /transfers/. > > Eventually that mobo got a BIOS update that could adjust memory > clocking, and I downclocked it a notch[.] At > the lower clock it was rock stable, even with reduced wait-states to > make up a bit of the performance I was losing to the lower clock. But I > had to get a BIOS that could do it, first [...] > That's what really amazed me about reiserfs, that it remained stable > thru not only that hardware problem but various others I've had over the > years that would have killed or made unworkable other filesystems. BTW, one of those other hardware problems I had, the one that ultimately did in my old server-clase mobo, was leaky capacitors on the then 8-ish years old system. It was of the generation that had problem capacitors, and it eventually succumbed... The reason this is relevant is that it was the storage path that had the worst problems. Before I figured out what the problem actually was I did try btrfs, with around kernel 3.6 at the time, and it really /was/ unusable due to checksum errors. But I could still limp along with reiserfs... One thing about the behavior I noticed, however, was that as the problem was developing, the system was more usable if I kept it reasonably cool. By the time I gave up on it, it was early summer here in Phoenix, and temperatures were climbing. But I was sitting at home with the AC on, in a heavy winter jacket, wearing sweats under my pants and trying to type with gloves on my hands to keep warm, in ordered to cool down the computer so it'd work. That's when I decided enough was enough and gave up on it. I only found the burst capacitors, however, once I got the new mobo and was switching out the old one. No WONDER it wasn't working right any more! The rest of the system was actually reasonably stable, however. I guess the worst caps were in the storage path. But as I said, reiserfs was amazing. I bought a SATA addon board with the same chipset as on the old mobo so I could boot the old monolithic kernel with those drivers builtin on the new mobo, and I didn't notice any corruption or anything. But as I said, btrfs was entirely unusable on the old hardware, due to both checksum errors and large transactions (like trying to copy files over from reiserfs) ending up entirely reverted when I'd crash, instead of the partial completion I'd get on reiserfs, so I could at least reboot and start where it has crashed on reiserfs, instead of having to start over entirely, thus making no progress at all, which is what I was seeing on btrfs. That reiserfs continued to work well enough to keep going so long under those conditions, while btrfs was entirely unworkable, and even more that once I was running good hardware again, I didn't see massive corruption on reiserfs as a result of trying to run it on so long on the bad hardware, was really /really/ amazing! But like I said, I don't expect that btrfs, with its checksumming, with /ever/ be really workable on that sort of defective hardware. It was just really amazing to me that reiserfs wasn't screwed up by it as well, as it had every right to be given the screwed up hardware I was trying to run it on. No filesystem can be expected to go thru that and end up still usable, but somehow, reiserfs did. Bottom line, it could be the storage path, not the memory or cpu. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
On 2017/10/30 17:32, Qu Wenruo wrote: > > > On 2017年10月30日 16:14, Misono, Tomohiro wrote: >> Currently, the top-level subvolume lacks the UUID. As a result, both >> non-snapshot subvolume and snapshot of top-level subvolume do not have >> Parent UUID and cannot be distinguisued. Therefore "fi show" of >> top-level lists all the subvolumes which lacks the UUID in >> "Snapshot(s)". Also, it lacks the otime information. > > What about creating a wiki page for ROOT_ITEM and detailed restriction > for its members? > >> >> Fix this by adding the UUID and otime at the mkfs time. As a >> consequence, snapshots of top-level subvolume now have a Parent UUID and >> UUID tree will create an entry for top-level subvolume at mount time. > > This patch also inspires me about the btrfs_create_tree() function I'm > introducing should also populate its UUID and timespec. > >> >> Signed-off-by: Tomohiro Misono >> --- >> ctree.h | 5 + >> mkfs/common.c | 14 ++ >> mkfs/main.c | 3 +++ >> 3 files changed, 22 insertions(+) >> >> diff --git a/ctree.h b/ctree.h >> index 2280659..5737978 100644 >> --- a/ctree.h >> +++ b/ctree.h >> @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct >> btrfs_root_item, >> stransid, 64); >> BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, >> rtransid, 64); >> +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item, >> + u8 uuid[BTRFS_UUID_SIZE]) >> +{ >> +memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE); >> +} > > This is the stack version. > > However there is no extent buffer version to set uuid as we just call > write_extent_buffer(), so it seems unnecessary to me. > >> >> /* struct btrfs_root_backup */ >> BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, >> diff --git a/mkfs/common.c b/mkfs/common.c >> index c9ce10d..808d343 100644 >> --- a/mkfs/common.c >> +++ b/mkfs/common.c >> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct >> btrfs_mkfs_config *cfg, >> u32 itemoff; >> int ret = 0; >> int blk; >> +u8 uuid[BTRFS_UUID_SIZE]; >> >> memset(buf->data + sizeof(struct btrfs_header), 0, >> cfg->nodesize - sizeof(struct btrfs_header)); >> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct >> btrfs_mkfs_config *cfg, >> btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff); >> btrfs_set_item_size(buf, btrfs_item_nr(nritems), >> sizeof(root_item)); >> +if (blk == MKFS_FS_TREE) { >> +time_t now = time(NULL); >> + >> +uuid_generate(uuid); >> +btrfs_set_root_uuid(&root_item, uuid); >> +btrfs_set_stack_timespec_sec(&root_item.otime, now); >> +btrfs_set_stack_timespec_sec(&root_item.ctime, now); >> +} else { >> +memset(uuid, 0, BTRFS_UUID_SIZE); > > This leads to the question about the UUID meaning of different trees. > > AFAIK every tree created by kernel has its own UUID, no one uses all zero. > > In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate > a new UUID. > So I'm wonder if such branch is really needed. > And maybe we fix all tree creation to generate UUID. > > Despite the question about the meaning of UUID for root_item, I think > the patch really makes sense. > > Thanks, > Qu > Hello, Thanks for the whole comments. I notice UUID (and otime/ctime) of ROOT_ITEM is not used nor updated currently expect subvolumes. Therefore I think it is better to keep these fields to zero to express non-used status. (So, it may be better that uuid/quota tree is not hold UUID.) Regards, Tomohiro >> +btrfs_set_root_uuid(&root_item, uuid); >> +btrfs_set_stack_timespec_sec(&root_item.otime, 0); >> +btrfs_set_stack_timespec_sec(&root_item.ctime, 0); >> +} >> write_extent_buffer(buf, &root_item, >> btrfs_item_ptr_offset(buf, nritems), >> sizeof(root_item)); >> diff --git a/mkfs/main.c b/mkfs/main.c >> index 1b4cabc..4184a2d 100644 >> --- a/mkfs/main.c >> +++ b/mkfs/main.c >> @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans, >> struct btrfs_key location; >> struct btrfs_root_item root_item; >> struct extent_buffer *tmp; >> +u8 uuid[BTRFS_UUID_SIZE] = {0}; >> int ret; >> >> ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid); >> @@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans, >> btrfs_set_root_bytenr(&root_item, tmp->start); >> btrfs_set_root_level(&root_item, btrfs_header_level(tmp)); >> btrfs_set_root_generation(&root_item, trans->transid); >> +/* clear uuid of source tree */ >> +btrfs_set_root_
Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()
On 2017/10/30 17:39, Qu Wenruo wrote: > > > On 2017年10月30日 16:10, Misono, Tomohiro wrote: >> In the print_inode_item(), the argument order of sequence and flags are >> reversed: >> >> printf("... sequence %llu flags 0x%llx(%s)\n", >> ... >> (unsigned long long)btrfs_inode_flags(eb,ii), >> (unsigned long long)btrfs_inode_sequence(eb, ii), >> ...) >> >> So, just fix it. > > Not related to this patch, but considering you're going to enhance > root_item creation, it's also a good idea to print such info like > c/o/s/r time and parent/receive UUID. > > More patches will never be a bad thing. Actually parent/received UUID is printed if it exists but c/o/s/r time is not (for ROOT_ITEM). So, I will send a patch. Thanks, Tomohiro -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
On 2017年10月31日 10:55, Misono, Tomohiro wrote: > On 2017/10/30 17:32, Qu Wenruo wrote: >> >> >> On 2017年10月30日 16:14, Misono, Tomohiro wrote: >>> Currently, the top-level subvolume lacks the UUID. As a result, both >>> non-snapshot subvolume and snapshot of top-level subvolume do not have >>> Parent UUID and cannot be distinguisued. Therefore "fi show" of >>> top-level lists all the subvolumes which lacks the UUID in >>> "Snapshot(s)". Also, it lacks the otime information. >> >> What about creating a wiki page for ROOT_ITEM and detailed restriction >> for its members? >> >>> >>> Fix this by adding the UUID and otime at the mkfs time. As a >>> consequence, snapshots of top-level subvolume now have a Parent UUID and >>> UUID tree will create an entry for top-level subvolume at mount time. >> >> This patch also inspires me about the btrfs_create_tree() function I'm >> introducing should also populate its UUID and timespec. >> >>> >>> Signed-off-by: Tomohiro Misono >>> --- >>> ctree.h | 5 + >>> mkfs/common.c | 14 ++ >>> mkfs/main.c | 3 +++ >>> 3 files changed, 22 insertions(+) >>> >>> diff --git a/ctree.h b/ctree.h >>> index 2280659..5737978 100644 >>> --- a/ctree.h >>> +++ b/ctree.h >>> @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct >>> btrfs_root_item, >>> stransid, 64); >>> BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, >>> rtransid, 64); >>> +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item, >>> +u8 uuid[BTRFS_UUID_SIZE]) >>> +{ >>> + memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE); >>> +} >> >> This is the stack version. >> >> However there is no extent buffer version to set uuid as we just call >> write_extent_buffer(), so it seems unnecessary to me. >> >>> >>> /* struct btrfs_root_backup */ >>> BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, >>> diff --git a/mkfs/common.c b/mkfs/common.c >>> index c9ce10d..808d343 100644 >>> --- a/mkfs/common.c >>> +++ b/mkfs/common.c >>> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct >>> btrfs_mkfs_config *cfg, >>> u32 itemoff; >>> int ret = 0; >>> int blk; >>> + u8 uuid[BTRFS_UUID_SIZE]; >>> >>> memset(buf->data + sizeof(struct btrfs_header), 0, >>> cfg->nodesize - sizeof(struct btrfs_header)); >>> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct >>> btrfs_mkfs_config *cfg, >>> btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff); >>> btrfs_set_item_size(buf, btrfs_item_nr(nritems), >>> sizeof(root_item)); >>> + if (blk == MKFS_FS_TREE) { >>> + time_t now = time(NULL); >>> + >>> + uuid_generate(uuid); >>> + btrfs_set_root_uuid(&root_item, uuid); >>> + btrfs_set_stack_timespec_sec(&root_item.otime, now); >>> + btrfs_set_stack_timespec_sec(&root_item.ctime, now); >>> + } else { >>> + memset(uuid, 0, BTRFS_UUID_SIZE); >> >> This leads to the question about the UUID meaning of different trees. >> >> AFAIK every tree created by kernel has its own UUID, no one uses all zero. >> >> In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate >> a new UUID. >> So I'm wonder if such branch is really needed. >> And maybe we fix all tree creation to generate UUID. >> >> Despite the question about the meaning of UUID for root_item, I think >> the patch really makes sense. >> >> Thanks, >> Qu >> > Hello, > > Thanks for the whole comments. > I notice UUID (and otime/ctime) of ROOT_ITEM is not used nor updated > currently > expect subvolumes. Therefore I think it is better to keep these fields to zero > to express non-used status. (So, it may be better that uuid/quota tree is not > hold UUID.) Makes sense. I'll update kernel code to avoid UUID creation for non-fs trees. Since they are not in UUID tree so there is no meaning creating uuid for them. Thanks, Qu > > Regards, > Tomohiro > >>> + btrfs_set_root_uuid(&root_item, uuid); >>> + btrfs_set_stack_timespec_sec(&root_item.otime, 0); >>> + btrfs_set_stack_timespec_sec(&root_item.ctime, 0); >>> + } >>> write_extent_buffer(buf, &root_item, >>> btrfs_item_ptr_offset(buf, nritems), >>> sizeof(root_item)); >>> diff --git a/mkfs/main.c b/mkfs/main.c >>> index 1b4cabc..4184a2d 100644 >>> --- a/mkfs/main.c >>> +++ b/mkfs/main.c >>> @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans, >>> struct btrfs_key location; >>> struct btrfs_root_item root_item; >>> struct extent_buffer *tmp; >>> + u8 uuid[BTRFS_UUID_SIZE] = {0}; >>> int ret; >>> >>> ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid); >>> @@ -339,6 +340
[PATCH] btrfs-progs: print-tree: Enehance uuid item print
For key type BTRFS_UUID_KEY_SUBVOL or BTRFS_UUID_KEY_RECEIVED_SUBVOL the key objectid and key offset are just half of the UUID. However we just print the key as %llu, which is converted from little endian, not byte order for UUID, nor the traditional 36 bytes human readable uuid format. Although true engineer can easily convert it in their brain, but to make it easier for search, output the result UUID using the 36 chars format. Cc: Misono Tomohiro Signed-off-by: Qu Wenruo --- Inspired by UUID related work from Misono. --- print-tree.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/print-tree.c b/print-tree.c index 3c585e31f1fc..687f871db302 100644 --- a/print-tree.c +++ b/print-tree.c @@ -803,14 +803,25 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key) } } -static void print_uuid_item(struct extent_buffer *l, unsigned long offset, - u32 item_size) +static void print_uuid_item(struct extent_buffer *l, int slot, + unsigned long offset, u32 item_size) { + struct btrfs_key key; + char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; + u8 uuid[BTRFS_UUID_SIZE]; + + /* Reassemble the uuid from key.objecitd and key.offset */ + btrfs_item_key_to_cpu(l, &key, slot); + put_unaligned_le64(key.objectid, uuid); + put_unaligned_le64(key.offset, uuid + sizeof(u64)); + uuid_unparse(uuid, uuid_str); + if (item_size & (sizeof(u64) - 1)) { printf("btrfs: uuid item with illegal size %lu!\n", (unsigned long)item_size); return; } + printf("\t\tuuid %s\n", uuid_str); while (item_size) { __le64 subvol_id; @@ -1297,7 +1308,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *eb) break; case BTRFS_UUID_KEY_SUBVOL: case BTRFS_UUID_KEY_RECEIVED_SUBVOL: - print_uuid_item(eb, btrfs_item_ptr_offset(eb, i), + print_uuid_item(eb, i, btrfs_item_ptr_offset(eb, i), btrfs_item_size_nr(eb, i)); break; case BTRFS_STRING_ITEM_KEY: { -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: ins: print uuid only if it exists
UUID of ROOT_ITEM is empty if it is not subvolume (and not created by kernel). So, just skip it if it is empty just like parent/received UUID. Signed-off-by: Tomohiro Misono --- print-tree.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/print-tree.c b/print-tree.c index 8abd760..9d1b862 100644 --- a/print-tree.c +++ b/print-tree.c @@ -581,8 +581,10 @@ static void print_root_item(struct extent_buffer *leaf, int slot) flags_str); if (root_item.generation == root_item.generation_v2) { - uuid_unparse(root_item.uuid, uuid_str); - printf("\t\tuuid %s\n", uuid_str); + if (!empty_uuid(root_item.uuid)) { + uuid_unparse(root_item.uuid, uuid_str); + printf("\t\tuuid %s\n", uuid_str); + } if (!empty_uuid(root_item.parent_uuid)) { uuid_unparse(root_item.parent_uuid, uuid_str); printf("\t\tparent_uuid %s\n", uuid_str); -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: ins: print c/o/s/r time of ROOT_ITEM
Currently c/o/s/r time of ROOT_ITEM is not printed in print_root_item(). Fix this and print them if the values are not zero. print_timespec() is moved forward to reuse. Signed-off-by: Tomohiro Misono --- ctree.h | 32 print-tree.c | 52 2 files changed, 64 insertions(+), 20 deletions(-) diff --git a/ctree.h b/ctree.h index 2280659..54a85fd 100644 --- a/ctree.h +++ b/ctree.h @@ -2072,6 +2072,38 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct btrfs_root_item, BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, rtransid, 64); +static inline struct btrfs_timespec * +btrfs_root_ctime(struct btrfs_root_item *root_item) +{ + unsigned long ptr = (unsigned long)root_item; + ptr += offsetof(struct btrfs_root_item, ctime); + return (struct btrfs_timespec *)ptr; +} + +static inline struct btrfs_timespec * +btrfs_root_otime(struct btrfs_root_item *root_item) +{ + unsigned long ptr = (unsigned long)root_item; + ptr += offsetof(struct btrfs_root_item, otime); + return (struct btrfs_timespec *)ptr; +} + +static inline struct btrfs_timespec * +btrfs_root_stime(struct btrfs_root_item *root_item) +{ + unsigned long ptr = (unsigned long)root_item; + ptr += offsetof(struct btrfs_root_item, stime); + return (struct btrfs_timespec *)ptr; +} + +static inline struct btrfs_timespec * +btrfs_root_rtime(struct btrfs_root_item *root_item) +{ + unsigned long ptr = (unsigned long)root_item; + ptr += offsetof(struct btrfs_root_item, rtime); + return (struct btrfs_timespec *)ptr; +} + /* struct btrfs_root_backup */ BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, tree_root, 64); diff --git a/print-tree.c b/print-tree.c index 9d1b862..62f3163 100644 --- a/print-tree.c +++ b/print-tree.c @@ -551,6 +551,26 @@ static void root_flags_to_str(u64 flags, char *ret) strcat(ret, "none"); } +static void print_timespec(struct extent_buffer *eb, + struct btrfs_timespec *timespec, const char *prefix, + const char *suffix) +{ + struct tm tm; + u64 tmp_u64; + u32 tmp_u32; + time_t tmp_time; + char timestamp[256]; + + tmp_u64 = btrfs_timespec_sec(eb, timespec); + tmp_u32 = btrfs_timespec_nsec(eb, timespec); + tmp_time = tmp_u64; + localtime_r(&tmp_time, &tm); + strftime(timestamp, sizeof(timestamp), + "%Y-%m-%d %H:%M:%S", &tm); + printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32, + timestamp, suffix); +} + static void print_root_item(struct extent_buffer *leaf, int slot) { struct btrfs_root_item *ri; @@ -600,6 +620,18 @@ static void print_root_item(struct extent_buffer *leaf, int slot) btrfs_root_stransid(&root_item), btrfs_root_rtransid(&root_item)); } + if (btrfs_timespec_sec(leaf, btrfs_root_ctime(ri))) + print_timespec(leaf, btrfs_root_ctime(ri), + "\t\tctime ", "\n"); + if (btrfs_timespec_sec(leaf, btrfs_root_otime(ri))) + print_timespec(leaf, btrfs_root_otime(ri), + "\t\totime ", "\n"); + if (btrfs_timespec_sec(leaf, btrfs_root_stime(ri))) + print_timespec(leaf, btrfs_root_stime(ri), + "\t\tstime ", "\n"); + if (btrfs_timespec_sec(leaf, btrfs_root_rtime(ri))) + print_timespec(leaf, btrfs_root_rtime(ri), + "\t\trtime ", "\n"); } btrfs_disk_key_to_cpu(&drop_key, &root_item.drop_progress); @@ -858,26 +890,6 @@ static void inode_flags_to_str(u64 flags, char *ret) strcat(ret, "none"); } -static void print_timespec(struct extent_buffer *eb, - struct btrfs_timespec *timespec, const char *prefix, - const char *suffix) -{ - struct tm tm; - u64 tmp_u64; - u32 tmp_u32; - time_t tmp_time; - char timestamp[256]; - - tmp_u64 = btrfs_timespec_sec(eb, timespec); - tmp_u32 = btrfs_timespec_nsec(eb, timespec); - tmp_time = tmp_u64; - localtime_r(&tmp_time, &tm); - strftime(timestamp, sizeof(timestamp), - "%Y-%m-%d %H:%M:%S", &tm); - printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32, - timestamp, suffix); -} - static void print_inode_item(struct extent_buffer *eb, struct btrfs_inode_item *ii) { -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at h
Re: [PATCH] btrfs-progs: print-tree: Enehance uuid item print
On 2017/10/31 13:03, Qu Wenruo wrote: > For key type BTRFS_UUID_KEY_SUBVOL or BTRFS_UUID_KEY_RECEIVED_SUBVOL the > key objectid and key offset are just half of the UUID. > > However we just print the key as %llu, which is converted from little > endian, not byte order for UUID, nor the traditional 36 bytes human > readable uuid format. > > Although true engineer can easily convert it in their brain, but to > make it easier for search, output the result UUID using the 36 chars format. > > Cc: Misono Tomohiro > Signed-off-by: Qu Wenruo > --- > Inspired by UUID related work from Misono. > --- > print-tree.c | 17 ++--- > 1 file changed, 14 insertions(+), 3 deletions(-) > > diff --git a/print-tree.c b/print-tree.c > index 3c585e31f1fc..687f871db302 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -803,14 +803,25 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key) > } > } > > -static void print_uuid_item(struct extent_buffer *l, unsigned long offset, > - u32 item_size) > +static void print_uuid_item(struct extent_buffer *l, int slot, > + unsigned long offset, u32 item_size) > { > + struct btrfs_key key; > + char uuid_str[BTRFS_UUID_UNPARSED_SIZE]; > + u8 uuid[BTRFS_UUID_SIZE]; > + > + /* Reassemble the uuid from key.objecitd and key.offset */ > + btrfs_item_key_to_cpu(l, &key, slot); > + put_unaligned_le64(key.objectid, uuid); > + put_unaligned_le64(key.offset, uuid + sizeof(u64)); > + uuid_unparse(uuid, uuid_str); > + > if (item_size & (sizeof(u64) - 1)) { > printf("btrfs: uuid item with illegal size %lu!\n", > (unsigned long)item_size); > return; > } > + printf("\t\tuuid %s\n", uuid_str); > while (item_size) { > __le64 subvol_id; > > @@ -1297,7 +1308,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct > extent_buffer *eb) > break; > case BTRFS_UUID_KEY_SUBVOL: > case BTRFS_UUID_KEY_RECEIVED_SUBVOL: > - print_uuid_item(eb, btrfs_item_ptr_offset(eb, i), > + print_uuid_item(eb, i, btrfs_item_ptr_offset(eb, i), > btrfs_item_size_nr(eb, i)); > break; > case BTRFS_STRING_ITEM_KEY: { > Reviewed-by: Tomohiro Misono -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: ins: print uuid only if it exists
On 2017年10月31日 13:39, Misono, Tomohiro wrote: > UUID of ROOT_ITEM is empty if it is not subvolume (and not created by > kernel). So, just skip it if it is empty just like parent/received UUID. > > Signed-off-by: Tomohiro Misono Reviewed-by: Qu Wenruo Thanks, Qu > --- > print-tree.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/print-tree.c b/print-tree.c > index 8abd760..9d1b862 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -581,8 +581,10 @@ static void print_root_item(struct extent_buffer *leaf, > int slot) > flags_str); > > if (root_item.generation == root_item.generation_v2) { > - uuid_unparse(root_item.uuid, uuid_str); > - printf("\t\tuuid %s\n", uuid_str); > + if (!empty_uuid(root_item.uuid)) { > + uuid_unparse(root_item.uuid, uuid_str); > + printf("\t\tuuid %s\n", uuid_str); > + } > if (!empty_uuid(root_item.parent_uuid)) { > uuid_unparse(root_item.parent_uuid, uuid_str); > printf("\t\tparent_uuid %s\n", uuid_str); > signature.asc Description: OpenPGP digital signature
Re: Problem with file system
On 31/10/17 00:37, Chris Murphy wrote: But off hand it sounds like hardware was sabotaging the expected write ordering. How to test a given hardware setup for that, I think, is really overdue. It affects literally every file system, and Linux storage technology. It kinda sounds like to me something other than supers is being overwritten too soon, and that's why it's possible for none of the backup roots to find a valid root tree, because all four possible root trees either haven't actually been written yet (still) or they've been overwritten, even though the super is updated. But again, it's speculation, we don't actually know why your system was no longer mountable. Just a detached view: I know hardware should respect ordering/barriers and such, but how hard is it really to avoid overwriting at least one complete metadata tree for half an hour (even better, yet another one for a day)? Just metadata, not data extents. -- With Best Regards, Marat Khalili -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: ins: print c/o/s/r time of ROOT_ITEM
On 2017年10月31日 13:40, Misono, Tomohiro wrote: > Currently c/o/s/r time of ROOT_ITEM is not printed in print_root_item(). > Fix this and print them if the values are not zero. print_timespec() > is moved forward to reuse. > > Signed-off-by: Tomohiro Misono Reviewed-by: Qu Wenruo Thanks, Qu > --- > ctree.h | 32 > print-tree.c | 52 > 2 files changed, 64 insertions(+), 20 deletions(-) > > diff --git a/ctree.h b/ctree.h > index 2280659..54a85fd 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -2072,6 +2072,38 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct > btrfs_root_item, > BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item, >rtransid, 64); > > +static inline struct btrfs_timespec * > +btrfs_root_ctime(struct btrfs_root_item *root_item) > +{ > + unsigned long ptr = (unsigned long)root_item; > + ptr += offsetof(struct btrfs_root_item, ctime); > + return (struct btrfs_timespec *)ptr; > +} > + > +static inline struct btrfs_timespec * > +btrfs_root_otime(struct btrfs_root_item *root_item) > +{ > + unsigned long ptr = (unsigned long)root_item; > + ptr += offsetof(struct btrfs_root_item, otime); > + return (struct btrfs_timespec *)ptr; > +} > + > +static inline struct btrfs_timespec * > +btrfs_root_stime(struct btrfs_root_item *root_item) > +{ > + unsigned long ptr = (unsigned long)root_item; > + ptr += offsetof(struct btrfs_root_item, stime); > + return (struct btrfs_timespec *)ptr; > +} > + > +static inline struct btrfs_timespec * > +btrfs_root_rtime(struct btrfs_root_item *root_item) > +{ > + unsigned long ptr = (unsigned long)root_item; > + ptr += offsetof(struct btrfs_root_item, rtime); > + return (struct btrfs_timespec *)ptr; > +} > + > /* struct btrfs_root_backup */ > BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup, > tree_root, 64); > diff --git a/print-tree.c b/print-tree.c > index 9d1b862..62f3163 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -551,6 +551,26 @@ static void root_flags_to_str(u64 flags, char *ret) > strcat(ret, "none"); > } > > +static void print_timespec(struct extent_buffer *eb, > + struct btrfs_timespec *timespec, const char *prefix, > + const char *suffix) > +{ > + struct tm tm; > + u64 tmp_u64; > + u32 tmp_u32; > + time_t tmp_time; > + char timestamp[256]; > + > + tmp_u64 = btrfs_timespec_sec(eb, timespec); > + tmp_u32 = btrfs_timespec_nsec(eb, timespec); > + tmp_time = tmp_u64; > + localtime_r(&tmp_time, &tm); > + strftime(timestamp, sizeof(timestamp), > + "%Y-%m-%d %H:%M:%S", &tm); > + printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32, > + timestamp, suffix); > +} > + > static void print_root_item(struct extent_buffer *leaf, int slot) > { > struct btrfs_root_item *ri; > @@ -600,6 +620,18 @@ static void print_root_item(struct extent_buffer *leaf, > int slot) > btrfs_root_stransid(&root_item), > btrfs_root_rtransid(&root_item)); > } > + if (btrfs_timespec_sec(leaf, btrfs_root_ctime(ri))) > + print_timespec(leaf, btrfs_root_ctime(ri), > + "\t\tctime ", "\n"); > + if (btrfs_timespec_sec(leaf, btrfs_root_otime(ri))) > + print_timespec(leaf, btrfs_root_otime(ri), > + "\t\totime ", "\n"); > + if (btrfs_timespec_sec(leaf, btrfs_root_stime(ri))) > + print_timespec(leaf, btrfs_root_stime(ri), > + "\t\tstime ", "\n"); > + if (btrfs_timespec_sec(leaf, btrfs_root_rtime(ri))) > + print_timespec(leaf, btrfs_root_rtime(ri), > + "\t\trtime ", "\n"); > } > > btrfs_disk_key_to_cpu(&drop_key, &root_item.drop_progress); > @@ -858,26 +890,6 @@ static void inode_flags_to_str(u64 flags, char *ret) > strcat(ret, "none"); > } > > -static void print_timespec(struct extent_buffer *eb, > - struct btrfs_timespec *timespec, const char *prefix, > - const char *suffix) > -{ > - struct tm tm; > - u64 tmp_u64; > - u32 tmp_u32; > - time_t tmp_time; > - char timestamp[256]; > - > - tmp_u64 = btrfs_timespec_sec(eb, timespec); > - tmp_u32 = btrfs_timespec_nsec(eb, timespec); > - tmp_time = tmp_u64; > - localtime_r(&tmp_time, &tm); > - strftime(timestamp, sizeof(timestamp), > - "%Y-%m-%d %H:%M:%S", &tm); > - printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32, > - timestamp, suffix); > -} > - > static void print_inode_item(struct extent_buffer
[PATCH 1/2] btrfs-progs: test/fsck: Introduce test images containing tree reloc tree
Tree reloc tree is a special tree with very short life spawn. It acts as a special snapshot for any tree, with related nodes/leaves or EXTENT_DATA modified to point to new position. Considering the short life spawn and its specialness, it should be quite reasonable to keep them as both corner case for fsck and educational dump for anyone interested in relocation. Signed-off-by: Qu Wenruo --- tests/fsck-tests/027-tree-reloc-tree/test.sh | 22 + .../tree_reloc_for_data_reloc.img.xz | Bin 0 -> 2112 bytes .../tree_reloc_for_fs_tree.img.xz | Bin 0 -> 2424 bytes 3 files changed, 22 insertions(+) create mode 100755 tests/fsck-tests/027-tree-reloc-tree/test.sh create mode 100644 tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz create mode 100644 tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_fs_tree.img.xz diff --git a/tests/fsck-tests/027-tree-reloc-tree/test.sh b/tests/fsck-tests/027-tree-reloc-tree/test.sh new file mode 100755 index ..559c4bc700bd --- /dev/null +++ b/tests/fsck-tests/027-tree-reloc-tree/test.sh @@ -0,0 +1,22 @@ +#!/bin/bash +# Make sure btrfs check won't report any false alert for valid image with +# tree reloc tree. +# +# Also due to the short life spawn of tree reloc tree, save them as dump +# example for later usage. + +source "$TOP/tests/common" + +check_prereq btrfs + +for img in *.img.xz +do + image=$(extract_image "$img") + run_check_stdout "$TOP/btrfs" check "$image" 2>&1 | + grep -q "Errors found in extent allocation tree or chunk allocation" + if [ $? -eq 0 ]; then + rm -f "$image" + _fail "unexpected error occurred when checking $img" + fi + rm -f "$image" +done diff --git a/tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz b/tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz new file mode 100644 index ..66d8bde6b7a6c451eb7b3b6f4dbc6a919144d1f5 GIT binary patch literal 2112 zcmV-G2*3CJH+ooF000E$*0e?f03iV!G&sfah5B~@LT>wRyj;C3^v%$$4d1r37 zhA1@4w_z}05@xu}qhN$h#=9A)MD}#c5t`w0DAA1ZzCzD>FO393JL%KESo(OESm)Se zsSibx%8nF~Qojo*ma_q0pk(2sJJzy~$;uMaE)o>1vKb8VqnTOf1B z^DYElxsTLnJ-2@t0RIq$YA_Uc6P|OH7-DzZ=zRt)R^=~+JYg-ai)BxPu7rrjpGu}4 zTDD{+Z`Q;`k`tk38Et8X+8yqB?oe!Q@AR$b8-XBMO#7JxvctkyJ(M;~Gz4}+g?mDV z5?|k$9|heMleXlkp)$TnyH9!{0TkT( z<3H)4UM-$f5Tee_H!mO7a-_~8vi;P$UX}tes_Bu@etL&IQr6A3TqKF@bR%~(m_gR* z;SPQOrplP(i>JJFUI!pHly?zbn^-#r;4SwnYLXE5@<)o@S(>?UYxE{{Q(X0T)I1cN zoeplPtCGGm{g~C#IN>}DHXZ;41F0?`xeMn(HAEq*MCej4ZJ*B;*9CWi_yWjE9Pmm@ z>>2=5eU&|=tn2;xulKMF*H3_C`i7boa^LDsL58^C>$y$L7bU-FeR$wAt-F840_ZB4 zHrm8V)k~c|Tsxr1XtzPiQ|*GP{m-b_<8SDCmw$T;aH5}ak-%NYza0~fG?DPpTaQ0n5Rn9DlO`{Mbfe#s?9tT;Dgv=&}409 zRvxmwB)Sx;;#~%L4-C0#eE@-pAH{i8F|Y%{2QX5@S!_)<5rG}7I*He}n!~ENhw=95 zq1CwQV`AuuwJYht!Mxu@hZMrDQxdy%<;%W8s1f&K*#wl;jZD`o1X5F!h2`^(f2W|Q z?LD5%!%87{>Z)*MC;IEHbkP{)iYN;|P82N?90+ym%1E)#e7<0LKPG-fyzAh07L}^} zTf(RS?7X#7p^TuABH5^?y?uUjVju&vXxT70sb4g@jZP*kK8)=TW=><1zL&@O_M@KM zh1rgiXqFmg5EKcxSuEsN`R{k*k=GrMX>+>#*a_)&;dxC;3uHTJ?bCA{Vj_!PRgo2* z)WFtif^0P)$HJlGj6Q;$loId_4OqDkat-|iEfu#@qfGmi zS%f-Xe7(R!Uxi~N>Ua1T$*LWsy<@bK^oFctct|$u++dxdLRY@FB{QS#Jp@N@apFKa zub7?`EBxgYwjut$@wen7`xa@TmNMUG#Uk3~KUZ_yWaOXZs-eH$kloTnvgET)C`woz zjYI~+Xh4f(S2h@|itNP+y!R#%PbIyY(0^jrqV%JV`*$%1Ge~NDce(aR8iXt-yrXU0 z=@xsVbM&{F70Lo)kqEM}iG8D1%{QGGlQ45PYScBNbzW{G!K6HQz&9u4p^*gb)aG2r ztG2;H8>8RVSdOfp7f~q8E>k+Kwn?g$+F8$z;4cEZvxVITJWTED!+R`6(mA+0B-_qC zJsIHU!-+%XM;F}&i)W4=s9Y1~59S{;tHjV3#@pB@p)Y|_b18Xt@V7d$RvU_?VO4>z zG3=K58gsz@I34qyZ3`&xa`yx?ev>Q9h#I}(btMPT zq#bp-XdqDmp4)XKL&ck+`(-G232$HYERD9*cC8F@K=W>P64b1}-en()U$vc98j?69n zVupjhY@t%^!(-SdbQ{D5+ezg;(m*>@H?`%>N$Qt8t^FamLhbH|e6u!1A2o}ie`#F_ zujRnW7#+)7e1Q3B+W&}hLVHMZ5iyc_4ZQX_bo>HSYux!aa$l|@xkv>#xedMW zh?74dkLXz7Tk0EWJGP6}1@wMwRyj;C3^v%$$4d1wdA zhA1@4FQtHC5@xu}qhN$h#=9A)MD}kZJDsuMAcn@)`6EaApWd{S>8%BnWfF=649R^9 zhSZ!gBqxWUwRV^Y$&R=6HvdFxjTEunmCftkc^4dvZx8UWdfa;I$XeqGd=1wHkb}6SL5fgModV(X}^PDMLC`~!Xt2yMwUQ})L|d-^i&C7B3IJ5 z*~l)NLguNp5_68vyjw>C8TZ$&70X186PlPn3FTje$fYuAl%pIkC`C!YRFZN;2uv*L z*g&Ef)!<(YoZXwp2!7ezH0wp>Mzz$h+MSK+WsjcmF^jE`o}Gwn6qMmzjIkZ3#Qs>E zW0GvL>i-bM81TB}z+v33!pPDo&+)!M1uMq+M$^MLZ6XzFT$6AJN|+ubXEsnWY(k`0^lA~#W4uk!fV|0qUvORLZF`Y5EOc;+H3sB z2G)yQC&6KJDRmR#=b%~pLcl%cPaVeZBJNq!@@ly`BJ`yDnr;PbDni!&deCFBaWka~ z*$<;$Uw69e{DclkU%sqb`i@-&aPf-f>?g-&4UXsiWts3;h<=6Gvtu)k#YrRvx}r$0 z=SV&8RZ2tI&`xibB-yU+`XVnQ;odaS&Vln1UsHYI2$wg<{?8e(^StYAysd%!Dy@Y6 zPWgsIQ=2af@$$W`BnH2FY1MupDVNr**<*8wjf>|wJh|7_89E)A!9PBnM?VLvXVpc7 zy_Fv9tWoHez{yXP>{LFe%}7a+Z!xmMBS7D{&7(#$?hhzKW*fCy`uoILxVzSTC)X)n z^(On{b0D6-_*3PYmt68B$d<=EUYwHI8vg-jG#e0?@?Pb8?=gNarwf-DcP%Ybr3?r8 zmpz46Mf1G>cRCY}&x3&z@~zetgwQeb8r9~u7BsuxI9X9Wf+>~vABg%sKRz(<$pkA> z9Mxe~ytdJ9??4hprnrF2Qh6e)9xPSEbbetOX(el_5JNgbci9TdKv=;^x^eW;DDeaA zxC|6|7+99Wo-2`cc_acVEFneoH@Y6p>I|zIm|rK+4v&MM)e%f9MV!&iTy<=b@7pE@ zit>o?pO#+^Oq3e%i;WJc=kn3n?vygauKq`SgouE&RBi+iW1}tY`S8`8fQIapSGz~ddESlv z!iixF>e|f!L4k0ph&rTN@{hUT2cC?8%5P
[PATCH 2/2] btrfs-progs: print-tree: Print offset as tree objectid
For case like tree reloc trees and subvolume trees, their key offset is the tree id. For case like tree treloc tree for data reloc tree, The key will be outputted as: (TREE_RELOC ROOT_ITEM 18446744073709551607) The minus number is long and even guys with real engineer brains can't easily get the meaning. This patch will change the output format to: (TREE_RELOC ROOT_ITEM DATA_RELOC_TREE) While for special offset value like 0 or (u64)-1, it's still shown as is. Signed-off-by: Qu Wenruo --- print-tree.c | 12 1 file changed, 12 insertions(+) diff --git a/print-tree.c b/print-tree.c index 687f871db302..326fafcea763 100644 --- a/print-tree.c +++ b/print-tree.c @@ -794,6 +794,18 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key) case BTRFS_UUID_KEY_RECEIVED_SUBVOL: printf(" 0x%016llx)", (unsigned long long)offset); break; + + /* +* For those key offset, they are points to tree id, print them +* in human readable format other than tree id. +* Especially useful for trees like data/tree reloc tree, whose +* tree id can be minus. +*/ + case BTRFS_ROOT_ITEM_KEY: + printf(" "); + print_objectid(stdout, offset, type); + printf(")"); + break; default: if (offset == (u64)-1) printf(" -1)"); -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Don't generate UUID for non-fs tree
btrfs_create_tree() will unconditionally generate UUID for any root. So for quota tree and data reloc tree created by kernel, they will have unique UUIDs. However UUID in root item is only referred by UUID tree, which only records UUID for fs trees. This makes unique UUIDs for quota/data reloc tree meaningless. Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output less confusing. Reported-by: Misono Tomohiro Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dfdab849037b..d85e04a675fe 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1403,7 +1403,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, struct btrfs_root *root; struct btrfs_key key; int ret = 0; - uuid_le uuid; + uuid_le uuid = { 0 }; root = btrfs_alloc_root(fs_info, GFP_KERNEL); if (!root) @@ -1444,7 +1444,8 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, btrfs_set_root_used(&root->root_item, leaf->len); btrfs_set_root_last_snapshot(&root->root_item, 0); btrfs_set_root_dirid(&root->root_item, 0); - uuid_le_gen(&uuid); + if (is_fstree(objectid)) + uuid_le_gen(&uuid); memcpy(root->root_item.uuid, uuid.b, BTRFS_UUID_SIZE); root->root_item.drop_level = 0; -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html