[PATCH] btrfs-progs: ins: fix arg order in print_inode_item()

2017-10-30 Thread Misono, Tomohiro
In the print_inode_item(), the argument order of sequence and flags are
reversed:

printf("... sequence %llu flags 0x%llx(%s)\n",
... 
   (unsigned long long)btrfs_inode_flags(eb,ii),
   (unsigned long long)btrfs_inode_sequence(eb, ii),
...)

So, just fix it.

Signed-off-by: Tomohiro Misono 
---
 print-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/print-tree.c b/print-tree.c
index 3c585e3..8abd760 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb,
   btrfs_inode_uid(eb, ii),
   btrfs_inode_gid(eb, ii),
   (unsigned long long)btrfs_inode_rdev(eb,ii),
-  (unsigned long long)btrfs_inode_flags(eb,ii),
   (unsigned long long)btrfs_inode_sequence(eb, ii),
+  (unsigned long long)btrfs_inode_flags(eb,ii),
   flags_str);
print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n");
print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n");
-- 
2.9.5


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2017-10-30 Thread Misono, Tomohiro
Currently, the top-level subvolume lacks the UUID. As a result, both
non-snapshot subvolume and snapshot of top-level subvolume do not have
Parent UUID and cannot be distinguisued. Therefore "fi show" of
top-level lists all the subvolumes which lacks the UUID in
"Snapshot(s)". Also, it lacks the otime information.

Fix this by adding the UUID and otime at the mkfs time.  As a
consequence, snapshots of top-level subvolume now have a Parent UUID and
UUID tree will create an entry for top-level subvolume at mount time.

Signed-off-by: Tomohiro Misono 
---
 ctree.h   |  5 +
 mkfs/common.c | 14 ++
 mkfs/main.c   |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/ctree.h b/ctree.h
index 2280659..5737978 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
btrfs_root_item,
 stransid, 64);
 BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
 rtransid, 64);
+static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item,
+u8 uuid[BTRFS_UUID_SIZE])
+{
+   memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE);
+}
 
 /* struct btrfs_root_backup */
 BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
diff --git a/mkfs/common.c b/mkfs/common.c
index c9ce10d..808d343 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
btrfs_mkfs_config *cfg,
u32 itemoff;
int ret = 0;
int blk;
+   u8 uuid[BTRFS_UUID_SIZE];
 
memset(buf->data + sizeof(struct btrfs_header), 0,
cfg->nodesize - sizeof(struct btrfs_header));
@@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
btrfs_mkfs_config *cfg,
btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
btrfs_set_item_size(buf, btrfs_item_nr(nritems),
sizeof(root_item));
+   if (blk == MKFS_FS_TREE) {
+   time_t now = time(NULL);
+
+   uuid_generate(uuid);
+   btrfs_set_root_uuid(&root_item, uuid);
+   btrfs_set_stack_timespec_sec(&root_item.otime, now);
+   btrfs_set_stack_timespec_sec(&root_item.ctime, now);
+   } else {
+   memset(uuid, 0, BTRFS_UUID_SIZE);
+   btrfs_set_root_uuid(&root_item, uuid);
+   btrfs_set_stack_timespec_sec(&root_item.otime, 0);
+   btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
+   }
write_extent_buffer(buf, &root_item,
btrfs_item_ptr_offset(buf, nritems),
sizeof(root_item));
diff --git a/mkfs/main.c b/mkfs/main.c
index 1b4cabc..4184a2d 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
struct btrfs_key location;
struct btrfs_root_item root_item;
struct extent_buffer *tmp;
+   u8 uuid[BTRFS_UUID_SIZE] = {0};
int ret;
 
ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
@@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans,
btrfs_set_root_bytenr(&root_item, tmp->start);
btrfs_set_root_level(&root_item, btrfs_header_level(tmp));
btrfs_set_root_generation(&root_item, trans->transid);
+   /* clear uuid of source tree */
+   btrfs_set_root_uuid(&root_item, uuid);
free_extent_buffer(tmp);
 
location.objectid = objectid;
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs-progs: show: allow fi show -u for FS_TREE

2017-10-30 Thread Misono, Tomohiro
Allow "fi show -u" for uuid of top-level subvolume too for consistency.

Signed-off-by: Tomohiro Misono 
---
 btrfs-list.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index b6d7658..91fdab8 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1594,9 +1594,15 @@ int btrfs_get_subvol(int fd, struct root_info *the_ri)
ri = rb_entry(rbn, struct root_info, rb_node);
rr = resolve_root(&rl, ri, root_id);
if (rr == -ENOENT) {
-   ret = -ENOENT;
-   rbn = rb_next(rbn);
-   continue;
+   if (ri->root_id == BTRFS_FS_TREE_OBJECTID) {
+   ri->path = strdup("/");
+   ri->name = strdup("");
+   ri->full_path = strdup("/");
+   } else {
+   ret = -ENOENT;
+   rbn = rb_next(rbn);
+   continue;
+   }
}
 
if (!comp_entry_with_rootid(the_ri, ri, 0) ||
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()

2017-10-30 Thread Qu Wenruo


On 2017年10月30日 16:10, Misono, Tomohiro wrote:
> In the print_inode_item(), the argument order of sequence and flags are
> reversed:
> 
> printf("... sequence %llu flags 0x%llx(%s)\n",
>   ... 
>  (unsigned long long)btrfs_inode_flags(eb,ii),
>  (unsigned long long)btrfs_inode_sequence(eb, ii),
>   ...)
> 
> So, just fix it.
> 
> Signed-off-by: Tomohiro Misono 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  print-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 3c585e3..8abd760 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb,
>  btrfs_inode_uid(eb, ii),
>  btrfs_inode_gid(eb, ii),
>  (unsigned long long)btrfs_inode_rdev(eb,ii),
> -(unsigned long long)btrfs_inode_flags(eb,ii),
>  (unsigned long long)btrfs_inode_sequence(eb, ii),
> +(unsigned long long)btrfs_inode_flags(eb,ii),
>  flags_str);
>   print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n");
>   print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n");
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2017-10-30 Thread Qu Wenruo


On 2017年10月30日 16:14, Misono, Tomohiro wrote:
> Currently, the top-level subvolume lacks the UUID. As a result, both
> non-snapshot subvolume and snapshot of top-level subvolume do not have
> Parent UUID and cannot be distinguisued. Therefore "fi show" of
> top-level lists all the subvolumes which lacks the UUID in
> "Snapshot(s)". Also, it lacks the otime information.

What about creating a wiki page for ROOT_ITEM and detailed restriction
for its members?

> 
> Fix this by adding the UUID and otime at the mkfs time.  As a
> consequence, snapshots of top-level subvolume now have a Parent UUID and
> UUID tree will create an entry for top-level subvolume at mount time.

This patch also inspires me about the btrfs_create_tree() function I'm
introducing should also populate its UUID and timespec.

> 
> Signed-off-by: Tomohiro Misono 
> ---
>  ctree.h   |  5 +
>  mkfs/common.c | 14 ++
>  mkfs/main.c   |  3 +++
>  3 files changed, 22 insertions(+)
> 
> diff --git a/ctree.h b/ctree.h
> index 2280659..5737978 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
> btrfs_root_item,
>stransid, 64);
>  BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
>rtransid, 64);
> +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item,
> +  u8 uuid[BTRFS_UUID_SIZE])
> +{
> + memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE);
> +}

This is the stack version.

However there is no extent buffer version to set uuid as we just call
write_extent_buffer(), so it seems unnecessary to me.

>  
>  /* struct btrfs_root_backup */
>  BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
> diff --git a/mkfs/common.c b/mkfs/common.c
> index c9ce10d..808d343 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
> btrfs_mkfs_config *cfg,
>   u32 itemoff;
>   int ret = 0;
>   int blk;
> + u8 uuid[BTRFS_UUID_SIZE];
>  
>   memset(buf->data + sizeof(struct btrfs_header), 0,
>   cfg->nodesize - sizeof(struct btrfs_header));
> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
> btrfs_mkfs_config *cfg,
>   btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
>   btrfs_set_item_size(buf, btrfs_item_nr(nritems),
>   sizeof(root_item));
> + if (blk == MKFS_FS_TREE) {
> + time_t now = time(NULL);
> +
> + uuid_generate(uuid);
> + btrfs_set_root_uuid(&root_item, uuid);
> + btrfs_set_stack_timespec_sec(&root_item.otime, now);
> + btrfs_set_stack_timespec_sec(&root_item.ctime, now);
> + } else {
> + memset(uuid, 0, BTRFS_UUID_SIZE);

This leads to the question about the UUID meaning of different trees.

AFAIK every tree created by kernel has its own UUID, no one uses all zero.

In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate
a new UUID.
So I'm wonder if such branch is really needed.
And maybe we fix all tree creation to generate UUID.

Despite the question about the meaning of UUID for root_item, I think
the patch really makes sense.

Thanks,
Qu

> + btrfs_set_root_uuid(&root_item, uuid);
> + btrfs_set_stack_timespec_sec(&root_item.otime, 0);
> + btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
> + }
>   write_extent_buffer(buf, &root_item,
>   btrfs_item_ptr_offset(buf, nritems),
>   sizeof(root_item));
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 1b4cabc..4184a2d 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
>   struct btrfs_key location;
>   struct btrfs_root_item root_item;
>   struct extent_buffer *tmp;
> + u8 uuid[BTRFS_UUID_SIZE] = {0};
>   int ret;
>  
>   ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
> @@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans,
>   btrfs_set_root_bytenr(&root_item, tmp->start);
>   btrfs_set_root_level(&root_item, btrfs_header_level(tmp));
>   btrfs_set_root_generation(&root_item, trans->transid);
> + /* clear uuid of source tree */
> + btrfs_set_root_uuid(&root_item, uuid);
>   free_extent_buffer(tmp);
>  
>   location.objectid = objectid;
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()

2017-10-30 Thread Qu Wenruo


On 2017年10月30日 16:10, Misono, Tomohiro wrote:
> In the print_inode_item(), the argument order of sequence and flags are
> reversed:
> 
> printf("... sequence %llu flags 0x%llx(%s)\n",
>   ... 
>  (unsigned long long)btrfs_inode_flags(eb,ii),
>  (unsigned long long)btrfs_inode_sequence(eb, ii),
>   ...)
> 
> So, just fix it.

Not related to this patch, but considering you're going to enhance
root_item creation, it's also a good idea to print such info like
c/o/s/r time and parent/receive UUID.

More patches will never be a bad thing.

Thanks,
Qu
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  print-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 3c585e3..8abd760 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -896,8 +896,8 @@ static void print_inode_item(struct extent_buffer *eb,
>  btrfs_inode_uid(eb, ii),
>  btrfs_inode_gid(eb, ii),
>  (unsigned long long)btrfs_inode_rdev(eb,ii),
> -(unsigned long long)btrfs_inode_flags(eb,ii),
>  (unsigned long long)btrfs_inode_sequence(eb, ii),
> +(unsigned long long)btrfs_inode_flags(eb,ii),
>  flags_str);
>   print_timespec(eb, btrfs_inode_atime(ii), "\t\tatime ", "\n");
>   print_timespec(eb, btrfs_inode_ctime(ii), "\t\tctime ", "\n");
> 



signature.asc
Description: OpenPGP digital signature


Re: USB upgrade fun

2017-10-30 Thread Kai Hendry
On Sun, 29 Oct 2017, at 06:02 PM, Qu Wenruo wrote:
> If so, mount it, do minimal write like creating an empty file, to update
> both superblock copies, and then try fix-device-size.

Tried that, and it didn't work. Made a recording:
https://youtu.be/SFd3QscNT6w
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: clean up btrfs_dev_stat_inc() usage

2017-10-30 Thread David Sterba
On Sat, Oct 21, 2017 at 01:45:33AM +0800, Anand Jain wrote:
> btrfs_end_bio() is using btrfs_dev_stat_inc() and then
> btrfs_dev_stat_print_on_error() separately instead use
> btrfs_dev_stat_inc_and_print() directly.
> 
> As of now there isn't any bio in btrfs which is - a non-empty write and
> also the REQ_PREFLUSH flag is set. So in actual the condition
>if (bio->bi_opf & REQ_PREFLUSH)
> is never true in btrfs_end_bio(), and so there won't be any redundant error
> log by using btrfs_dev_stat_inc_and_print() separately one for write and
> another for flush.
> 
> This consolidation will help to add the device critical error handles in the
> function btrfs_dev_stat_inc_and_print() and which can be renamed as needed.
> 
> Signed-off-by: Anand Jain 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: Remove redundant memory barrier

2017-10-30 Thread David Sterba
On Fri, Oct 20, 2017 at 06:10:58PM +0300, Nikolay Borisov wrote:
> As per atomic_t.txt documentation :
>  - RMW operations that have a return value are fully ordered;
> 
> atomic_xchg is one such operation so it already includes everything it needs
> w.r.t memory ordering and add a comment ot be more explicit about that.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: Fix bug for misused dev_t when lookup in dev state hash table.

2017-10-30 Thread David Sterba
On Thu, Oct 19, 2017 at 09:49:27AM +0800, Gu Jinxiang wrote:
> From: Gu JinXiang 
> 
> Fix bug of commit 74d46992e0d9
> ("block: replace bi_bdev with a gendisk pointer and partitions index").
> 
> In this modify, use bio_dev(bio) to find dev state in function
> __btrfsic_submit_bio. But when dev_state added to hashtable, it is using
> dev_t of block_device.
> 
> bio_dev(bio) returns a dev_t of part0 which is different from dev_t in
> block_device(bd_dev). bd_dev in block_device represents the exact
> partition.
> block_device.bd_dev = 
>   bio->bi_partno (same as block_device.bd_partno) + bio_dev(bio).
> 
> When add a dev_state into hashtable it is using the exact partition's dev_t.
> So when lookup it, it should also use the exact partition's dev_t.
> 
> Reproduce of this bug:
> Use MOUNT_OPTIONS="-o check_int" when run btrfs/001 in xfstest.
> Then there will be WARNING like below.
> WARNING:
> btrfs: attempt to write superblock which references block M @29523968 (sda7   
>   /654400/2) which is never written!
> 
> changelog:
> v1->v2: Add explanation that bio_dev(bio) is different with 
> block_device(bd_dev).
> 
> Signed-off-by: Gu JinXiang 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-10-30 Thread Jeff Layton
On Fri, 2017-05-12 at 12:21 -0400, J. Bruce Fields wrote:
> On Fri, May 12, 2017 at 08:22:23AM +1000, NeilBrown wrote:
> > On Thu, May 11 2017, J. Bruce Fields wrote:
> > > +static inline u64 nfsd4_change_attribute(struct inode *inode)
> > > +{
> > > + u64 chattr;
> > > +
> > > + chattr = inode->i_ctime.tv_sec << 30;
> > > + chattr += inode->i_ctime.tv_nsec;
> > > + chattr += inode->i_version;
> > > + return chattr;
> > 
> > So if I chmod a file, all clients will need to flush the content from their 
> > cache?
> > Maybe they already do?  Maybe it is a boring corner case?
> 
> Yeah, that's the assumption, maybe it's wrong.  I can't recall
> complaints about anyone bitten by that case.
> 

I'm pretty sure that's required by the RFC. The change attribute changes
with both data and metadata changes, and there is no way to tell what
sort of change it was. You have to dump everything out of the cache when
it changes.

> > >  /*
> > >   * Fill in the pre_op attr for the wcc data
> > >   */
> > > @@ -253,7 +263,7 @@ fill_pre_wcc(struct svc_fh *fhp)
> > >   fhp->fh_pre_mtime = inode->i_mtime;
> > >   fhp->fh_pre_ctime = inode->i_ctime;
> > >   fhp->fh_pre_size  = inode->i_size;
> > > - fhp->fh_pre_change = inode->i_version;
> > > + fhp->fh_pre_change = nfsd4_change_attribute(inode);
> > >   fhp->fh_pre_saved = true;
> > >   }
> > >  }
> > > --- a/fs/nfsd/nfs3xdr.c
> > > +++ b/fs/nfsd/nfs3xdr.c
> > > @@ -260,7 +260,7 @@ void fill_post_wcc(struct svc_fh *fhp)
> > >   printk("nfsd: inode locked twice during operation.\n");
> > >  
> > >   err = fh_getattr(fhp, &fhp->fh_post_attr);
> > > - fhp->fh_post_change = d_inode(fhp->fh_dentry)->i_version;
> > > + fhp->fh_post_change = nfsd4_change_attribute(d_inode(fhp->fh_dentry));
> > >   if (err) {
> > >   fhp->fh_post_saved = false;
> > >   /* Grab the ctime anyway - set_change_info might use it */
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 26780d53a6f9..a09532d4a383 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -1973,7 +1973,7 @@ static __be32 *encode_change(__be32 *p, struct 
> > > kstat *stat, struct inode *inode,
> > >   *p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time));
> > >   *p++ = 0;
> > >   } else if (IS_I_VERSION(inode)) {
> > > - p = xdr_encode_hyper(p, inode->i_version);
> > > + p = xdr_encode_hyper(p, nfsd4_change_attribute(inode));
> > >   } else {
> > >   *p++ = cpu_to_be32(stat->ctime.tv_sec);
> > >   *p++ = cpu_to_be32(stat->ctime.tv_nsec);
> > 
> > It is *really* confusing to find that fh_post_change is only set in nfs3
> > code, and only used in nfs4 code.
> 
> Yup.
> 
> > It is probably time to get a 'version' field in 'struct kstat'.
> 
> The pre/post_wcc code doesn't seem to be doing an explicit stat, I
> wonder if that matters?
> 

Probably not for now. We only use this for namespace altering operations
anyway (create, link, unlink, and rename).

The post code does do a fh_getattr. It's only the pre-op i_version that
comes out of the cache. Only btrfs, xfs, and ext4 have a real i_version
counter today, and they just scrape that info out of the in-core inode.
So while not completely atomic, you should see a difference in the
change_info4 during any of those operations.

FWIW, userland cephfs now supports a cluster-coherent change attribute,
though the kernel client still needs some work before we can implement
it there. Eventually we'll add that, and at that point we might need to
have nfsd do a getattr in the pre part as well.

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is the purpose of EXTENT_PAGE_MAPPED

2017-10-30 Thread Nikolay Borisov


On 27.10.2017 07:17, Liu Bo wrote:
> On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote:
>>
>> EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see
>> it being cross checked anytime. What is the purpose of setting it?
> 
> Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was
> used to differentiate page for metadata and for data, but I think
> currently it's just a piece of legacy code.

Be that as it may - is there any reason why we are keeping this and can
it be killed off?

> 
> -liubo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Remove redundant mirror_num arg

2017-10-30 Thread David Sterba
On Tue, Oct 24, 2017 at 11:50:39AM +0300, Nikolay Borisov wrote:
> The following callpath is always invoked with mirror_num set to 0, so let's
> remove it as an argument and directly pass 0 to __do_redpage. No functional 
> change
> 
> extent_readpages
>   __extent_readpages
> __do_contiguous_readpages
>   __do_readpage
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: add write_flags for compression bio

2017-10-30 Thread David Sterba
On Mon, Oct 23, 2017 at 11:18:16PM -0600, Liu Bo wrote:
> Compression code path has only flaged bios with REQ_OP_WRITE no matter
> where the bios come from, but it could be a sync write if fsync starts
> this writeback or a normal writeback write if wb kthread starts a
> periodic writeback.
> 
> It breaks the rule that sync writes and writeback writes need to be
> differentiated from each other, because from the POV of block layer,
> all bios need to be recognized by these flags in order to do some
> management, e.g. throttlling.
> 
> This passes writeback_control to compression write path so that it can
> send bios with proper flags to block layer.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is the purpose of EXTENT_PAGE_MAPPED

2017-10-30 Thread David Sterba
On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote:
> 
> 
> On 27.10.2017 07:17, Liu Bo wrote:
> > On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote:
> >>
> >> EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see
> >> it being cross checked anytime. What is the purpose of setting it?
> > 
> > Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was
> > used to differentiate page for metadata and for data, but I think
> > currently it's just a piece of legacy code.
> 
> Be that as it may - is there any reason why we are keeping this and can
> it be killed off?

Are we're talking about EXTENT_PAGE_PRIVATE? There's no
EXTENT_PAGE_MAPPED. There's some control dependency on the page private
bit and the value, so we should be careful and replace the function with
an assert (or a BUG_ON if it's a must-not-happen state). The page->private
points to an extent buffer, and if it's always an eb, then the
EXTENT_PAGE_PRIVATE is unused.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is the purpose of EXTENT_PAGE_MAPPED

2017-10-30 Thread Nikolay Borisov


On 30.10.2017 15:55, David Sterba wrote:
> On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote:
>>
>>
>> On 27.10.2017 07:17, Liu Bo wrote:
>>> On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote:

 EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see
 it being cross checked anytime. What is the purpose of setting it?
>>>
>>> Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was
>>> used to differentiate page for metadata and for data, but I think
>>> currently it's just a piece of legacy code.
>>
>> Be that as it may - is there any reason why we are keeping this and can
>> it be killed off?
> 
> Are we're talking about EXTENT_PAGE_PRIVATE? There's no
> EXTENT_PAGE_MAPPED. There's some control dependency on the page private
> bit and the value, so we should be careful and replace the function with
> an assert (or a BUG_ON if it's a must-not-happen state). The page->private
> points to an extent buffer, and if it's always an eb, then the
>EXTENT_PAGE_PRIVATE is unused.

I guess I meant do we actually need: set_page_extent_mapped and all the
jazz happening in it  or is it a leftover (which I believe it is) ?

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What is the purpose of EXTENT_PAGE_MAPPED

2017-10-30 Thread Goldwyn Rodrigues


On 10/30/2017 09:01 AM, Nikolay Borisov wrote:
> 
> 
> On 30.10.2017 15:55, David Sterba wrote:
>> On Mon, Oct 30, 2017 at 03:21:51PM +0200, Nikolay Borisov wrote:
>>>
>>>
>>> On 27.10.2017 07:17, Liu Bo wrote:
 On Tue, Oct 24, 2017 at 05:47:11AM -0500, Goldwyn Rodrigues wrote:
>
> EXTENT_PAGE_MAPPED gets set in set_page_extent_mapped(), but I don't see
> it being cross checked anytime. What is the purpose of setting it?

 Please check commit d1310b2e0cd98eb1348553e69b73827b436dca7b, it was
 used to differentiate page for metadata and for data, but I think
 currently it's just a piece of legacy code.
>>>
>>> Be that as it may - is there any reason why we are keeping this and can
>>> it be killed off?
>>
>> Are we're talking about EXTENT_PAGE_PRIVATE? There's no
>> EXTENT_PAGE_MAPPED. There's some control dependency on the page private
>> bit and the value, so we should be careful and replace the function with
>> an assert (or a BUG_ON if it's a must-not-happen state). The page->private
>> points to an extent buffer, and if it's always an eb, then the
>> EXTENT_PAGE_PRIVATE is unused.
> 
> I guess I meant do we actually need: set_page_extent_mapped and all the
> jazz happening in it  or is it a leftover (which I believe it is) ?
> 

We definitely need set_page_extent_mapped() to set the page private for
the lowers layers to handle I/O, primarily writebacks. However, we may
not need EXTENT_PAGE_PRIVATE (yes, I got it wrong the first time). I am
still reading on this though.


-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-30 Thread Zak Kohler
Lakshmipathi.G posted on Mon, Oct 30, 2017 at 12:07 AM as excerpted:

> Can you share these steps as simple bash script? Currently I'm running
> few tests, I can check your script and share the results.

sure, but it is pretty trivial:

device1=/dev/disk/by-id/XX1
device2=/dev/disk/by-id/XX2
device3=/dev/disk/by-id/XX3

mkfs.btrfs -L OfflineJ $device1 $device2 $device3
mount -t btrfs $device1 /mnt
btrfs subvolume create /mnt/dataroot
mkdir -p /media/OfflineJ

umount /mnt
mount -o subvol=/dataroot $device1 /media/OfflineJ

> > The only thing I could think of is that the btrfs version that I used to 
> > mkfs
> > was not up to date. Is there a way to determine which version was used to
> > create the filesystem?
> >
> If I'm not wrong, i don't think we can get mkfs version from existing file
> system layout. what was your kernel version?

Not quite sure.
Oldest: archlinux linux-lts as of 2017.06.08
Most likely: archlinux linux-lts as of 2017.10.18

It may be possible that I updated and created the fs without
rebooting or reloading the kernel modules. But normally in that case
any interaction, even mounting, just fails loudly.


Duncan posted on Mon, Oct 30, 2017 at 12:09 AM as excerpted:

> and I downclocked it a notch (from its rated pc3200 to 3000, IIRC).  At

Hmm, I did have a mild overclock on this system when I used it as a desktop
but I thought I reverted it when I made it into a server. I'll double
check this.

> sometimes, but not always, trigger as well, but the most frequent symptom
> really was checksum verification failures untarring tbz2s.

I'll have to try and recreate this using my EXT4 system disk.


Zak Kohler posted on Sun, Oct 29, 2017 at 9:57 PM as excerpted:

> I'm running '$ sudo btrfs scrub start --offline --progress
> /dev/disk/by-id/WD-XX3' one more time to just to make sure that I
> wasn't reading something wrong.

Well even stranger news on the fourth run:

$ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX3
Csum error: 175 # first run
Csum error: 112 # second run...
Csum error: 285 # third run...
Csum error: 0 # fourth run

Also can anyone confirm that the above --offline command checks the whole
filesystem regardless which device is specified on the cli?

> Only other think I can think to try is transferring those drives to
> another system.

I still need to try this also.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] btrfs: match btrfs_device->mode same as it used for open

2017-10-30 Thread David Sterba
On Fri, Oct 20, 2017 at 10:07:15PM +0800, Anand Jain wrote:
> We aren't setting the FMODE_WRITE when initializing btrfs_device
> structure and when calling blkdev_put, however we are setting it
> only when calling blkdev_get_by_path().

But this still does not say why this is a problem worth fixing. Nikolay
asked for it, and I would do the same, but why do we even have to ask
for that?

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: add support for fallocate's zero range operation

2017-10-30 Thread David Sterba
On Wed, Oct 25, 2017 at 03:59:08PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> This implements support the zero range operation of fallocate. For now
> at least it's as simple as possible while reusing most of the existing
> fallocate and hole punching infrastructure.
> 
> Signed-off-by: Filipe Manana 
> ---

I'll add this to for-next so we can more testing coverage, review is
open.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

2017-10-30 Thread David Sterba
On Mon, Oct 23, 2017 at 10:29:27AM -0600, Edmund Nadolski wrote:
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -1765,20 +1765,24 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info 
> > *fs_info,
> > key.offset = device->devid;
> >  
> > ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> > -   if (ret < 0)
> > -   goto out;
> > -
> > -   if (ret > 0) {
> > -   ret = -ENOENT;
> > +   if (ret) {
> > +   if (ret > 0)
> > +   ret = -ENOENT;
> > +   btrfs_abort_transaction(trans, ret);
> > +   btrfs_end_transaction(trans);
> > goto out;
> > }
> >  
> > ret = btrfs_del_item(trans, root, path);
> > -   if (ret)
> > -   goto out;
> > +   if (ret) {
> > +   btrfs_abort_transaction(trans, ret);
> > +   btrfs_end_transaction(trans);
> > +   }
> > +
> >  out:
> > btrfs_free_path(path);
> > -   btrfs_commit_transaction(trans);
> > +   if (!ret)
> > +   ret = btrfs_commit_transaction(trans);
> > return ret;
> >  }
> >  
> > 
> 
> Perhaps slightly simpler (and the 'out:' label maybe goes away):
> 
>   .
>   ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
>   if (ret > 0)
>   ret = -ENOENT;
>   else if (!ret)
>   ret = btrfs_del_item(trans, root, path);
> 
>   if (ret) {
>   btrfs_abort_transaction(trans, ret);
>   btrfs_end_transaction(trans);

This would merge 2 abort sites into one, btrfs_search_slot and
btrfs_del_item, so it wouldn't be obvious which one failed just from the
stack trace and line number.

V4 is ok, as it only joins return values from btrfs_search_slot, where
the positive value means "search slot was not able to find the key, but
this is where you should insert it". This really translates to ENOENT so
we're not losing any information.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels

2017-10-30 Thread David Sterba
On Mon, Oct 23, 2017 at 07:43:02PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年10月23日 18:51, Nikolay Borisov wrote:
> > Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels 
> > to
> > mimic the functionality of a simple do {} while loop. Refactor the function
> > to use a do {} while construct, making intention clear and code easier to
> > follow. No functional changes
> > 
> > Signed-off-by: Nikolay Borisov 
> 
> Looks good to me.
> 
> Reviewed-by: Qu Wenruo 

1-2 added to next.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove rcu_barrier in btrfs_close_devices

2017-10-30 Thread David Sterba
On Wed, Oct 11, 2017 at 11:02:49AM -0600, Liu Bo wrote:
> On Wed, Oct 11, 2017 at 03:41:23PM +0800, Anand Jain wrote:
> > 
> > 
> > On 10/11/2017 02:11 PM, Anand Jain wrote:
> > > 
> > > 
> > > On 10/11/2017 05:51 AM, Liu Bo wrote:
> > > > It was introduced because btrfs used to do blkdev_put in a deferred
> > > > work, now that btrfs has put blkdev in place, this rcu_barrier can be
> > > > removed.
> > 
> >  On the 2nd thought, modprobe -r btrfs would still need rcu_barrier(), some
> > where else outside of umount context ?
> 
> Thanks a lot for the comments.
> 
> modprobe -r btrfs will do btrfs_cleanup_fs_uuids(), where it cleanup
> every %fs_devices on the list, but when we do btrfs_close_devices(), we
> have replaced the devices on the list with dummy ones which only have
> the same name and uuid, so modprobe -r btrfs will free those instead
> of what we were using, this change won't cause a problem for it.

Added to the changelog.

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: bail out gracefully rather than BUG_ON

2017-10-30 Thread Liu Bo
If a file's DIR_ITEM key is invalid (due to memory errors) and gets
written to disk, a future lookup_path can end up with kernel panic due
to BUG_ON().

This gets rid of the BUG_ON(), meanwhile output the corrupted key and
return ENOENT if it's invalid.

Signed-off-by: Liu Bo 
---
The diff doesn't show the logic well, 'goto out_err' will return with
assigning 0 to location->objectid, and the caller already has a check
for (location->objectid == 0) to return -ENOENT.

 fs/btrfs/inode.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d94e3f6..916cdc9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5500,6 +5500,14 @@ static int btrfs_inode_by_name(struct inode *dir, struct 
dentry *dentry,
goto out_err;
 
btrfs_dir_item_key_to_cpu(path->nodes[0], di, location);
+   if (location->type != BTRFS_INODE_ITEM_KEY &&
+   location->type != BTRFS_ROOT_ITEM_KEY) {
+   btrfs_warn(root->fs_info,
+ "%s gets something invalid in DIR_ITEM (name %s, directory ino %llu, 
location(%llu %u %llu))",
+  __func__, name, btrfs_ino(BTRFS_I(dir)),
+  location->objectid, location->type, 
location->offset);
+   goto out_err;
+   }
 out:
btrfs_free_path(path);
return ret;
@@ -5816,8 +5824,6 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
struct dentry *dentry)
return inode;
}
 
-   BUG_ON(location.type != BTRFS_ROOT_ITEM_KEY);
-
index = srcu_read_lock(&fs_info->subvol_srcu);
ret = fixup_tree_root_location(fs_info, dir, dentry,
   &location, &sub_root);
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] Btrfs: remove max_active and thresh

2017-10-30 Thread Liu Bo
First and foremost, here are the problems we have right now,

a) %thread_pool is configurable via mount option, however for those
   'workers' who really needs concurrency, there are at most
   "min(num_cpus+2, 8)" threads running to process works, and the
   value can't be tuned after mount because they're tagged with
   NO_THRESHOLD.

b) For THRESHOLD workers, btrfs is adjusting how concurrency will
   happen by starting with the minimum max_active and calling
   btrfs_workqueue_set_max() to grow it on demand, but it also has a
   upper limit, "min(num_cpus+2, 8)", which means at most we can have
   such many threads running at the same time.

In fact, kernel workqueue can be created on demand and destroyed once
no work needs to be processed.  The small max_active limitation (8)
from btrfs has prevented us from utilizing all available cpus, and
that is against why we choose workqueue.

What this patch does:

- resizing %thread_pool is totally removed as they are no long needed,
  while keeping its mount option for compatibility.

- The default "0" is passed when allocating workqueue, so the maximum
  number of running works is typically 256.  And all fields for
  limiting max_active are removed, including current_active,
  limit_active, thresh etc.

Signed-off-by: Liu Bo 
---
 fs/btrfs/async-thread.c  | 155 +++
 fs/btrfs/async-thread.h  |  27 +++--
 fs/btrfs/delayed-inode.c |   7 ++-
 fs/btrfs/disk-io.c   |  61 ++-
 fs/btrfs/scrub.c |  11 ++--
 fs/btrfs/super.c |  32 --
 6 files changed, 58 insertions(+), 235 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e00c8a9..dd627ad 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -29,41 +29,6 @@
 #define WORK_ORDER_DONE_BIT 1
 #define WORK_HIGH_PRIO_BIT 2
 
-#define NO_THRESHOLD (-1)
-#define DFT_THRESHOLD (32)
-
-struct __btrfs_workqueue {
-   struct workqueue_struct *normal_wq;
-
-   /* File system this workqueue services */
-   struct btrfs_fs_info *fs_info;
-
-   /* List head pointing to ordered work list */
-   struct list_head ordered_list;
-
-   /* Spinlock for ordered_list */
-   spinlock_t list_lock;
-
-   /* Thresholding related variants */
-   atomic_t pending;
-
-   /* Up limit of concurrency workers */
-   int limit_active;
-
-   /* Current number of concurrency workers */
-   int current_active;
-
-   /* Threshold to change current_active */
-   int thresh;
-   unsigned int count;
-   spinlock_t thres_lock;
-};
-
-struct btrfs_workqueue {
-   struct __btrfs_workqueue *normal;
-   struct __btrfs_workqueue *high;
-};
-
 static void normal_work_helper(struct btrfs_work *work);
 
 #define BTRFS_WORK_HELPER(name)\
@@ -86,20 +51,6 @@ btrfs_work_owner(const struct btrfs_work *work)
return work->wq->fs_info;
 }
 
-bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq)
-{
-   /*
-* We could compare wq->normal->pending with num_online_cpus()
-* to support "thresh == NO_THRESHOLD" case, but it requires
-* moving up atomic_inc/dec in thresh_queue/exec_hook. Let's
-* postpone it until someone needs the support of that case.
-*/
-   if (wq->normal->thresh == NO_THRESHOLD)
-   return false;
-
-   return atomic_read(&wq->normal->pending) > wq->normal->thresh * 2;
-}
-
 BTRFS_WORK_HELPER(worker_helper);
 BTRFS_WORK_HELPER(delalloc_helper);
 BTRFS_WORK_HELPER(flush_delalloc_helper);
@@ -125,7 +76,7 @@ BTRFS_WORK_HELPER(scrubparity_helper);
 
 static struct __btrfs_workqueue *
 __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name,
-   unsigned int flags, int limit_active, int thresh)
+   unsigned int flags)
 {
struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_KERNEL);
 
@@ -133,31 +84,15 @@ __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, 
const char *name,
return NULL;
 
ret->fs_info = fs_info;
-   ret->limit_active = limit_active;
atomic_set(&ret->pending, 0);
-   if (thresh == 0)
-   thresh = DFT_THRESHOLD;
-   /* For low threshold, disabling threshold is a better choice */
-   if (thresh < DFT_THRESHOLD) {
-   ret->current_active = limit_active;
-   ret->thresh = NO_THRESHOLD;
-   } else {
-   /*
-* For threshold-able wq, let its concurrency grow on demand.
-* Use minimal max_active at alloc time to reduce resource
-* usage.
-*/
-   ret->current_active = 1;
-   ret->thresh = thresh;
-   }
 
if (flags & WQ_HIGHPRI)
ret->normal_wq = alloc_workqueue("%s-%s-high", flags,
-ret->curren

[PATCH] Btrfs: kill threshold for submit_workers

2017-10-30 Thread Liu Bo
"submit_workers" is a workqueue that serves to collect and dispatch IOs
on each device in btrfs, thus the work that is queued on it is
per-device, which means at most there're as many works as the number
of devices owned by a btrfs.

Now we've set threshhold (=64) for "submit_workers" and the
'max_active' work of this workqueue is set to 1 and will be updated
accrodingly.

However, as the threshold (64) is the highest one and the 'max_active'
gets updated only when there're more works than its threshold at the
same time, the end result is that it's almostly unlikely to update
'max_active' because you'll need >64 devices to have a chance to do
that.

Given the above fact, in most cases, works on each device which
process IOs is run in order by only one kthread of 'submit_workers'.

It's OK for DUP and raid0 since at the same time IOs are always
submitted to one device, while for raid1 and raid10, where our primary
bio only completes when all cloned bios submitted by each raid1/10's
device complete[1], it's suboptimal.

This changes the threshold to 1 so that __btrfs_alloc_workqueue() can
use NO_THRESHOLD for 'submit_workers' and the 'max_active' will be
min(num_devices, thread_pool_size).

[1]: raid1 example,
primary bio
 /\
 bio1  bio2
  | |
 dev1  dev2
  | |
endio1 endio2
 \/
endio

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab84..373d3f6f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2368,15 +2368,10 @@ static int btrfs_init_workqueues(struct btrfs_fs_info 
*fs_info,
fs_info->caching_workers =
btrfs_alloc_workqueue(fs_info, "cache", flags, max_active, 0);
 
-   /*
-* a higher idle thresh on the submit workers makes it much more
-* likely that bios will be send down in a sane order to the
-* devices
-*/
fs_info->submit_workers =
btrfs_alloc_workqueue(fs_info, "submit", flags,
  min_t(u64, fs_devices->num_devices,
-   max_active), 64);
+   max_active), 1);
 
fs_info->fixup_workers =
btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: bail out gracefully rather than BUG_ON

2017-10-30 Thread Liu Bo
On Mon, Oct 30, 2017 at 11:14:38AM -0600, Liu Bo wrote:
> If a file's DIR_ITEM key is invalid (due to memory errors) and gets
> written to disk, a future lookup_path can end up with kernel panic due
> to BUG_ON().
> 
> This gets rid of the BUG_ON(), meanwhile output the corrupted key and
> return ENOENT if it's invalid.
>

The kernel panic is originally

Reported-by: Guillaume Bouchard 

Thanks,

-liubo
> Signed-off-by: Liu Bo 
> ---
> The diff doesn't show the logic well, 'goto out_err' will return with
> assigning 0 to location->objectid, and the caller already has a check
> for (location->objectid == 0) to return -ENOENT.
> 
>  fs/btrfs/inode.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d94e3f6..916cdc9 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5500,6 +5500,14 @@ static int btrfs_inode_by_name(struct inode *dir, 
> struct dentry *dentry,
>   goto out_err;
>  
>   btrfs_dir_item_key_to_cpu(path->nodes[0], di, location);
> + if (location->type != BTRFS_INODE_ITEM_KEY &&
> + location->type != BTRFS_ROOT_ITEM_KEY) {
> + btrfs_warn(root->fs_info,
> +   "%s gets something invalid in DIR_ITEM (name %s, directory ino %llu, 
> location(%llu %u %llu))",
> +__func__, name, btrfs_ino(BTRFS_I(dir)),
> +location->objectid, location->type, 
> location->offset);
> + goto out_err;
> + }
>  out:
>   btrfs_free_path(path);
>   return ret;
> @@ -5816,8 +5824,6 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, 
> struct dentry *dentry)
>   return inode;
>   }
>  
> - BUG_ON(location.type != BTRFS_ROOT_ITEM_KEY);
> -
>   index = srcu_read_lock(&fs_info->subvol_srcu);
>   ret = fixup_tree_root_location(fs_info, dir, dentry,
>  &location, &sub_root);
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: avoid losing data raid profile when deleting a device

2017-10-30 Thread Liu Bo
On Mon, Oct 16, 2017 at 11:53:08AM +0300, Nikolay Borisov wrote:
> 
> 
> On 13.10.2017 23:51, Liu Bo wrote:
> > On Wed, Oct 11, 2017 at 10:38:50AM +0300, Nikolay Borisov wrote:
> >>
> >>
> >> On 10.10.2017 20:53, Liu Bo wrote:
> >>> We've avoided data losing raid profile when doing balance, but it
> >>> turns out that deleting a device could also result in the same
> >>> problem
> >>>
> >>> This fixes the problem by creating an empty data chunk before
> >>> relocating the data chunk.
> >>
> >> Why is this needed - copy the metadata of the to-be-relocated chunk into
> >> the newly created empty chunk? I don't entirely understand that code but
> >> doesn't this seem a bit like a hack in order to stash some information?
> >> Perhaps you could elaborate the logic a bit more in the changelog?
> >>
> >>>
> >>> Metadata/System chunk are supposed to have non-zero bytes all the time
> >>> so their raid profile is persistent.
> >>
> >> I think this changelog is a bit scarce on detail as to the culprit of
> >> the problem. Could you perhaps put a sentence or two what the underlying
> >> logic which deletes the raid profile if a chunk is empty ?
> >>
> > 
> > Fair enough.
> > 
> > The problem is as same as what commit 2c9fe8355258 ("btrfs: Fix
> > lost-data-profile caused by balance bg") had fixed.
> > 
> > Similar to doing balance, deleting a device can also move all chunks
> > on this disk to other available disks, after 'move' successfully,
> > it'll remove those chunks.
> > 
> > If our last data chunk is empty and part of it happens to be on this
>   ^
> "Last data chunk" of which disk - the "to-be-removed" one or the disk to
> which chunks have been relocated?
>

It refers to the 'to-be-removed' disk.

> > disk, then there is no data chunk in this btrfs after deleting the
>^
> Which disk are you referring to - the one being removed?

Yes, the 'to-be-removed' disk.

Thanks,

-liubo

> > device successfully, any following write will try to create a new data
> > chunk which ends up with a single data chunk because the only
> > available data raid profile is 'single'.
> > 
> > thanks,
> > -liubo
> > 
> >>>
> >>> Reported-by: James Alandt 
> >>> Signed-off-by: Liu Bo 
> >>> ---
> >>>
> >>> v2: - return the correct error.
> >>> - move helper ahead of __btrfs_balance().
> >>>
> >>>  fs/btrfs/volumes.c | 84 
> >>> ++
> >>>  1 file changed, 65 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>> index 4a72c45..a74396d 100644
> >>> --- a/fs/btrfs/volumes.c
> >>> +++ b/fs/btrfs/volumes.c
> >>> @@ -3018,6 +3018,48 @@ static int btrfs_relocate_sys_chunks(struct 
> >>> btrfs_fs_info *fs_info)
> >>>   return ret;
> >>>  }
> >>>  
> >>> +/*
> >>> + * return 1 : allocate a data chunk successfully,
> >>> + * return <0: errors during allocating a data chunk,
> >>> + * return 0 : no need to allocate a data chunk.
> >>> + */
> >>> +static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
> >>> +   u64 chunk_offset)
> >>> +{
> >>> + struct btrfs_block_group_cache *cache;
> >>> + u64 bytes_used;
> >>> + u64 chunk_type;
> >>> +
> >>> + cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> >>> + ASSERT(cache);
> >>> + chunk_type = cache->flags;
> >>> + btrfs_put_block_group(cache);
> >>> +
> >>> + if (chunk_type & BTRFS_BLOCK_GROUP_DATA) {
> >>> + spin_lock(&fs_info->data_sinfo->lock);
> >>> + bytes_used = fs_info->data_sinfo->bytes_used;
> >>> + spin_unlock(&fs_info->data_sinfo->lock);
> >>> +
> >>> + if (!bytes_used) {
> >>> + struct btrfs_trans_handle *trans;
> >>> + int ret;
> >>> +
> >>> + trans = btrfs_join_transaction(fs_info->tree_root);
> >>> + if (IS_ERR(trans))
> >>> + return PTR_ERR(trans);
> >>> +
> >>> + ret = btrfs_force_chunk_alloc(trans, fs_info,
> >>> +   BTRFS_BLOCK_GROUP_DATA);
> >>> + btrfs_end_transaction(trans);
> >>> + if (ret < 0)
> >>> + return ret;
> >>> +
> >>> + return 1;
> >>> + }
> >>> + }
> >>> + return 0;
> >>> +}
> >>> +
> >>>  static int insert_balance_item(struct btrfs_fs_info *fs_info,
> >>>  struct btrfs_balance_control *bctl)
> >>>  {
> >>> @@ -3476,7 +3518,6 @@ static int __btrfs_balance(struct btrfs_fs_info 
> >>> *fs_info)
> >>>   u32 count_meta = 0;
> >>>   u32 count_sys = 0;
> >>>   int chunk_reserved = 0;
> >>> - u64 bytes_used = 0;
> >>>  
> >>>   /* step one make some room on all the devices */
> >>>   devices = &fs_info->fs_devices->devices;
> >>> @@ -3635,28 +3676,21 @@ static int __btrfs_balance(struct btrfs_fs_info 
> >>> *fs_info)
> >>>   goto loop;
> >>>   }
> >>>  
> >>> - ASSERT(fs_info->data_sinfo);
> >>> -   

Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-30 Thread Chris Murphy
On Mon, Oct 30, 2017 at 2:57 AM, Zak Kohler  wrote:

> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX1
> Scrub result:
> Tree bytes scrubbed: 5234425856
> Tree extents scrubbed: 638968
> Data bytes scrubbed: 4353723670528
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 150
>
> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX2
> Scrub result:
> Tree bytes scrubbed: 5234425856
> Tree extents scrubbed: 638967
> Data bytes scrubbed: 4353723314176
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 238
>
> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX3
> Scrub result:
> Tree bytes scrubbed: 5234491392
> Tree extents scrubbed: 638975
> Data bytes scrubbed: 4353723572224
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 175 #first run
> Csum error: 112 #second run...
> Csum error: 285 #third run...
>
> But I ran the /dev/disk/by-id/WD-XX3 device three times and you can
> see the result...


I expect these commands are the same, and involve all three drives in
the offline scrub each time. So you have five different results, but
all five involve csum errors. So the errors have a certain transience
to them, hence inconsistent results.

But the online scrub consistently reports zero errors. That to me
sounds like a bug in the offline scrub code. Maybe it's confused, and
reports data without csums (nodatacow) as csum errors? That does not
explain the inconsistency though.

And then you're getting an consistent failure, but at an inconsistent
location, with Btrfs send, ostensibly due to IO error, which sounds
like it's hitting a bad csum check.

It is entirely possible to get transient errors like this somewhere in
a storage stack that's otherwise not reported by the error detection
code in that layer. The thing I really don't understand is how you're
getting zero errors with conventional online scrub, every time.

On my tiny 23G installation I'm traveling with, I get the same results
with all three scrub methods on an NVMe drive. Zero errors. The
slighly larger spinning rust drives are not with me so I can't check
them for a while.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with file system

2017-10-30 Thread Chris Murphy
On Mon, Oct 30, 2017 at 4:31 AM, Dave  wrote:
> This is a very helpful thread. I want to share an interesting related story.
>
> We have a machine with 4 btrfs volumes and 4 Snapper configs. I
> recently discovered that Snapper timeline cleanup been turned off for
> 3 of those volumes. In the Snapper configs I found this setting:
>
> TIMELINE_CLEANUP="no"
>
> Normally that would be set to "yes". So I corrected the issue and set
> it to "yes" for the 3 volumes where it had not been set correctly.
>
> I suppose it was turned off temporarily and then somebody forgot to
> turn it back on.
>
> What I did not know, and what I did not realize was a critical piece
> of information, was how long timeline cleanup had been turned off and
> how many snapshots had accumulated on each volume in that time.
>
> I naively re-enabled Snapper timeline cleanup. The instant I started
> the  snapper-cleanup.service  the system was hosed. The ssh session
> became unresponsive, no other ssh sessions could be established and it
> was impossible to log into the system at the console.
>
> My subsequent investigation showed that the root filesystem volume
> accumulated more than 3000 btrfs snapshots. The two other affected
> volumes also had very large numbers of snapshots.
>
> Deleting a single snapshot in that situation would likely require
> hours. (I set up a test, but I ran out of patience before I was able
> to delete even a single snapshot.) My guess is that if we had been
> patient enough to wait for all the snapshots to be deleted, the
> process would have finished in some number of months (or maybe a
> year).
>
> We did not know most of this at the time, so we did what we usually do
> when a system becomes totally unresponsive -- we did a hard reset. Of
> course, we could never get the system to boot up again.
>
> Since we had backups, the easiest option became to replace that system
> -- not unlike what the OP decided to do. In our case, the hardware was
> not old, so we simply reformatted the drives and reinstalled Linux.
>
> That's a drastic consequence of changing TIMELINE_CLEANUP="no" to
> TIMELINE_CLEANUP="yes" in the snapper config.


Without a complete autopsy on the file system, it's unclear whether it
was fixable with available tools, and why it wouldn't mount normally,
or if necessary do its own autorecovery with one of the available
backup roots.

But off hand it sounds like hardware was sabotaging the expected write
ordering. How to test a given hardware setup for that, I think, is
really overdue. It affects literally every file system, and Linux
storage technology.

It kinda sounds like to me something other than supers is being
overwritten too soon, and that's why it's possible for none of the
backup roots to find a valid root tree, because all four possible root
trees either haven't actually been written yet (still) or they've been
overwritten, even though the super is updated. But again, it's
speculation, we don't actually know why your system was no longer
mountable.



>
> It's all part of the process of gaining critical experience with
> BTRFS. Whether or not BTRFS is ready for production use is (it seems
> to me) mostly a question of how knowledgeable and experienced are the
> people administering it.

"Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
features while focusing on fault tolerance, repair and easy administration."

That is the current descriptive text at
Documentation/filesystems/btrfs.txt for some time now.


> In the various online discussions on this topic, all the focus is on
> whether or not BTRFS itself is production-ready. At the current
> maturity level of BTRFS, I think that's the wrong focus. The right
> focus is on how production-ready is the admin person or team (with
> respect to their BTRFS knowledge and experience). When a filesystem
> has been around for decades, most of the critical admin issues become
> fairly common knowledge, fairly widely known and easy to find. When a
> filesystem is newer, far fewer people understand the gotchas. Also, in
> older or widely used filesystems, when someone hits a gotcha, the
> response isn't "that filesystem is not ready for production". Instead
> the response is, "you should have known not to do that."



That is not a general purpose file system. It's a file system for
admins who understand where the bodies are buried.




-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Fix quota reservation leak on preallocated files

2017-10-30 Thread Justin Maggard
Commit c6887cd11149 (Btrfs: don't do nocow check unless we have to)
changed the behavior of __btrfs_buffered_write() so that it first tries
to get a data space reservation, and then skips the relatively expensive
nocow check if the reservation succeeded.

If we have quotas enabled, the data space reservation also includes a
quota reservation.  But in the rewrite case, the space has already been
accounted for in qgroups.  So btrfs_check_data_free_space() increases the
quota reservation, but it never gets decreased when the data actually
gets written and overwrites the pre-existing data.  So we're left with
both the qgroup and qgroup reservation accounting for the same space.

This commit adds the missing btrfs_qgroup_free_data() call in the case of
BTRFS_ORDERED_PREALLOC extents.

Signed-off-by: Justin Maggard 
---
 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d94e3f68b9b1..d7881ccf5310 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3006,6 +3006,8 @@ static int btrfs_finish_ordered_io(struct 
btrfs_ordered_extent *ordered_extent)
compress_type = ordered_extent->compress_type;
if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
BUG_ON(compress_type);
+   btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
+  ordered_extent->len);
ret = btrfs_mark_extent_written(trans, BTRFS_I(inode),
ordered_extent->file_offset,
ordered_extent->file_offset +
-- 
2.14.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: test if receive with qgroups corrupts metadata

2017-10-30 Thread Justin Maggard
This test case does some concurrent send/receives with qgroups enabled.
Currently (4.14-rc7) this usually results in btrfs check errors, and
often also results in a WARN_ON in record_root_in_trans().

Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting
when creating snapshot) as the culprit.

Signed-off-by: Justin Maggard 
---
 tests/btrfs/152 | 102 
 tests/btrfs/152.out |  13 +++
 tests/btrfs/group   |   1 +
 3 files changed, 116 insertions(+)
 create mode 100755 tests/btrfs/152
 create mode 100644 tests/btrfs/152.out

diff --git a/tests/btrfs/152 b/tests/btrfs/152
new file mode 100755
index 000..ebb88ed
--- /dev/null
+++ b/tests/btrfs/152
@@ -0,0 +1,102 @@
+#! /bin/bash
+# FS QA Test No. btrfs/152
+#
+# Test that incremental send/receive operations don't corrupt metadata when
+# qgroups are enabled.
+#
+#---
+#
+# Copyright (c) 2017 NETGEAR, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Enable quotas
+$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
+
+# Create 2 source and 4 destination subvolumes
+for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do
+   $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | _filter_scratch
+done
+mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots
+touch $SCRATCH_MNT/subvol{1,2}/foo
+
+# Create base snapshots and send them
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
+   $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
+   $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch
+for recv in recv1_1 recv1_2 recv2_1 recv2_2; do
+   $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | \
+   $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch
+done
+
+# Now do 10 loops of concurrent incremental send/receives
+for i in `seq 1 10`; do
+   prev=$i
+   curr=$((i+1))
+
+   $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
+   $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null
+   ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
+   $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
+   $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null &
+   ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
+   $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
+   $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null &
+
+   $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
+   $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null
+   ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
+   $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \
+   $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null &
+   ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
+   $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \
+   $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_2) > /dev/null &
+   wait
+done
+
+_scratch_unmount
+
+status=0
+exit
diff --git a/tests/btrfs/152.out b/tests/btrfs/152.out
new file mode 100644
index 000..a95bb57
--- /dev/null
+++ b/tests/btrfs/152.out
@@ -0,0 +1,13 @@
+QA output created by 152
+Create subvolume 'SCRATCH_MNT/subvol1'
+Create subvolume 'SCRATCH_MNT/subvol2'
+Create subvolume 'SCRATCH_MNT/recv1_1'
+Create subvolume 'SCRATCH_MNT/recv1_2'
+Create subvolume 'SCRATCH_MNT/recv2_1'
+Create subvolume 'SCRATCH_MNT/recv2_2'
+Create a readonly snapshot of 'SCRATCH_MNT/subvol1' in 
'SCRATCH_MNT/subvol1/.snapshots/1'
+Create a readonly snapshot of 'SCRA

[PATCH] btrfs: test for qgroup reservation leaks with prealloc

2017-10-30 Thread Justin Maggard
This test case writes into pre-allocated space, then tries to fallocate
some more within the defined quota limit. Currently (4.14-rc7) this fails
with EDQUOT due to quota reservation leakage when writing into pre-
allocated space.

A possible fix has been sent to the ML as "btrfs: Fix quota reservation
leak on preallocated files"

Signed-off-by: Justin Maggard 
---
 tests/btrfs/153 | 72 +
 tests/btrfs/153.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 75 insertions(+)
 create mode 100755 tests/btrfs/153
 create mode 100644 tests/btrfs/153.out

diff --git a/tests/btrfs/153 b/tests/btrfs/153
new file mode 100755
index 000..95a8095
--- /dev/null
+++ b/tests/btrfs/153
@@ -0,0 +1,72 @@
+#! /bin/bash
+# FS QA Test 153
+#
+# Test for leaking quota reservations on preallocated files.
+#
+#---
+#
+# Copyright (c) 2017 NETGEAR, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_btrfs_qgroup_report
+_require_xfs_io_command "falloc"
+
+_scratch_mkfs >/dev/null
+_scratch_mount
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
+_run_btrfs_util_prog qgroup limit 100M 0/5 $SCRATCH_MNT
+
+testfile1=$SCRATCH_MNT/testfile1
+testfile2=$SCRATCH_MNT/testfile2
+$XFS_IO_PROG -fc "falloc 0 80M" $testfile1
+$XFS_IO_PROG -fc "pwrite 0 80M" $testfile1 > /dev/null
+$XFS_IO_PROG -fc "falloc 0 19M" $testfile2
+$XFS_IO_PROG -fc "pwrite 0 19M" $testfile2 > /dev/null
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/153.out b/tests/btrfs/153.out
new file mode 100644
index 000..cd9fe8a
--- /dev/null
+++ b/tests/btrfs/153.out
@@ -0,0 +1,2 @@
+QA output created by 153
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index fb94461..15c2ecb 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -155,3 +155,4 @@
 150 auto quick dangerous
 151 auto quick
 152 auto quick metadata qgroup send
+153 auto quick qgroup
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Fix quota reservation leak on preallocated files

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 06:29, Justin Maggard wrote:
> Commit c6887cd11149 (Btrfs: don't do nocow check unless we have to)
> changed the behavior of __btrfs_buffered_write() so that it first tries
> to get a data space reservation, and then skips the relatively expensive
> nocow check if the reservation succeeded.
> 
> If we have quotas enabled, the data space reservation also includes a
> quota reservation.  But in the rewrite case, the space has already been
> accounted for in qgroups.  So btrfs_check_data_free_space() increases the
> quota reservation, but it never gets decreased when the data actually
> gets written and overwrites the pre-existing data.  So we're left with
> both the qgroup and qgroup reservation accounting for the same space.
> 
> This commit adds the missing btrfs_qgroup_free_data() call in the case of
> BTRFS_ORDERED_PREALLOC extents.
> 
> Signed-off-by: Justin Maggard 

Reviewed-by: Qu Wenruo 

Thanks,
Qu
> ---
>  fs/btrfs/inode.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d94e3f68b9b1..d7881ccf5310 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3006,6 +3006,8 @@ static int btrfs_finish_ordered_io(struct 
> btrfs_ordered_extent *ordered_extent)
>   compress_type = ordered_extent->compress_type;
>   if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) {
>   BUG_ON(compress_type);
> + btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
> +ordered_extent->len);
>   ret = btrfs_mark_extent_written(trans, BTRFS_I(inode),
>   ordered_extent->file_offset,
>   ordered_extent->file_offset +
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH RFC] Btrfs: remove max_active and thresh

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 01:14, Liu Bo wrote:
> First and foremost, here are the problems we have right now,
> 
> a) %thread_pool is configurable via mount option, however for those
>'workers' who really needs concurrency, there are at most
>"min(num_cpus+2, 8)" threads running to process works, and the
>value can't be tuned after mount because they're tagged with
>NO_THRESHOLD.
> 
> b) For THRESHOLD workers, btrfs is adjusting how concurrency will
>happen by starting with the minimum max_active and calling
>btrfs_workqueue_set_max() to grow it on demand, but it also has a
>upper limit, "min(num_cpus+2, 8)", which means at most we can have
>such many threads running at the same time.

I was also wondering this when using kernel workqueue to replace btrfs
workqueue.
However at that time, oct-core CPU is even hard to find in servers.

> 
> In fact, kernel workqueue can be created on demand and destroyed once
> no work needs to be processed.  The small max_active limitation (8)
> from btrfs has prevented us from utilizing all available cpus, and
> that is against why we choose workqueue.


Yep, I'm also wondering should we keep the old 8 threads up limits.
Especially nowadays oct-core CPU is getting cheaper and cheaper (thanks
AMD and its Ryzen).

> 
> What this patch does:
> 
> - resizing %thread_pool is totally removed as they are no long needed,
>   while keeping its mount option for compatibility.
> 
> - The default "0" is passed when allocating workqueue, so the maximum
>   number of running works is typically 256.  And all fields for
>   limiting max_active are removed, including current_active,
>   limit_active, thresh etc.

Any benchmark will make this more persuasive.
Especially using low frequency but high thread count CPU.

The idea and the patch looks good to me.

Looking forward to some benchmark result.

Thanks,
Qu

> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/async-thread.c  | 155 
> +++
>  fs/btrfs/async-thread.h  |  27 +++--
>  fs/btrfs/delayed-inode.c |   7 ++-
>  fs/btrfs/disk-io.c   |  61 ++-
>  fs/btrfs/scrub.c |  11 ++--
>  fs/btrfs/super.c |  32 --
>  6 files changed, 58 insertions(+), 235 deletions(-)
> 
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index e00c8a9..dd627ad 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -29,41 +29,6 @@
>  #define WORK_ORDER_DONE_BIT 1
>  #define WORK_HIGH_PRIO_BIT 2
>  
> -#define NO_THRESHOLD (-1)
> -#define DFT_THRESHOLD (32)
> -
> -struct __btrfs_workqueue {
> - struct workqueue_struct *normal_wq;
> -
> - /* File system this workqueue services */
> - struct btrfs_fs_info *fs_info;
> -
> - /* List head pointing to ordered work list */
> - struct list_head ordered_list;
> -
> - /* Spinlock for ordered_list */
> - spinlock_t list_lock;
> -
> - /* Thresholding related variants */
> - atomic_t pending;
> -
> - /* Up limit of concurrency workers */
> - int limit_active;
> -
> - /* Current number of concurrency workers */
> - int current_active;
> -
> - /* Threshold to change current_active */
> - int thresh;
> - unsigned int count;
> - spinlock_t thres_lock;
> -};
> -
> -struct btrfs_workqueue {
> - struct __btrfs_workqueue *normal;
> - struct __btrfs_workqueue *high;
> -};
> -
>  static void normal_work_helper(struct btrfs_work *work);
>  
>  #define BTRFS_WORK_HELPER(name)  \
> @@ -86,20 +51,6 @@ btrfs_work_owner(const struct btrfs_work *work)
>   return work->wq->fs_info;
>  }
>  
> -bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq)
> -{
> - /*
> -  * We could compare wq->normal->pending with num_online_cpus()
> -  * to support "thresh == NO_THRESHOLD" case, but it requires
> -  * moving up atomic_inc/dec in thresh_queue/exec_hook. Let's
> -  * postpone it until someone needs the support of that case.
> -  */
> - if (wq->normal->thresh == NO_THRESHOLD)
> - return false;
> -
> - return atomic_read(&wq->normal->pending) > wq->normal->thresh * 2;
> -}
> -
>  BTRFS_WORK_HELPER(worker_helper);
>  BTRFS_WORK_HELPER(delalloc_helper);
>  BTRFS_WORK_HELPER(flush_delalloc_helper);
> @@ -125,7 +76,7 @@ BTRFS_WORK_HELPER(scrubparity_helper);
>  
>  static struct __btrfs_workqueue *
>  __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name,
> - unsigned int flags, int limit_active, int thresh)
> + unsigned int flags)
>  {
>   struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_KERNEL);
>  
> @@ -133,31 +84,15 @@ __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, 
> const char *name,
>   return NULL;
>  
>   ret->fs_info = fs_info;
> - ret->limit_active = limit_active;
>   atomic_set(&ret->pending, 0);
> - if (thresh == 0)
> -  

Re: [PATCH] btrfs: test if receive with qgroups corrupts metadata

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 06:32, Justin Maggard wrote:
> This test case does some concurrent send/receives with qgroups enabled.
> Currently (4.14-rc7) this usually results in btrfs check errors, and
> often also results in a WARN_ON in record_root_in_trans().
> 
> Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting
> when creating snapshot) as the culprit.

Thanks for the report, I'll look into it.

BTW can this only be reproduced by concurrent run?
Will single thread also cause the problem?

Thanks,
Qu
> 
> Signed-off-by: Justin Maggard 
> ---
>  tests/btrfs/152 | 102 
> 
>  tests/btrfs/152.out |  13 +++
>  tests/btrfs/group   |   1 +
>  3 files changed, 116 insertions(+)
>  create mode 100755 tests/btrfs/152
>  create mode 100644 tests/btrfs/152.out
> 
> diff --git a/tests/btrfs/152 b/tests/btrfs/152
> new file mode 100755
> index 000..ebb88ed
> --- /dev/null
> +++ b/tests/btrfs/152
> @@ -0,0 +1,102 @@
> +#! /bin/bash
> +# FS QA Test No. btrfs/152
> +#
> +# Test that incremental send/receive operations don't corrupt metadata when
> +# qgroups are enabled.
> +#
> +#---
> +#
> +# Copyright (c) 2017 NETGEAR, Inc.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +# Enable quotas
> +$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
> +
> +# Create 2 source and 4 destination subvolumes
> +for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do
> + $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | _filter_scratch
> +done
> +mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots
> +touch $SCRATCH_MNT/subvol{1,2}/foo
> +
> +# Create base snapshots and send them
> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
> + $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch
> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
> + $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch
> +for recv in recv1_1 recv1_2 recv2_1 recv2_2; do
> + $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | \
> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch
> +done
> +
> +# Now do 10 loops of concurrent incremental send/receives
> +for i in `seq 1 10`; do
> + prev=$i
> + curr=$((i+1))
> +
> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
> + $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null
> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null &
> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null &
> +
> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
> + $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null
> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
> + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \
> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null &
> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
> + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \
> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_2) > /dev/null &
> + wait
> +done
> +
> +_scratch_unmount
> +
> +status=0
> +exit
> diff --git a/tests/btrfs/152.out b/tests/btrfs/152.out
> new file mode 100644
> index 000..a95bb57
> --- /dev/null
> +++ b/tests/btrfs/152.out
> @@ -0,

Re: [PATCH] btrfs: test if receive with qgroups corrupts metadata

2017-10-30 Thread Justin Maggard
On Mon, Oct 30, 2017 at 5:21 PM, Qu Wenruo  wrote:
>
>
> On 2017年10月31日 06:32, Justin Maggard wrote:
>> This test case does some concurrent send/receives with qgroups enabled.
>> Currently (4.14-rc7) this usually results in btrfs check errors, and
>> often also results in a WARN_ON in record_root_in_trans().
>>
>> Bisecting points to 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting
>> when creating snapshot) as the culprit.
>
> Thanks for the report, I'll look into it.
>
> BTW can this only be reproduced by concurrent run?
> Will single thread also cause the problem?
>
> Thanks,
> Qu

I ran 1000 single-threaded passes with no failures, so I'm pretty sure
there must be multiple concurrent receives running to reproduce it.

-Justin

>>
>> Signed-off-by: Justin Maggard 
>> ---
>>  tests/btrfs/152 | 102 
>> 
>>  tests/btrfs/152.out |  13 +++
>>  tests/btrfs/group   |   1 +
>>  3 files changed, 116 insertions(+)
>>  create mode 100755 tests/btrfs/152
>>  create mode 100644 tests/btrfs/152.out
>>
>> diff --git a/tests/btrfs/152 b/tests/btrfs/152
>> new file mode 100755
>> index 000..ebb88ed
>> --- /dev/null
>> +++ b/tests/btrfs/152
>> @@ -0,0 +1,102 @@
>> +#! /bin/bash
>> +# FS QA Test No. btrfs/152
>> +#
>> +# Test that incremental send/receive operations don't corrupt metadata when
>> +# qgroups are enabled.
>> +#
>> +#---
>> +#
>> +# Copyright (c) 2017 NETGEAR, Inc.  All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#---
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +tmp=/tmp/$$
>> +status=1 # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> + cd /
>> + rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# real QA test starts here
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch
>> +
>> +rm -f $seqres.full
>> +
>> +_scratch_mkfs >>$seqres.full 2>&1
>> +_scratch_mount
>> +
>> +# Enable quotas
>> +$BTRFS_UTIL_PROG quota enable $SCRATCH_MNT
>> +
>> +# Create 2 source and 4 destination subvolumes
>> +for subvol in subvol1 subvol2 recv1_1 recv1_2 recv2_1 recv2_2; do
>> + $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/$subvol | 
>> _filter_scratch
>> +done
>> +mkdir $SCRATCH_MNT/subvol{1,2}/.snapshots
>> +touch $SCRATCH_MNT/subvol{1,2}/foo
>> +
>> +# Create base snapshots and send them
>> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
>> + $SCRATCH_MNT/subvol1/.snapshots/1 | _filter_scratch
>> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
>> + $SCRATCH_MNT/subvol2/.snapshots/1 | _filter_scratch
>> +for recv in recv1_1 recv1_2 recv2_1 recv2_2; do
>> + $BTRFS_UTIL_PROG send $SCRATCH_MNT/subvol1/.snapshots/1 2> /dev/null | 
>> \
>> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/${recv} | _filter_scratch
>> +done
>> +
>> +# Now do 10 loops of concurrent incremental send/receives
>> +for i in `seq 1 10`; do
>> + prev=$i
>> + curr=$((i+1))
>> +
>> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol1 \
>> + $SCRATCH_MNT/subvol1/.snapshots/${curr} > /dev/null
>> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
>> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
>> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_1) > /dev/null &
>> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol1/.snapshots/${prev} \
>> + $SCRATCH_MNT/subvol1/.snapshots/${curr} 2> /dev/null | \
>> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv1_2) > /dev/null &
>> +
>> + $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/subvol2 \
>> + $SCRATCH_MNT/subvol2/.snapshots/${curr} > /dev/null
>> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
>> + $SCRATCH_MNT/subvol2/.snapshots/${curr} 2> /dev/null | \
>> + $BTRFS_UTIL_PROG receive $SCRATCH_MNT/recv2_1) > /dev/null &
>> + ($BTRFS_UTIL_PROG send -p $SCRATCH_MNT/subvol2/.snapshots/${prev} \
>> + $SCRATCH_MNT/subvol2

Re: Problem with file system

2017-10-30 Thread Duncan
Dave posted on Sun, 29 Oct 2017 23:31:57 -0400 as excerpted:

> It's all part of the process of gaining critical experience with BTRFS.
> Whether or not BTRFS is ready for production use is (it seems to me)
> mostly a question of how knowledgeable and experienced are the people
> administering it.
> 
> In the various online discussions on this topic, all the focus is on
> whether or not BTRFS itself is production-ready. At the current maturity
> level of BTRFS, I think that's the wrong focus. The right focus is on
> how production-ready is the admin person or team (with respect to their
> BTRFS knowledge and experience). When a filesystem has been around for
> decades, most of the critical admin issues become fairly common
> knowledge, fairly widely known and easy to find. When a filesystem is
> newer, far fewer people understand the gotchas. Also, in older or widely
> used filesystems, when someone hits a gotcha, the response isn't "that
> filesystem is not ready for production". Instead the response is, "you
> should have known not to do that."

That's a view I hadn't seen before, but it seems reasonable and I like it.

Indeed, there were/are a few reasonably widely known caveats with both 
ext3 and reiserfs, for instance, and certainly some that apply to fat/
vfat/fat32, the three filesystems other than btrfs I know most about, and 
if anything they're past their prime, /not/ "still maturing", as btrfs is 
typically described.  For example, setting either of the two to writeback 
journaling and then losing data results in something akin to "you should 
have known not to do that unless you were prepared for the risk, as it's 
definitely a well known one."

Which of course was about my own reaction when Linus and the other powers 
that be decided to set ext3 to writeback journaling by default for a few 
kernel cycles.  Having lived thru that on reiserfs, I /knew/ where /that/ 
was headed, and sure enough...

Similarly, ext3's performance problems with fsync, because it effectively 
forces a full filesystem sync not just a file sync, are well known, as 
are the risks of storing a reiserfs in a loopback file on reiserfs and 
then trying to run a tree restore on the host, since it's known to mix up 
the two filesystems in that case.

It's thus a reasonable viewpoint to consider some of the btrfs quirks to 
be in the same category.  Of course btrfs being the first COW-based-fs 
most will have had experience with, and the first filesystem most will 
have experienced that handles raid, snapshotting, etc, it's definitely 
rather different and more complex than the filesystems most people are 
familiar with, and thus can only be expected to have rather different and 
more complex caveats than the filesystems most are familiar with, as well.

OTOH, there's definitely some known low-hanging-fruit in terms of ease of 
use, remaining to be implemented, tho I'd argue that we've reached the 
point where general stability is such that it has allowed the focus to 
gradually tilt toward implementing some of this, over the last year or 
so, and we're beginning to see the loose ends tied up in the 
documentation, for instance.  I'd say we are getting close, and your 
viewpoint is a definite argument in support of that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] btrfs: match btrfs_device->mode same as it used for open

2017-10-30 Thread Anand Jain



On 10/30/2017 10:39 PM, David Sterba wrote:

On Fri, Oct 20, 2017 at 10:07:15PM +0800, Anand Jain wrote:

We aren't setting the FMODE_WRITE when initializing btrfs_device
structure and when calling blkdev_put, however we are setting it
only when calling blkdev_get_by_path().


But this still does not say why this is a problem worth fixing. Nikolay
asked for it, and I would do the same, but why do we even have to ask
for that?


 Here its just a cleanup of miss match of open mode and close modes.
 And there isn't any problem that I noticed.

Thanks, Anand



https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-30 Thread Duncan
Duncan posted on Mon, 30 Oct 2017 04:09:58 + as excerpted:

> Zak Kohler posted on Sun, 29 Oct 2017 21:57:00 -0400 as excerpted:
> 
>> So I ran memtest86+ 5.01 for >4 days:
>> Pass: 39 Errors: 0
> 
> Be aware that memtest86+ will detect some types of errors but not
> others.
> 
> In particular, some years ago I had some memory (DDR1/3-digit-Opteron
> era), actually registered as required by Opterons and ECC, that passed
> that sort of memory test because what the test /tests/ is memory cell
> retention (if you put a value in does it verify on read-back?), that was
> none-the-less bad memory in that at its rated speed it was unreliable at
> memory /transfers/.
> 
> Eventually that mobo got a BIOS update that could adjust memory
> clocking, and I downclocked it a notch[.] At
> the lower clock it was rock stable, even with reduced wait-states to
> make up a bit of the performance I was losing to the lower clock.  But I
> had to get a BIOS that could do it, first [...]

> That's what really amazed me about reiserfs, that it remained stable
> thru not only that hardware problem but various others I've had over the
> years that would have killed or made unworkable other filesystems.

BTW, one of those other hardware problems I had, the one that ultimately 
did in my old server-clase mobo, was leaky capacitors on the then 8-ish 
years old system.  It was of the generation that had problem capacitors, 
and it eventually succumbed...

The reason this is relevant is that it was the storage path that had the 
worst problems.  Before I figured out what the problem actually was I did 
try btrfs, with around kernel 3.6 at the time, and it really /was/ 
unusable due to checksum errors.  But I could still limp along with 
reiserfs...

One thing about the behavior I noticed, however, was that as the problem 
was developing, the system was more usable if I kept it reasonably cool.  
By the time I gave up on it, it was early summer here in Phoenix, and 
temperatures were climbing.  But I was sitting at home with the AC on, in 
a heavy winter jacket, wearing sweats under my pants and trying to type 
with gloves on my hands to keep warm, in ordered to cool down the 
computer so it'd work.  That's when I decided enough was enough and gave 
up on it.  I only found the burst capacitors, however, once I got the new 
mobo and was switching out the old one.  No WONDER it wasn't working 
right any more!

The rest of the system was actually reasonably stable, however.  I guess 
the worst caps were in the storage path.

But as I said, reiserfs was amazing.  I bought a SATA addon board with 
the same chipset as on the old mobo so I could boot the old monolithic 
kernel with those drivers builtin on the new mobo, and I didn't notice 
any corruption or anything.  But as I said, btrfs was entirely unusable 
on the old hardware, due to both checksum errors and large transactions 
(like trying to copy files over from reiserfs) ending up entirely 
reverted when I'd crash, instead of the partial completion I'd get on 
reiserfs, so I could at least reboot and start where it has crashed on 
reiserfs, instead of having to start over entirely, thus making no 
progress at all, which is what I was seeing on btrfs.

That reiserfs continued to work well enough to keep going so long under 
those conditions, while btrfs was entirely unworkable, and even more that 
once I was running good hardware again, I didn't see massive corruption 
on reiserfs as a result of trying to run it on so long on the bad 
hardware, was really /really/ amazing!

But like I said, I don't expect that btrfs, with its checksumming, with 
/ever/ be really workable on that sort of defective hardware.  It was 
just really amazing to me that reiserfs wasn't screwed up by it as well, 
as it had every right to be given the screwed up hardware I was trying to 
run it on.  No filesystem can be expected to go thru that and end up 
still usable, but somehow, reiserfs did.

Bottom line, it could be the storage path, not the memory or cpu.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2017-10-30 Thread Misono, Tomohiro
On 2017/10/30 17:32, Qu Wenruo wrote:
> 
> 
> On 2017年10月30日 16:14, Misono, Tomohiro wrote:
>> Currently, the top-level subvolume lacks the UUID. As a result, both
>> non-snapshot subvolume and snapshot of top-level subvolume do not have
>> Parent UUID and cannot be distinguisued. Therefore "fi show" of
>> top-level lists all the subvolumes which lacks the UUID in
>> "Snapshot(s)". Also, it lacks the otime information.
> 
> What about creating a wiki page for ROOT_ITEM and detailed restriction
> for its members?
> 
>>
>> Fix this by adding the UUID and otime at the mkfs time.  As a
>> consequence, snapshots of top-level subvolume now have a Parent UUID and
>> UUID tree will create an entry for top-level subvolume at mount time.
> 
> This patch also inspires me about the btrfs_create_tree() function I'm
> introducing should also populate its UUID and timespec.
> 
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  ctree.h   |  5 +
>>  mkfs/common.c | 14 ++
>>  mkfs/main.c   |  3 +++
>>  3 files changed, 22 insertions(+)
>>
>> diff --git a/ctree.h b/ctree.h
>> index 2280659..5737978 100644
>> --- a/ctree.h
>> +++ b/ctree.h
>> @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
>> btrfs_root_item,
>>   stransid, 64);
>>  BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
>>   rtransid, 64);
>> +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item,
>> + u8 uuid[BTRFS_UUID_SIZE])
>> +{
>> +memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE);
>> +}
> 
> This is the stack version.
> 
> However there is no extent buffer version to set uuid as we just call
> write_extent_buffer(), so it seems unnecessary to me.
> 
>>  
>>  /* struct btrfs_root_backup */
>>  BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
>> diff --git a/mkfs/common.c b/mkfs/common.c
>> index c9ce10d..808d343 100644
>> --- a/mkfs/common.c
>> +++ b/mkfs/common.c
>> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
>> btrfs_mkfs_config *cfg,
>>  u32 itemoff;
>>  int ret = 0;
>>  int blk;
>> +u8 uuid[BTRFS_UUID_SIZE];
>>  
>>  memset(buf->data + sizeof(struct btrfs_header), 0,
>>  cfg->nodesize - sizeof(struct btrfs_header));
>> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
>> btrfs_mkfs_config *cfg,
>>  btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
>>  btrfs_set_item_size(buf, btrfs_item_nr(nritems),
>>  sizeof(root_item));
>> +if (blk == MKFS_FS_TREE) {
>> +time_t now = time(NULL);
>> +
>> +uuid_generate(uuid);
>> +btrfs_set_root_uuid(&root_item, uuid);
>> +btrfs_set_stack_timespec_sec(&root_item.otime, now);
>> +btrfs_set_stack_timespec_sec(&root_item.ctime, now);
>> +} else {
>> +memset(uuid, 0, BTRFS_UUID_SIZE);
> 
> This leads to the question about the UUID meaning of different trees.
> 
> AFAIK every tree created by kernel has its own UUID, no one uses all zero.
> 
> In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate
> a new UUID.
> So I'm wonder if such branch is really needed.
> And maybe we fix all tree creation to generate UUID.
> 
> Despite the question about the meaning of UUID for root_item, I think
> the patch really makes sense.
> 
> Thanks,
> Qu
> 
Hello,

Thanks for the whole comments.
I notice UUID (and otime/ctime) of ROOT_ITEM is not used nor updated currently 
expect subvolumes. Therefore I think it is better to keep these fields to zero
to express non-used status. (So, it may be better that uuid/quota tree is not 
hold UUID.)

Regards,
Tomohiro

>> +btrfs_set_root_uuid(&root_item, uuid);
>> +btrfs_set_stack_timespec_sec(&root_item.otime, 0);
>> +btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
>> +}
>>  write_extent_buffer(buf, &root_item,
>>  btrfs_item_ptr_offset(buf, nritems),
>>  sizeof(root_item));
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index 1b4cabc..4184a2d 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
>>  struct btrfs_key location;
>>  struct btrfs_root_item root_item;
>>  struct extent_buffer *tmp;
>> +u8 uuid[BTRFS_UUID_SIZE] = {0};
>>  int ret;
>>  
>>  ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
>> @@ -339,6 +340,8 @@ static int create_tree(struct btrfs_trans_handle *trans,
>>  btrfs_set_root_bytenr(&root_item, tmp->start);
>>  btrfs_set_root_level(&root_item, btrfs_header_level(tmp));
>>  btrfs_set_root_generation(&root_item, trans->transid);
>> +/* clear uuid of source tree */
>> +btrfs_set_root_

Re: [PATCH] btrfs-progs: ins: fix arg order in print_inode_item()

2017-10-30 Thread Misono, Tomohiro


On 2017/10/30 17:39, Qu Wenruo wrote:
> 
> 
> On 2017年10月30日 16:10, Misono, Tomohiro wrote:
>> In the print_inode_item(), the argument order of sequence and flags are
>> reversed:
>>
>> printf("... sequence %llu flags 0x%llx(%s)\n",
>>  ... 
>> (unsigned long long)btrfs_inode_flags(eb,ii),
>> (unsigned long long)btrfs_inode_sequence(eb, ii),
>>  ...)
>>
>> So, just fix it.
> 
> Not related to this patch, but considering you're going to enhance
> root_item creation, it's also a good idea to print such info like
> c/o/s/r time and parent/receive UUID.
> 
> More patches will never be a bad thing.

Actually parent/received UUID is printed if it exists but c/o/s/r time is not
(for ROOT_ITEM). So, I will send a patch.

Thanks,
Tomohiro 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 10:55, Misono, Tomohiro wrote:
> On 2017/10/30 17:32, Qu Wenruo wrote:
>>
>>
>> On 2017年10月30日 16:14, Misono, Tomohiro wrote:
>>> Currently, the top-level subvolume lacks the UUID. As a result, both
>>> non-snapshot subvolume and snapshot of top-level subvolume do not have
>>> Parent UUID and cannot be distinguisued. Therefore "fi show" of
>>> top-level lists all the subvolumes which lacks the UUID in
>>> "Snapshot(s)". Also, it lacks the otime information.
>>
>> What about creating a wiki page for ROOT_ITEM and detailed restriction
>> for its members?
>>
>>>
>>> Fix this by adding the UUID and otime at the mkfs time.  As a
>>> consequence, snapshots of top-level subvolume now have a Parent UUID and
>>> UUID tree will create an entry for top-level subvolume at mount time.
>>
>> This patch also inspires me about the btrfs_create_tree() function I'm
>> introducing should also populate its UUID and timespec.
>>
>>>
>>> Signed-off-by: Tomohiro Misono 
>>> ---
>>>  ctree.h   |  5 +
>>>  mkfs/common.c | 14 ++
>>>  mkfs/main.c   |  3 +++
>>>  3 files changed, 22 insertions(+)
>>>
>>> diff --git a/ctree.h b/ctree.h
>>> index 2280659..5737978 100644
>>> --- a/ctree.h
>>> +++ b/ctree.h
>>> @@ -2071,6 +2071,11 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
>>> btrfs_root_item,
>>>  stransid, 64);
>>>  BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
>>>  rtransid, 64);
>>> +static inline void btrfs_set_root_uuid(struct btrfs_root_item *root_item,
>>> +u8 uuid[BTRFS_UUID_SIZE])
>>> +{
>>> +   memcpy(root_item->uuid, uuid, BTRFS_UUID_SIZE);
>>> +}
>>
>> This is the stack version.
>>
>> However there is no extent buffer version to set uuid as we just call
>> write_extent_buffer(), so it seems unnecessary to me.
>>
>>>  
>>>  /* struct btrfs_root_backup */
>>>  BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
>>> diff --git a/mkfs/common.c b/mkfs/common.c
>>> index c9ce10d..808d343 100644
>>> --- a/mkfs/common.c
>>> +++ b/mkfs/common.c
>>> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
>>> btrfs_mkfs_config *cfg,
>>> u32 itemoff;
>>> int ret = 0;
>>> int blk;
>>> +   u8 uuid[BTRFS_UUID_SIZE];
>>>  
>>> memset(buf->data + sizeof(struct btrfs_header), 0,
>>> cfg->nodesize - sizeof(struct btrfs_header));
>>> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
>>> btrfs_mkfs_config *cfg,
>>> btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
>>> btrfs_set_item_size(buf, btrfs_item_nr(nritems),
>>> sizeof(root_item));
>>> +   if (blk == MKFS_FS_TREE) {
>>> +   time_t now = time(NULL);
>>> +
>>> +   uuid_generate(uuid);
>>> +   btrfs_set_root_uuid(&root_item, uuid);
>>> +   btrfs_set_stack_timespec_sec(&root_item.otime, now);
>>> +   btrfs_set_stack_timespec_sec(&root_item.ctime, now);
>>> +   } else {
>>> +   memset(uuid, 0, BTRFS_UUID_SIZE);
>>
>> This leads to the question about the UUID meaning of different trees.
>>
>> AFAIK every tree created by kernel has its own UUID, no one uses all zero.
>>
>> In kernel btrfs_create_tree(), it always calls uuid_le_gen() to generate
>> a new UUID.
>> So I'm wonder if such branch is really needed.
>> And maybe we fix all tree creation to generate UUID.
>>
>> Despite the question about the meaning of UUID for root_item, I think
>> the patch really makes sense.
>>
>> Thanks,
>> Qu
>>
> Hello,
> 
> Thanks for the whole comments.
> I notice UUID (and otime/ctime) of ROOT_ITEM is not used nor updated 
> currently 
> expect subvolumes. Therefore I think it is better to keep these fields to zero
> to express non-used status. (So, it may be better that uuid/quota tree is not 
> hold UUID.)

Makes sense.

I'll update kernel code to avoid UUID creation for non-fs trees.
Since they are not in UUID tree so there is no meaning creating uuid for
them.

Thanks,
Qu
> 
> Regards,
> Tomohiro
> 
>>> +   btrfs_set_root_uuid(&root_item, uuid);
>>> +   btrfs_set_stack_timespec_sec(&root_item.otime, 0);
>>> +   btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
>>> +   }
>>> write_extent_buffer(buf, &root_item,
>>> btrfs_item_ptr_offset(buf, nritems),
>>> sizeof(root_item));
>>> diff --git a/mkfs/main.c b/mkfs/main.c
>>> index 1b4cabc..4184a2d 100644
>>> --- a/mkfs/main.c
>>> +++ b/mkfs/main.c
>>> @@ -329,6 +329,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
>>> struct btrfs_key location;
>>> struct btrfs_root_item root_item;
>>> struct extent_buffer *tmp;
>>> +   u8 uuid[BTRFS_UUID_SIZE] = {0};
>>> int ret;
>>>  
>>> ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
>>> @@ -339,6 +340

[PATCH] btrfs-progs: print-tree: Enehance uuid item print

2017-10-30 Thread Qu Wenruo
For key type BTRFS_UUID_KEY_SUBVOL or BTRFS_UUID_KEY_RECEIVED_SUBVOL the
key objectid and key offset are just half of the UUID.

However we just print the key as %llu, which is converted from little
endian, not byte order for UUID, nor the traditional 36 bytes human
readable uuid format.

Although true engineer can easily convert it in their brain, but to
make it easier for search, output the result UUID using the 36 chars format.

Cc: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
Inspired by UUID related work from Misono.
---
 print-tree.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/print-tree.c b/print-tree.c
index 3c585e31f1fc..687f871db302 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -803,14 +803,25 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
}
 }
 
-static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
-   u32 item_size)
+static void print_uuid_item(struct extent_buffer *l, int slot,
+   unsigned long offset, u32 item_size)
 {
+   struct btrfs_key key;
+   char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
+   u8 uuid[BTRFS_UUID_SIZE];
+
+   /* Reassemble the uuid from key.objecitd and key.offset */
+   btrfs_item_key_to_cpu(l, &key, slot);
+   put_unaligned_le64(key.objectid, uuid);
+   put_unaligned_le64(key.offset, uuid + sizeof(u64));
+   uuid_unparse(uuid, uuid_str);
+
if (item_size & (sizeof(u64) - 1)) {
printf("btrfs: uuid item with illegal size %lu!\n",
   (unsigned long)item_size);
return;
}
+   printf("\t\tuuid %s\n", uuid_str);
while (item_size) {
__le64 subvol_id;
 
@@ -1297,7 +1308,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct 
extent_buffer *eb)
break;
case BTRFS_UUID_KEY_SUBVOL:
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
-   print_uuid_item(eb, btrfs_item_ptr_offset(eb, i),
+   print_uuid_item(eb, i, btrfs_item_ptr_offset(eb, i),
btrfs_item_size_nr(eb, i));
break;
case BTRFS_STRING_ITEM_KEY: {
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: ins: print uuid only if it exists

2017-10-30 Thread Misono, Tomohiro
UUID of ROOT_ITEM is empty if it is not subvolume (and not created by
kernel). So, just skip it if it is empty just like parent/received UUID.

Signed-off-by: Tomohiro Misono 
---
 print-tree.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/print-tree.c b/print-tree.c
index 8abd760..9d1b862 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -581,8 +581,10 @@ static void print_root_item(struct extent_buffer *leaf, 
int slot)
flags_str);
 
if (root_item.generation == root_item.generation_v2) {
-   uuid_unparse(root_item.uuid, uuid_str);
-   printf("\t\tuuid %s\n", uuid_str);
+   if (!empty_uuid(root_item.uuid)) {
+   uuid_unparse(root_item.uuid, uuid_str);
+   printf("\t\tuuid %s\n", uuid_str);
+   }
if (!empty_uuid(root_item.parent_uuid)) {
uuid_unparse(root_item.parent_uuid, uuid_str);
printf("\t\tparent_uuid %s\n", uuid_str);
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: ins: print c/o/s/r time of ROOT_ITEM

2017-10-30 Thread Misono, Tomohiro
Currently c/o/s/r time of ROOT_ITEM is not printed in print_root_item().
Fix this and print them if the values are not zero. print_timespec()
is moved forward to reuse.

Signed-off-by: Tomohiro Misono 
---
 ctree.h  | 32 
 print-tree.c | 52 
 2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/ctree.h b/ctree.h
index 2280659..54a85fd 100644
--- a/ctree.h
+++ b/ctree.h
@@ -2072,6 +2072,38 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
btrfs_root_item,
 BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
 rtransid, 64);
 
+static inline struct btrfs_timespec *
+btrfs_root_ctime(struct btrfs_root_item *root_item)
+{
+   unsigned long ptr = (unsigned long)root_item;
+   ptr += offsetof(struct btrfs_root_item, ctime);
+   return (struct btrfs_timespec *)ptr;
+}
+
+static inline struct btrfs_timespec *
+btrfs_root_otime(struct btrfs_root_item *root_item)
+{
+   unsigned long ptr = (unsigned long)root_item;
+   ptr += offsetof(struct btrfs_root_item, otime);
+   return (struct btrfs_timespec *)ptr;
+}
+
+static inline struct btrfs_timespec *
+btrfs_root_stime(struct btrfs_root_item *root_item)
+{
+   unsigned long ptr = (unsigned long)root_item;
+   ptr += offsetof(struct btrfs_root_item, stime);
+   return (struct btrfs_timespec *)ptr;
+}
+
+static inline struct btrfs_timespec *
+btrfs_root_rtime(struct btrfs_root_item *root_item)
+{
+   unsigned long ptr = (unsigned long)root_item;
+   ptr += offsetof(struct btrfs_root_item, rtime);
+   return (struct btrfs_timespec *)ptr;
+}
+
 /* struct btrfs_root_backup */
 BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
   tree_root, 64);
diff --git a/print-tree.c b/print-tree.c
index 9d1b862..62f3163 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -551,6 +551,26 @@ static void root_flags_to_str(u64 flags, char *ret)
strcat(ret, "none");
 }
 
+static void print_timespec(struct extent_buffer *eb,
+   struct btrfs_timespec *timespec, const char *prefix,
+   const char *suffix)
+{
+   struct tm tm;
+   u64 tmp_u64;
+   u32 tmp_u32;
+   time_t tmp_time;
+   char timestamp[256];
+
+   tmp_u64 = btrfs_timespec_sec(eb, timespec);
+   tmp_u32 = btrfs_timespec_nsec(eb, timespec);
+   tmp_time = tmp_u64;
+   localtime_r(&tmp_time, &tm);
+   strftime(timestamp, sizeof(timestamp),
+   "%Y-%m-%d %H:%M:%S", &tm);
+   printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32,
+   timestamp, suffix);
+}
+
 static void print_root_item(struct extent_buffer *leaf, int slot)
 {
struct btrfs_root_item *ri;
@@ -600,6 +620,18 @@ static void print_root_item(struct extent_buffer *leaf, 
int slot)
btrfs_root_stransid(&root_item),
btrfs_root_rtransid(&root_item));
}
+   if (btrfs_timespec_sec(leaf, btrfs_root_ctime(ri)))
+   print_timespec(leaf, btrfs_root_ctime(ri),
+   "\t\tctime ", "\n");
+   if (btrfs_timespec_sec(leaf, btrfs_root_otime(ri)))
+   print_timespec(leaf, btrfs_root_otime(ri),
+   "\t\totime ", "\n");
+   if (btrfs_timespec_sec(leaf, btrfs_root_stime(ri)))
+   print_timespec(leaf, btrfs_root_stime(ri),
+   "\t\tstime ", "\n");
+   if (btrfs_timespec_sec(leaf, btrfs_root_rtime(ri)))
+   print_timespec(leaf, btrfs_root_rtime(ri),
+   "\t\trtime ", "\n");
}
 
btrfs_disk_key_to_cpu(&drop_key, &root_item.drop_progress);
@@ -858,26 +890,6 @@ static void inode_flags_to_str(u64 flags, char *ret)
strcat(ret, "none");
 }
 
-static void print_timespec(struct extent_buffer *eb,
-   struct btrfs_timespec *timespec, const char *prefix,
-   const char *suffix)
-{
-   struct tm tm;
-   u64 tmp_u64;
-   u32 tmp_u32;
-   time_t tmp_time;
-   char timestamp[256];
-
-   tmp_u64 = btrfs_timespec_sec(eb, timespec);
-   tmp_u32 = btrfs_timespec_nsec(eb, timespec);
-   tmp_time = tmp_u64;
-   localtime_r(&tmp_time, &tm);
-   strftime(timestamp, sizeof(timestamp),
-   "%Y-%m-%d %H:%M:%S", &tm);
-   printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32,
-   timestamp, suffix);
-}
-
 static void print_inode_item(struct extent_buffer *eb,
struct btrfs_inode_item *ii)
 {
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  h

Re: [PATCH] btrfs-progs: print-tree: Enehance uuid item print

2017-10-30 Thread Misono, Tomohiro


On 2017/10/31 13:03, Qu Wenruo wrote:
> For key type BTRFS_UUID_KEY_SUBVOL or BTRFS_UUID_KEY_RECEIVED_SUBVOL the
> key objectid and key offset are just half of the UUID.
> 
> However we just print the key as %llu, which is converted from little
> endian, not byte order for UUID, nor the traditional 36 bytes human
> readable uuid format.
> 
> Although true engineer can easily convert it in their brain, but to
> make it easier for search, output the result UUID using the 36 chars format.
> 
> Cc: Misono Tomohiro 
> Signed-off-by: Qu Wenruo 
> ---
> Inspired by UUID related work from Misono.
> ---
>  print-tree.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 3c585e31f1fc..687f871db302 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -803,14 +803,25 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
>   }
>  }
>  
> -static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
> - u32 item_size)
> +static void print_uuid_item(struct extent_buffer *l, int slot,
> + unsigned long offset, u32 item_size)
>  {
> + struct btrfs_key key;
> + char uuid_str[BTRFS_UUID_UNPARSED_SIZE];
> + u8 uuid[BTRFS_UUID_SIZE];
> +
> + /* Reassemble the uuid from key.objecitd and key.offset */
> + btrfs_item_key_to_cpu(l, &key, slot);
> + put_unaligned_le64(key.objectid, uuid);
> + put_unaligned_le64(key.offset, uuid + sizeof(u64));
> + uuid_unparse(uuid, uuid_str);
> +
>   if (item_size & (sizeof(u64) - 1)) {
>   printf("btrfs: uuid item with illegal size %lu!\n",
>  (unsigned long)item_size);
>   return;
>   }
> + printf("\t\tuuid %s\n", uuid_str);
>   while (item_size) {
>   __le64 subvol_id;
>  
> @@ -1297,7 +1308,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct 
> extent_buffer *eb)
>   break;
>   case BTRFS_UUID_KEY_SUBVOL:
>   case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
> - print_uuid_item(eb, btrfs_item_ptr_offset(eb, i),
> + print_uuid_item(eb, i, btrfs_item_ptr_offset(eb, i),
>   btrfs_item_size_nr(eb, i));
>   break;
>   case BTRFS_STRING_ITEM_KEY: {
> 

Reviewed-by: Tomohiro Misono 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: ins: print uuid only if it exists

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 13:39, Misono, Tomohiro wrote:
> UUID of ROOT_ITEM is empty if it is not subvolume (and not created by
> kernel). So, just skip it if it is empty just like parent/received UUID.
> 
> Signed-off-by: Tomohiro Misono 

Reviewed-by: Qu Wenruo 

Thanks,
Qu
> ---
>  print-tree.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 8abd760..9d1b862 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -581,8 +581,10 @@ static void print_root_item(struct extent_buffer *leaf, 
> int slot)
>   flags_str);
>  
>   if (root_item.generation == root_item.generation_v2) {
> - uuid_unparse(root_item.uuid, uuid_str);
> - printf("\t\tuuid %s\n", uuid_str);
> + if (!empty_uuid(root_item.uuid)) {
> + uuid_unparse(root_item.uuid, uuid_str);
> + printf("\t\tuuid %s\n", uuid_str);
> + }
>   if (!empty_uuid(root_item.parent_uuid)) {
>   uuid_unparse(root_item.parent_uuid, uuid_str);
>   printf("\t\tparent_uuid %s\n", uuid_str);
> 



signature.asc
Description: OpenPGP digital signature


Re: Problem with file system

2017-10-30 Thread Marat Khalili

On 31/10/17 00:37, Chris Murphy wrote:

But off hand it sounds like hardware was sabotaging the expected write
ordering. How to test a given hardware setup for that, I think, is
really overdue. It affects literally every file system, and Linux
storage technology.

It kinda sounds like to me something other than supers is being
overwritten too soon, and that's why it's possible for none of the
backup roots to find a valid root tree, because all four possible root
trees either haven't actually been written yet (still) or they've been
overwritten, even though the super is updated. But again, it's
speculation, we don't actually know why your system was no longer
mountable.
Just a detached view: I know hardware should respect ordering/barriers 
and such, but how hard is it really to avoid overwriting at least one 
complete metadata tree for half an hour (even better, yet another one 
for a day)? Just metadata, not data extents.


--

With Best Regards,
Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: ins: print c/o/s/r time of ROOT_ITEM

2017-10-30 Thread Qu Wenruo


On 2017年10月31日 13:40, Misono, Tomohiro wrote:
> Currently c/o/s/r time of ROOT_ITEM is not printed in print_root_item().
> Fix this and print them if the values are not zero. print_timespec()
> is moved forward to reuse.
> 
> Signed-off-by: Tomohiro Misono 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  ctree.h  | 32 
>  print-tree.c | 52 
>  2 files changed, 64 insertions(+), 20 deletions(-)
> 
> diff --git a/ctree.h b/ctree.h
> index 2280659..54a85fd 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2072,6 +2072,38 @@ BTRFS_SETGET_STACK_FUNCS(root_stransid, struct 
> btrfs_root_item,
>  BTRFS_SETGET_STACK_FUNCS(root_rtransid, struct btrfs_root_item,
>rtransid, 64);
>  
> +static inline struct btrfs_timespec *
> +btrfs_root_ctime(struct btrfs_root_item *root_item)
> +{
> + unsigned long ptr = (unsigned long)root_item;
> + ptr += offsetof(struct btrfs_root_item, ctime);
> + return (struct btrfs_timespec *)ptr;
> +}
> +
> +static inline struct btrfs_timespec *
> +btrfs_root_otime(struct btrfs_root_item *root_item)
> +{
> + unsigned long ptr = (unsigned long)root_item;
> + ptr += offsetof(struct btrfs_root_item, otime);
> + return (struct btrfs_timespec *)ptr;
> +}
> +
> +static inline struct btrfs_timespec *
> +btrfs_root_stime(struct btrfs_root_item *root_item)
> +{
> + unsigned long ptr = (unsigned long)root_item;
> + ptr += offsetof(struct btrfs_root_item, stime);
> + return (struct btrfs_timespec *)ptr;
> +}
> +
> +static inline struct btrfs_timespec *
> +btrfs_root_rtime(struct btrfs_root_item *root_item)
> +{
> + unsigned long ptr = (unsigned long)root_item;
> + ptr += offsetof(struct btrfs_root_item, rtime);
> + return (struct btrfs_timespec *)ptr;
> +}
> +
>  /* struct btrfs_root_backup */
>  BTRFS_SETGET_STACK_FUNCS(backup_tree_root, struct btrfs_root_backup,
>  tree_root, 64);
> diff --git a/print-tree.c b/print-tree.c
> index 9d1b862..62f3163 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -551,6 +551,26 @@ static void root_flags_to_str(u64 flags, char *ret)
>   strcat(ret, "none");
>  }
>  
> +static void print_timespec(struct extent_buffer *eb,
> + struct btrfs_timespec *timespec, const char *prefix,
> + const char *suffix)
> +{
> + struct tm tm;
> + u64 tmp_u64;
> + u32 tmp_u32;
> + time_t tmp_time;
> + char timestamp[256];
> +
> + tmp_u64 = btrfs_timespec_sec(eb, timespec);
> + tmp_u32 = btrfs_timespec_nsec(eb, timespec);
> + tmp_time = tmp_u64;
> + localtime_r(&tmp_time, &tm);
> + strftime(timestamp, sizeof(timestamp),
> + "%Y-%m-%d %H:%M:%S", &tm);
> + printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32,
> + timestamp, suffix);
> +}
> +
>  static void print_root_item(struct extent_buffer *leaf, int slot)
>  {
>   struct btrfs_root_item *ri;
> @@ -600,6 +620,18 @@ static void print_root_item(struct extent_buffer *leaf, 
> int slot)
>   btrfs_root_stransid(&root_item),
>   btrfs_root_rtransid(&root_item));
>   }
> + if (btrfs_timespec_sec(leaf, btrfs_root_ctime(ri)))
> + print_timespec(leaf, btrfs_root_ctime(ri),
> + "\t\tctime ", "\n");
> + if (btrfs_timespec_sec(leaf, btrfs_root_otime(ri)))
> + print_timespec(leaf, btrfs_root_otime(ri),
> + "\t\totime ", "\n");
> + if (btrfs_timespec_sec(leaf, btrfs_root_stime(ri)))
> + print_timespec(leaf, btrfs_root_stime(ri),
> + "\t\tstime ", "\n");
> + if (btrfs_timespec_sec(leaf, btrfs_root_rtime(ri)))
> + print_timespec(leaf, btrfs_root_rtime(ri),
> + "\t\trtime ", "\n");
>   }
>  
>   btrfs_disk_key_to_cpu(&drop_key, &root_item.drop_progress);
> @@ -858,26 +890,6 @@ static void inode_flags_to_str(u64 flags, char *ret)
>   strcat(ret, "none");
>  }
>  
> -static void print_timespec(struct extent_buffer *eb,
> - struct btrfs_timespec *timespec, const char *prefix,
> - const char *suffix)
> -{
> - struct tm tm;
> - u64 tmp_u64;
> - u32 tmp_u32;
> - time_t tmp_time;
> - char timestamp[256];
> -
> - tmp_u64 = btrfs_timespec_sec(eb, timespec);
> - tmp_u32 = btrfs_timespec_nsec(eb, timespec);
> - tmp_time = tmp_u64;
> - localtime_r(&tmp_time, &tm);
> - strftime(timestamp, sizeof(timestamp),
> - "%Y-%m-%d %H:%M:%S", &tm);
> - printf("%s%llu.%u (%s)%s", prefix, (unsigned long long)tmp_u64, tmp_u32,
> - timestamp, suffix);
> -}
> -
>  static void print_inode_item(struct extent_buffer

[PATCH 1/2] btrfs-progs: test/fsck: Introduce test images containing tree reloc tree

2017-10-30 Thread Qu Wenruo
Tree reloc tree is a special tree with very short life spawn.
It acts as a special snapshot for any tree, with related nodes/leaves or
EXTENT_DATA modified to point to new position.

Considering the short life spawn and its specialness, it should be quite
reasonable to keep them as both corner case for fsck and educational
dump for anyone interested in relocation.

Signed-off-by: Qu Wenruo 
---
 tests/fsck-tests/027-tree-reloc-tree/test.sh   |  22 +
 .../tree_reloc_for_data_reloc.img.xz   | Bin 0 -> 2112 bytes
 .../tree_reloc_for_fs_tree.img.xz  | Bin 0 -> 2424 bytes
 3 files changed, 22 insertions(+)
 create mode 100755 tests/fsck-tests/027-tree-reloc-tree/test.sh
 create mode 100644 
tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz
 create mode 100644 
tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_fs_tree.img.xz

diff --git a/tests/fsck-tests/027-tree-reloc-tree/test.sh 
b/tests/fsck-tests/027-tree-reloc-tree/test.sh
new file mode 100755
index ..559c4bc700bd
--- /dev/null
+++ b/tests/fsck-tests/027-tree-reloc-tree/test.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# Make sure btrfs check won't report any false alert for valid image with
+# tree reloc tree.
+#
+# Also due to the short life spawn of tree reloc tree, save them as dump
+# example for later usage.
+
+source "$TOP/tests/common"
+
+check_prereq btrfs
+
+for img in *.img.xz
+do
+   image=$(extract_image "$img")
+   run_check_stdout "$TOP/btrfs" check "$image" 2>&1 |
+   grep -q "Errors found in extent allocation tree or chunk 
allocation"
+   if [ $? -eq 0 ]; then
+   rm -f "$image"
+   _fail "unexpected error occurred when checking $img"
+   fi
+   rm -f "$image"
+done
diff --git 
a/tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz 
b/tests/fsck-tests/027-tree-reloc-tree/tree_reloc_for_data_reloc.img.xz
new file mode 100644
index 
..66d8bde6b7a6c451eb7b3b6f4dbc6a919144d1f5
GIT binary patch
literal 2112
zcmV-G2*3CJH+ooF000E$*0e?f03iV!G&sfah5B~@LT>wRyj;C3^v%$$4d1r37
zhA1@4w_z}05@xu}qhN$h#=9A)MD}#c5t`w0DAA1ZzCzD>FO393JL%KESo(OESm)Se
zsSibx%8nF~Qojo*ma_q0pk(2sJJzy~$;uMaE)o>1vKb8VqnTOf1B
z^DYElxsTLnJ-2@t0RIq$YA_Uc6P|OH7-DzZ=zRt)R^=~+JYg-ai)BxPu7rrjpGu}4
zTDD{+Z`Q;`k`tk38Et8X+8yqB?oe!Q@AR$b8-XBMO#7JxvctkyJ(M;~Gz4}+g?mDV
z5?|k$9|heMleXlkp)$TnyH9!{0TkT(
z<3H)4UM-$f5Tee_H!mO7a-_~8vi;P$UX}tes_Bu@etL&IQr6A3TqKF@bR%~(m_gR*
z;SPQOrplP(i>JJFUI!pHly?zbn^-#r;4SwnYLXE5@<)o@S(>?UYxE{{Q(X0T)I1cN
zoeplPtCGGm{g~C#IN>}DHXZ;41F0?`xeMn(HAEq*MCej4ZJ*B;*9CWi_yWjE9Pmm@
z>>2=5eU&|=tn2;xulKMF*H3_C`i7boa^LDsL58^C>$y$L7bU-FeR$wAt-F840_ZB4
zHrm8V)k~c|Tsxr1XtzPiQ|*GP{m-b_<8SDCmw$T;aH5}ak-%NYza0~fG?DPpTaQ0n5Rn9DlO`{Mbfe#s?9tT;Dgv=&}409
zRvxmwB)Sx;;#~%L4-C0#eE@-pAH{i8F|Y%{2QX5@S!_)<5rG}7I*He}n!~ENhw=95
zq1CwQV`AuuwJYht!Mxu@hZMrDQxdy%<;%W8s1f&K*#wl;jZD`o1X5F!h2`^(f2W|Q
z?LD5%!%87{>Z)*MC;IEHbkP{)iYN;|P82N?90+ym%1E)#e7<0LKPG-fyzAh07L}^}
zTf(RS?7X#7p^TuABH5^?y?uUjVju&vXxT70sb4g@jZP*kK8)=TW=><1zL&@O_M@KM
zh1rgiXqFmg5EKcxSuEsN`R{k*k=GrMX>+>#*a_)&;dxC;3uHTJ?bCA{Vj_!PRgo2*
z)WFtif^0P)$HJlGj6Q;$loId_4OqDkat-|iEfu#@qfGmi
zS%f-Xe7(R!Uxi~N>Ua1T$*LWsy<@bK^oFctct|$u++dxdLRY@FB{QS#Jp@N@apFKa
zub7?`EBxgYwjut$@wen7`xa@TmNMUG#Uk3~KUZ_yWaOXZs-eH$kloTnvgET)C`woz
zjYI~+Xh4f(S2h@|itNP+y!R#%PbIyY(0^jrqV%JV`*$%1Ge~NDce(aR8iXt-yrXU0
z=@xsVbM&{F70Lo)kqEM}iG8D1%{QGGlQ45PYScBNbzW{G!K6HQz&9u4p^*gb)aG2r
ztG2;H8>8RVSdOfp7f~q8E>k+Kwn?g$+F8$z;4cEZvxVITJWTED!+R`6(mA+0B-_qC
zJsIHU!-+%XM;F}&i)W4=s9Y1~59S{;tHjV3#@pB@p)Y|_b18Xt@V7d$RvU_?VO4>z
zG3=K58gsz@I34qyZ3`&xa`yx?ev>Q9h#I}(btMPT
zq#bp-XdqDmp4)XKL&ck+`(-G232$HYERD9*cC8F@K=W>P64b1}-en()U$vc98j?69n
zVupjhY@t%^!(-SdbQ{D5+ezg;(m*>@H?`%>N$Qt8t^FamLhbH|e6u!1A2o}ie`#F_
zujRnW7#+)7e1Q3B+W&}hLVHMZ5iyc_4ZQX_bo>HSYux!aa$l|@xkv>#xedMW
zh?74dkLXz7Tk0EWJGP6}1@wMwRyj;C3^v%$$4d1wdA
zhA1@4FQtHC5@xu}qhN$h#=9A)MD}kZJDsuMAcn@)`6EaApWd{S>8%BnWfF=649R^9
zhSZ!gBqxWUwRV^Y$&R=6HvdFxjTEunmCftkc^4dvZx8UWdfa;I$XeqGd=1wHkb}6SL5fgModV(X}^PDMLC`~!Xt2yMwUQ})L|d-^i&C7B3IJ5
z*~l)NLguNp5_68vyjw>C8TZ$&70X186PlPn3FTje$fYuAl%pIkC`C!YRFZN;2uv*L
z*g&Ef)!<(YoZXwp2!7ezH0wp>Mzz$h+MSK+WsjcmF^jE`o}Gwn6qMmzjIkZ3#Qs>E
zW0GvL>i-bM81TB}z+v33!pPDo&+)!M1uMq+M$^MLZ6XzFT$6AJN|+ubXEsnWY(k`0^lA~#W4uk!fV|0qUvORLZF`Y5EOc;+H3sB
z2G)yQC&6KJDRmR#=b%~pLcl%cPaVeZBJNq!@@ly`BJ`yDnr;PbDni!&deCFBaWka~
z*$<;$Uw69e{DclkU%sqb`i@-&aPf-f>?g-&4UXsiWts3;h<=6Gvtu)k#YrRvx}r$0
z=SV&8RZ2tI&`xibB-yU+`XVnQ;odaS&Vln1UsHYI2$wg<{?8e(^StYAysd%!Dy@Y6
zPWgsIQ=2af@$$W`BnH2FY1MupDVNr**<*8wjf>|wJh|7_89E)A!9PBnM?VLvXVpc7
zy_Fv9tWoHez{yXP>{LFe%}7a+Z!xmMBS7D{&7(#$?hhzKW*fCy`uoILxVzSTC)X)n
z^(On{b0D6-_*3PYmt68B$d<=EUYwHI8vg-jG#e0?@?Pb8?=gNarwf-DcP%Ybr3?r8
zmpz46Mf1G>cRCY}&x3&z@~zetgwQeb8r9~u7BsuxI9X9Wf+>~vABg%sKRz(<$pkA>
z9Mxe~ytdJ9??4hprnrF2Qh6e)9xPSEbbetOX(el_5JNgbci9TdKv=;^x^eW;DDeaA
zxC|6|7+99Wo-2`cc_acVEFneoH@Y6p>I|zIm|rK+4v&MM)e%f9MV!&iTy<=b@7pE@
zit>o?pO#+^Oq3e%i;WJc=kn3n?vygauKq`SgouE&RBi+iW1}tY`S8`8fQIapSGz~ddESlv
z!iixF>e|f!L4k0ph&rTN@{hUT2cC?8%5P

[PATCH 2/2] btrfs-progs: print-tree: Print offset as tree objectid

2017-10-30 Thread Qu Wenruo
For case like tree reloc trees and subvolume trees, their key offset is
the tree id.

For case like tree treloc tree for data reloc tree,
The key will be outputted as:
(TREE_RELOC ROOT_ITEM 18446744073709551607)

The minus number is long and even guys with real engineer brains can't
easily get the meaning.

This patch will change the output format to:
(TREE_RELOC ROOT_ITEM DATA_RELOC_TREE)

While for special offset value like 0 or (u64)-1, it's still shown as
is.

Signed-off-by: Qu Wenruo 
---
 print-tree.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/print-tree.c b/print-tree.c
index 687f871db302..326fafcea763 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -794,6 +794,18 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf(" 0x%016llx)", (unsigned long long)offset);
break;
+
+   /*
+* For those key offset, they are points to tree id, print them
+* in human readable format other than tree id.
+* Especially useful for trees like data/tree reloc tree, whose
+* tree id can be minus.
+*/
+   case BTRFS_ROOT_ITEM_KEY:
+   printf(" ");
+   print_objectid(stdout, offset, type);
+   printf(")");
+   break;
default:
if (offset == (u64)-1)
printf(" -1)");
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Don't generate UUID for non-fs tree

2017-10-30 Thread Qu Wenruo
btrfs_create_tree() will unconditionally generate UUID for any root.
So for quota tree and data reloc tree created by kernel, they will have
unique UUIDs.

However UUID in root item is only referred by UUID tree, which only
records UUID for fs trees.
This makes unique UUIDs for quota/data reloc tree meaningless.

Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output
less confusing.

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab849037b..d85e04a675fe 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1403,7 +1403,7 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
struct btrfs_root *root;
struct btrfs_key key;
int ret = 0;
-   uuid_le uuid;
+   uuid_le uuid = { 0 };
 
root = btrfs_alloc_root(fs_info, GFP_KERNEL);
if (!root)
@@ -1444,7 +1444,8 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
btrfs_set_root_used(&root->root_item, leaf->len);
btrfs_set_root_last_snapshot(&root->root_item, 0);
btrfs_set_root_dirid(&root->root_item, 0);
-   uuid_le_gen(&uuid);
+   if (is_fstree(objectid))
+   uuid_le_gen(&uuid);
memcpy(root->root_item.uuid, uuid.b, BTRFS_UUID_SIZE);
root->root_item.drop_level = 0;
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html