Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.

Also document that the paused balance starts here and owns the exclusive
op status.

Signed-off-by: David Sterba 
---
  fs/btrfs/volumes.c | 16 ++--
  1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eb78c1d0ce2b..843982a2cbdb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
struct btrfs_key key;
int ret;
  
+	/*

+* This should never happen, as the paused balance state is recovered
+* during mount without any chance of other exclusive ops to collide.
+* Let it fail early and do not continue mount.
+*
+* Otherwise, this gives the exclusive op status to balance and keeps
+* in paused state until user intervention (cancel or umount).
+*/
+   if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+   btrfs_err(fs_info,
+   "cannot set exclusive op status to resume balance");
+   return -EINVAL;
+   }
+



 Reviewed-by: Anand Jain 

Thanks, Anand



path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -4018,8 +4032,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
btrfs_balance_sys(leaf, item, &disk_bargs);
btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
  
-	WARN_ON(test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags));

-
mutex_lock(&fs_info->volume_mutex);
mutex_lock(&fs_info->balance_mutex);
  


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/16] btrfs: cleanup helpers that reset balance state

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.

Signed-off-by: David Sterba 


Reviewed-by: Anand Jain 

A nitpick below.


---
  fs/btrfs/ioctl.c   |  8 
  fs/btrfs/volumes.c | 27 ---
  2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 582bde5b7eda..2f172122b63d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4653,10 +4653,10 @@ static long btrfs_ioctl_balance(struct file *file, void 
__user *arg)
  
  do_balance:

/*
-* Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP
-* goes to to btrfs_balance.  bctl is freed in __cancel_balance,
-* or, if restriper was paused all the way until unmount, in
-* free_fs_info.  The flag should be cleared after __cancel_balance.




+* Ownership of bctl and filesystem flag BTRFS_FS_EXCL_OP goes to to


 delete one 'to'.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Qu Wenruo
When manually patching super blocks, current validation check is pretty
weak (limited to magic number and csum) and doesn't provide extra check
for some obvious corruption (like invalid sectorsize).

This patch will enhance dump-super by:

1) Refactor print function and add checker helper macro
   Instead of manually call printf() and takes 2 lines to just print one
   member, introduce several helper macros to print/check value.
   Most of the callers just need 1 line.
   (Although the macro and helper itself offsets the saved lines)

2) Add extra check for some members
   The following members will be checked, and if invalid the suffix
   " [INVALID]" will be pended:
bytenr: alignment check
sys_array_size: maximum check
root:   alignment check
root_level: maximum check
chunk_root: alignment check
chunk_root_level:   maximum check
chunk_root_generation:  maximum check against root generation
log_root:   alignment check
log_root_level: maximum check
log_root_transid:   maximum check against root generation + 1
bytes_used: alignment check
sectorsize: power of 2 and 4K~64K range check
nodesize:   the same as sectorsize, plus minimum check
stripesize: the same as kernel check
cache_generation:   maximum check against root generation
uuid_tree_generation:   maximum check against root generation

3) output sequence change
   Put bytenr/level/generation together for the same root (tree root,
   chunk root, and log root).

4) Minor output change
   To make dev_item macro easier, the following output is changed:
dev_item.devid  ->  dev_item.id
dev_item.dev_group  ->  dev_item.group

Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-super.c | 168 ++
 1 file changed, 99 insertions(+), 69 deletions(-)

diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
index 24588c87cce6..8000d2ace663 100644
--- a/cmds-inspect-dump-super.c
+++ b/cmds-inspect-dump-super.c
@@ -312,11 +312,42 @@ static void print_readable_super_flag(u64 flag)
 super_flags_num, BTRFS_SUPER_FLAG_SUPP);
 }
 
+/* Helper to print member in %llu */
+#define print_super(sb, member, spacer)
\
+   printf("%s%s%llu\n", #member, spacer, (u64) btrfs_super_##member(sb));
+
+/* Helper to print member in %llx */
+#define print_super_hex(sb, member, spacer)\
+   printf("%s%s%llx\n", #member, spacer, (u64) btrfs_super_##member(sb));
+
+/*
+ * Helper macro to wrap member checker
+ *
+ * Only support %llu output for member.
+ */
+#define print_check_super(sb, member, spacer, bad_condition)   \
+({ \
+   u64 value;  \
+   \
+   value = btrfs_super_##member(sb);   \
+   printf("%s%s%llu", #member, spacer, (u64) value);   \
+   if (bad_condition)  \
+   printf(" [INVALID]");   \
+   printf("\n");   \
+})
+
+/* Helper to print sb->dev_item members */
+#define print_super_dev(sb, member, spacer)\
+   printf("dev_item.%s%s%llu\n", #member, spacer,  \
+  (u64) btrfs_stack_device_##member(&sb->dev_item));
+
 static void dump_superblock(struct btrfs_super_block *sb, int full)
 {
int i;
char *s, buf[BTRFS_UUID_UNPARSED_SIZE];
+   int max_level = BTRFS_MAX_LEVEL; /* to save several chars */
u8 *p;
+   u64 super_gen;
u32 csum_size;
u16 csum_type;
 
@@ -348,10 +379,16 @@ static void dump_superblock(struct btrfs_super_block *sb, 
int full)
printf(" [DON'T MATCH]");
putchar('\n');
 
-   printf("bytenr\t\t\t%llu\n",
-   (unsigned long long)btrfs_super_bytenr(sb));
-   printf("flags\t\t\t0x%llx\n",
-   (unsigned long long)btrfs_super_flags(sb));
+   /*
+* Use btrfs minimal sector size as basic check for bytenr, since we
+* can't trust sector size from super block.
+* This 4K check should works fine for most architecture, and will be
+* just a little loose for 64K page system.
+*
+* And such 4K check will be used for other members too.
+*/
+   print_check_super(sb, bytenr, "\t\t\t", (!IS_ALIGNED(value, SZ_4K)));
+   print_super_hex(sb, flags, "\t\t\t");
print_readable_super_flag(btrfs_super_flags(sb));
 
   

Re: [PATCH] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Nikolay Borisov


On  9.04.2018 10:47, Qu Wenruo wrote:
> When manually patching super blocks, current validation check is pretty
> weak (limited to magic number and csum) and doesn't provide extra check
> for some obvious corruption (like invalid sectorsize).
> 
> This patch will enhance dump-super by:
> 
> 1) Refactor print function and add checker helper macro
>Instead of manually call printf() and takes 2 lines to just print one
>member, introduce several helper macros to print/check value.
>Most of the callers just need 1 line.
>(Although the macro and helper itself offsets the saved lines)
> 
> 2) Add extra check for some members
>The following members will be checked, and if invalid the suffix
>" [INVALID]" will be pended:
>   bytenr: alignment check
>   sys_array_size: maximum check
>   root:   alignment check
>   root_level: maximum check
>   chunk_root: alignment check
>   chunk_root_level:   maximum check
>   chunk_root_generation:  maximum check against root generation
>   log_root:   alignment check
>   log_root_level: maximum check
>   log_root_transid:   maximum check against root generation + 1
>   bytes_used: alignment check
>   sectorsize: power of 2 and 4K~64K range check
>   nodesize:   the same as sectorsize, plus minimum check
>   stripesize: the same as kernel check
>   cache_generation:   maximum check against root generation
>   uuid_tree_generation:   maximum check against root generation
> 
> 3) output sequence change
>Put bytenr/level/generation together for the same root (tree root,
>chunk root, and log root).
> 
> 4) Minor output change
>To make dev_item macro easier, the following output is changed:
>   dev_item.devid  ->  dev_item.id
>   dev_item.dev_group  ->  dev_item.group
> 
> Signed-off-by: Qu Wenruo 
> ---
>  cmds-inspect-dump-super.c | 168 ++
>  1 file changed, 99 insertions(+), 69 deletions(-)
> 
> diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
> index 24588c87cce6..8000d2ace663 100644
> --- a/cmds-inspect-dump-super.c
> +++ b/cmds-inspect-dump-super.c
> @@ -312,11 +312,42 @@ static void print_readable_super_flag(u64 flag)
>super_flags_num, BTRFS_SUPER_FLAG_SUPP);
>  }
>  
> +/* Helper to print member in %llu */
> +#define print_super(sb, member, spacer)  
> \
> + printf("%s%s%llu\n", #member, spacer, (u64) btrfs_super_##member(sb));
> +
> +/* Helper to print member in %llx */
> +#define print_super_hex(sb, member, spacer)  \
> + printf("%s%s%llx\n", #member, spacer, (u64) btrfs_super_##member(sb));

nit: I like the idea of the patch, however, I think the presence of the
"spacer" argument is a bit annoying. Why don't you make all the print
out to be aligned to the same level i.e hardcode \t\t\ or \t\t or whatever?
> +
> +/*
> + * Helper macro to wrap member checker
> + *
> + * Only support %llu output for member.
> + */
> +#define print_check_super(sb, member, spacer, bad_condition) \
> +({   \
> + u64 value;  \
> + \
> + value = btrfs_super_##member(sb);   \
> + printf("%s%s%llu", #member, spacer, (u64) value);   \
> + if (bad_condition)  \
> + printf(" [INVALID]");   \
> + printf("\n");   \
> +})
> +
> +/* Helper to print sb->dev_item members */
> +#define print_super_dev(sb, member, spacer)  \
> + printf("dev_item.%s%s%llu\n", #member, spacer,  \
> +(u64) btrfs_stack_device_##member(&sb->dev_item));
> +
>  static void dump_superblock(struct btrfs_super_block *sb, int full)
>  {
>   int i;
>   char *s, buf[BTRFS_UUID_UNPARSED_SIZE];
> + int max_level = BTRFS_MAX_LEVEL; /* to save several chars */
>   u8 *p;
> + u64 super_gen;
>   u32 csum_size;
>   u16 csum_type;
>  
> @@ -348,10 +379,16 @@ static void dump_superblock(struct btrfs_super_block 
> *sb, int full)
>   printf(" [DON'T MATCH]");
>   putchar('\n');
>  
> - printf("bytenr\t\t\t%llu\n",
> - (unsigned long long)btrfs_super_bytenr(sb));
> - printf("flags\t\t\t0x%llx\n",
> - (unsigned long long)btrfs_super_flags(sb));
> + /*
> +  * Use btrfs minimal sector size as basic check for bytenr, since we
> +  * can't trust sector size from super block.
> +  * This 4K check should work

Re: [PATCH] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Qu Wenruo


On 2018年04月09日 16:06, Nikolay Borisov wrote:
> 
> 
> On  9.04.2018 10:47, Qu Wenruo wrote:
>> When manually patching super blocks, current validation check is pretty
>> weak (limited to magic number and csum) and doesn't provide extra check
>> for some obvious corruption (like invalid sectorsize).
>>
>> This patch will enhance dump-super by:
>>
>> 1) Refactor print function and add checker helper macro
>>Instead of manually call printf() and takes 2 lines to just print one
>>member, introduce several helper macros to print/check value.
>>Most of the callers just need 1 line.
>>(Although the macro and helper itself offsets the saved lines)
>>
>> 2) Add extra check for some members
>>The following members will be checked, and if invalid the suffix
>>" [INVALID]" will be pended:
>>  bytenr: alignment check
>>  sys_array_size: maximum check
>>  root:   alignment check
>>  root_level: maximum check
>>  chunk_root: alignment check
>>  chunk_root_level:   maximum check
>>  chunk_root_generation:  maximum check against root generation
>>  log_root:   alignment check
>>  log_root_level: maximum check
>>  log_root_transid:   maximum check against root generation + 1
>>  bytes_used: alignment check
>>  sectorsize: power of 2 and 4K~64K range check
>>  nodesize:   the same as sectorsize, plus minimum check
>>  stripesize: the same as kernel check
>>  cache_generation:   maximum check against root generation
>>  uuid_tree_generation:   maximum check against root generation
>>
>> 3) output sequence change
>>Put bytenr/level/generation together for the same root (tree root,
>>chunk root, and log root).
>>
>> 4) Minor output change
>>To make dev_item macro easier, the following output is changed:
>>  dev_item.devid  ->  dev_item.id
>>  dev_item.dev_group  ->  dev_item.group
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  cmds-inspect-dump-super.c | 168 ++
>>  1 file changed, 99 insertions(+), 69 deletions(-)
>>
>> diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
>> index 24588c87cce6..8000d2ace663 100644
>> --- a/cmds-inspect-dump-super.c
>> +++ b/cmds-inspect-dump-super.c
>> @@ -312,11 +312,42 @@ static void print_readable_super_flag(u64 flag)
>>   super_flags_num, BTRFS_SUPER_FLAG_SUPP);
>>  }
>>  
>> +/* Helper to print member in %llu */
>> +#define print_super(sb, member, spacer) 
>> \
>> +printf("%s%s%llu\n", #member, spacer, (u64) btrfs_super_##member(sb));
>> +
>> +/* Helper to print member in %llx */
>> +#define print_super_hex(sb, member, spacer) \
>> +printf("%s%s%llx\n", #member, spacer, (u64) btrfs_super_##member(sb));
> 
> nit: I like the idea of the patch, however, I think the presence of the
> "spacer" argument is a bit annoying. Why don't you make all the print
> out to be aligned to the same level i.e hardcode \t\t\ or \t\t or whatever?

Makes sense.
Since currently all data output is aligned to "\t\t\t", it's not that
hard to do alignment calculation in the macro.

Thanks,
Qu

>> +
>> +/*
>> + * Helper macro to wrap member checker
>> + *
>> + * Only support %llu output for member.
>> + */
>> +#define print_check_super(sb, member, spacer, bad_condition)
>> \
>> +({  \
>> +u64 value;  \
>> +\
>> +value = btrfs_super_##member(sb);   \
>> +printf("%s%s%llu", #member, spacer, (u64) value);   \
>> +if (bad_condition)  \
>> +printf(" [INVALID]");   \
>> +printf("\n");   \
>> +})
>> +
>> +/* Helper to print sb->dev_item members */
>> +#define print_super_dev(sb, member, spacer) \
>> +printf("dev_item.%s%s%llu\n", #member, spacer,  \
>> +   (u64) btrfs_stack_device_##member(&sb->dev_item));
>> +
>>  static void dump_superblock(struct btrfs_super_block *sb, int full)
>>  {
>>  int i;
>>  char *s, buf[BTRFS_UUID_UNPARSED_SIZE];
>> +int max_level = BTRFS_MAX_LEVEL; /* to save several chars */
>>  u8 *p;
>> +u64 super_gen;
>>  u32 csum_size;
>>  u16 csum_type;
>>  
>> @@ -348,10 +379,16 @@ static void dump_superblock(struct btrfs_super_block 
>> *sb, int full)
>>  printf(" [DON'T MATCH]");
>>  putchar('\n');
>>  
>> -printf("bytenr\t\t\t%llu\n",
>> -(unsigned long long)btrfs_super_bytenr(sb));
>> -pr

Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.


Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.

There are unfinished features in btrfs volume, such as device offline 
while it was mounted (patches in the ML).


It's better to replace volume_mutex with device_list_mutex instead of 
removing it, as we might need it in the context mentioned above.


Or it's not a good idea to clean up until all the features are in place 
otherwise we end up adding the locks again at some point.


Thanks, Anand



Signed-off-by: David Sterba 
---
  fs/btrfs/dev-replace.c | 7 +--
  fs/btrfs/volumes.c | 4 
  2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 346bd460f8e7..ba011af9b0d2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
struct btrfs_device *tgt_device = NULL;
struct btrfs_device *src_device = NULL;
  
-	/* the disk copy procedure reuses the scrub code */

-   mutex_lock(&fs_info->volume_mutex);
ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
srcdev_name, &src_device);
-   if (ret) {
-   mutex_unlock(&fs_info->volume_mutex);
+   if (ret)
return ret;
-   }
  
  	ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,

src_device, &tgt_device);
-   mutex_unlock(&fs_info->volume_mutex);
if (ret)
return ret;
  
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c

index b073ab4c3c70..0ae29cd69363 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct 
btrfs_fs_info *fs_info,
struct btrfs_device *tmp;
  
  		devices = &fs_info->fs_devices->devices;

-   /*
-* It is safe to read the devices since the volume_mutex
-* is held by the caller.
-*/
list_for_each_entry(tmp, devices, dev_list) {
if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
&tmp->dev_state) && !tmp->bdev) {


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread Nikolay Borisov


On  9.04.2018 11:39, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
>> The volume mutex does not protect against anything in this case, the
>> comment about scrub is right but not related to locking and looks
>> confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
>> and confusing too.
>>
>> The device_list_mutex is not held here to protect device lookup, but in
>> this case device replace cannot run in parallel with device removal (due
>> to exclusive op protection), so we don't need further locking here.
> 
> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
> 
> There are unfinished features in btrfs volume, such as device offline
> while it was mounted (patches in the ML).
> 
> It's better to replace volume_mutex with device_list_mutex instead of
> removing it, as we might need it in the context mentioned above.
> 
> Or it's not a good idea to clean up until all the features are in place
> otherwise we end up adding the locks again at some point.

It's better that cleanups go so they leave the code in a better shape
than it is. Then when/if the missing features are complete and the
patches that implement them contain proper explanation how locking
works the code can be committed.

Taking into account how there is preparatory code in btrfs for features
which haven't landed for quite some time (i.e the in-band dedup stuff)
I'm against people future-proofing without clear path to merging the
feature the future-proofing pertains to.

> 
> Thanks, Anand
> 
> 
>> Signed-off-by: David Sterba 
>> ---
>>   fs/btrfs/dev-replace.c | 7 +--
>>   fs/btrfs/volumes.c | 4 
>>   2 files changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 346bd460f8e7..ba011af9b0d2 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -426,18 +426,13 @@ int btrfs_dev_replace_start(struct btrfs_fs_info
>> *fs_info,
>>   struct btrfs_device *tgt_device = NULL;
>>   struct btrfs_device *src_device = NULL;
>>   -    /* the disk copy procedure reuses the scrub code */
>> -    mutex_lock(&fs_info->volume_mutex);
>>   ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
>>   srcdev_name, &src_device);
>> -    if (ret) {
>> -    mutex_unlock(&fs_info->volume_mutex);
>> +    if (ret)
>>   return ret;
>> -    }
>>     ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
>>   src_device, &tgt_device);
>> -    mutex_unlock(&fs_info->volume_mutex);
>>   if (ret)
>>   return ret;
>>   diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b073ab4c3c70..0ae29cd69363 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2198,10 +2198,6 @@ int btrfs_find_device_missing_or_by_path(struct
>> btrfs_fs_info *fs_info,
>>   struct btrfs_device *tmp;
>>     devices = &fs_info->fs_devices->devices;
>> -    /*
>> - * It is safe to read the devices since the volume_mutex
>> - * is held by the caller.
>> - */
>>   list_for_each_entry(tmp, devices, dev_list) {
>>   if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
>>   &tmp->dev_state) && !tmp->bdev) {
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Qu Wenruo
When manually patching super blocks, current validation check is pretty
weak (limited to magic number and csum) and doesn't provide extra check
for some obvious corruption (like invalid sectorsize).

This patch will enhance dump-super by:

1) Refactor print function and add checker helper macro
   Instead of manually call printf() and takes 2 lines to just print one
   member, introduce several helper macros to print/check value.
   And now caller don't need to calculate the indent manually, it will
   be calculated automatically in the helper macros.
   Most of the callers just need 1 line.
   (Although the macro and helper itself offsets the saved lines)

2) Add extra check for some members
   The following members will be checked, and if invalid the suffix
   " [INVALID]" will be pended:
bytenr: alignment check
sys_array_size: maximum check
root:   alignment check
root_level: maximum check
chunk_root: alignment check
chunk_root_level:   maximum check
chunk_root_generation:  maximum check against root generation
log_root:   alignment check
log_root_level: maximum check
log_root_transid:   maximum check against root generation + 1
bytes_used: alignment check
sectorsize: power of 2 and 4K~64K range check
nodesize:   the same as sectorsize, plus minimum check
stripesize: the same as kernel check
cache_generation:   maximum check against root generation
uuid_tree_generation:   maximum check against root generation

3) output sequence change
   Put bytenr/level/generation together for the same root (tree root,
   chunk root, and log root).

4) Minor output change
   To make dev_item macro easier, the following output is changed:
dev_item.devid  ->  dev_item.id
dev_item.dev_group  ->  dev_item.group

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Generate tab based indent string in helper macro instead of passing spacer
  string from outside, suggested by Nikolay.
  (In fact, if using %*s it would be much easier, however it needs extra
   rework for existing code as they still use tab as indent)
---
 cmds-inspect-dump-super.c | 206 +-
 1 file changed, 137 insertions(+), 69 deletions(-)

diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
index 24588c87cce6..68d6f59ad727 100644
--- a/cmds-inspect-dump-super.c
+++ b/cmds-inspect-dump-super.c
@@ -312,11 +312,80 @@ static void print_readable_super_flag(u64 flag)
 super_flags_num, BTRFS_SUPER_FLAG_SUPP);
 }
 
+#define INDENT_BUFFER_LEN  80
+#define TAB_LEN8
+#define SUPER_INDENT   24
+
+/* helper to generate tab based indent string */
+static void generate_tab_indent(char *buf, unsigned int indent)
+{
+   buf[0] = '\0';
+   for (indent = round_up(indent, TAB_LEN); indent > 0; indent -= TAB_LEN)
+   strncat(buf, "\t", INDENT_BUFFER_LEN);
+}
+
+/* Helper to print member in %llu */
+#define print_super(sb, member)
\
+({ \
+   int indent = SUPER_INDENT - strlen(#member);\
+   char indent_str[INDENT_BUFFER_LEN]; \
+   \
+   generate_tab_indent(indent_str, indent);\
+   printf("%s%s%llu\n", #member, indent_str,   \
+  (u64) btrfs_super_##member(sb)); \
+})
+
+/* Helper to print member in %llx */
+#define print_super_hex(sb, member)\
+({ \
+   int indent = SUPER_INDENT - strlen(#member);\
+   char indent_str[INDENT_BUFFER_LEN]; \
+   \
+   generate_tab_indent(indent_str, indent);\
+   printf("%s%s0x%llx\n", #member, indent_str, \
+  (u64) btrfs_super_##member(sb)); \
+})
+
+/*
+ * Helper macro to wrap member checker
+ *
+ * Only support %llu output for member.
+ */
+#define print_check_super(sb, member, bad_condition)   \
+({ \
+   int indent = SUPER_INDENT - strlen(#member);\
+   char indent_str[INDENT_BUFFER_LEN]; \
+   u64 value;  \
+  

Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread Anand Jain



On 04/09/2018 04:54 PM, Nikolay Borisov wrote:



On  9.04.2018 11:39, Anand Jain wrote:



On 04/04/2018 02:34 AM, David Sterba wrote:

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.


Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.

There are unfinished features in btrfs volume, such as device offline
while it was mounted (patches in the ML).

It's better to replace volume_mutex with device_list_mutex instead of
removing it, as we might need it in the context mentioned above.

Or it's not a good idea to clean up until all the features are in place
otherwise we end up adding the locks again at some point.


It's better that cleanups go so they leave the code in a better shape
than it is. Then when/if the missing features are complete and the
patches that implement them contain proper explanation how locking
works the code can be committed.

Taking into account how there is preparatory code in btrfs for features
which haven't landed for quite some time (i.e the in-band dedup stuff)
I'm against people future-proofing without clear path to merging the
feature the future-proofing pertains to.


 Ok. Lets clean it up first.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Couldn't read tree root BTRFS - SEED

2018-04-09 Thread Senén Vidal Blanco
Hello Qu,
Thank you very much for the answer. I am sorry to answer late because before 
bothering more, I decided to perform another cloning test.
In this time he has joined the second device correctly.
The problem has occurred because they are BCACHE devices, and although the 
SEED disk does not vary over time, buy the read-write yes, with which I have 
made sure this time empty the cache before performing the cloning.

Now, when inserting the discs I have managed to mount everything correctly:
Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
Total devices 2 FS bytes used 2.00TiB
devid1 size 5.44TiB used 1.97TiB path /dev/bcache2
devid2 size 1.80TiB used 122.03GiB path /dev/bcache1

The truth is that the BTRFS-BCACHE-MDADM combination is complex, but the 
results are really satisfactory and enormous versatility is achieved.

To thank the BTRFS team and other responsible people for doing a great job.


El jueves, 5 de abril de 2018 20:57:49 (CEST) Qu Wenruo escribió:
> On 2018年04月05日 19:27, Senén Vidal Blanco wrote:
> > Hello,
> > I have a problem when mounting the btrfs system with two units (one in
> > SEED
> > mode and the other in read-write mode)
> > 
> > Initially it is functioning correctly in production:
> > 
> > btrfs fi sh
> > Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
> > 
> >Total devices 2 FS bytes used 2.00TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> >devid2 size 1.80TiB used 119.03GiB path /dev/bcache0
> > 
> > Label: 'BOOT'  uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44
> > 
> >Total devices 1 FS bytes used 536.86MiB
> >devid1 size 1.86GiB used 1.15GiB path /dev/md1
> > 
> > Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877
> > 
> >Total devices 1 FS bytes used 1.94TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> > 
> > The bcache0 and bcache1 units are what I mention.
> > bcache1 is the SEED
> > bcache0 is the read-write
> > 
> > When it comes to mounting the two copy mirror devices that is in
> > production I have this result:
> > 
> > parent transid verify failed on 2184045428736 wanted 269593 found 266275
> > parent transid verify failed on 2184045428736 wanted 269593 found 266275
> 
> It's not the problem that btrfs fails to assemble correct seed/rw
> devices list, it's something wrong (metadata corruption) happened.
> 
> > Ignoring transid failure
> 
> So this is btrfs-progs (btrfs check). Kernel doesn't ignore any transid
> failure AFAIK.
> 
> > Couldn't map the block 2287467560960
> > No mapping for 2287467560960-2287467577344
> > Couldn't map the block 2287467560960
> > bytenr mismatch, want=2287467560960, have=0
> > Couldn't read tree root
> > Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877
> > 
> >Total devices 1 FS bytes used 1.94TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> > 
> > Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
> > 
> >Total devices 2 FS bytes used 1.98TiB
> >devid2 size 1.80TiB used 102.03GiB path /dev/bcache2
> >*** Some devices missing
> > 
> > Would there be any way to tell BTRFS that the missing unit is the one that
> > corresponds to the SEED unit so that it correctly rebuilds the file
> > system?
> 
> The rw device is corrupted.
> 
> To ensure your seed device is safe, please run "btrfs check  device>" to see if there is any problem.
> 
> From current output, it seems that your chunk tree is corrupted, so that
> btrfs check can't even map the logical address for your tree root.
> 
> And to further debug the corrupted fs, please attach the following data:
> 
> 1) btrfs inspect dump-super -fa 
> 2) btrfs inspect dump-tree -t chunk 
> 
> Thanks,
> Qu
> 
> > Thank you.

-- 
Senén Vidal Blanco - SGISoft S.L.
 
Tlf.: 986413322 - 660923711
GPG ID 466431A8AF01F99A
http://www.sgisoft.com/
--
 


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] fstests: generic test for fsync after fallocate

2018-04-09 Thread Dave Chinner
On Sun, Apr 08, 2018 at 10:07:54AM +0800, Eryu Guan wrote:
> On Thu, Apr 05, 2018 at 10:56:14PM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> > 
> > Test that fsync operations preserve extents allocated with fallocate(2)
> > that are placed beyond a file's size.
> > 
> > This test is motivated by a bug found in btrfs where unwritten extents
> > beyond the inode's i_size were not preserved after a fsync and power
> > failure. The btrfs bug is fixed by the following patch for the linux
> > kernel:
> > 
> >  "Btrfs: fix loss of prealloc extents past i_size after fsync log replay"
> > 
> > Signed-off-by: Filipe Manana 
> 
> Hmm, xfs fails this test, while ext4 passes.
> 
> # diff -u tests/generic/483.out 
> /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad
> --- tests/generic/483.out   2018-04-07 23:35:00.55511 +0800
> +++ /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad
> 2018-04-07 23:39:48.780659707 +0800
> @@ -6,5 +6,5 @@
>  0: [0..511]: data
>  1: [512..2559]: unwritten
>  File baz fiemap:
> -0: [0..511]: data
> -1: [512..6143]: unwritten
> +0: [0..895]: data
> +1: [896..6143]: unwritten

Perfectly valid result from the test.

> > +# A file which already has a prealloc extent beyond its size.
> > +# The fsync done on it is motivated by differences in the btrfs 
> > implementation
> > +# of fsync (first fsync has different logic from subsequent fsyncs).
> > +$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \
> > +-c "falloc -k 256K 768K" \
> > +-c "fsync" \
> > +$SCRATCH_MNT/baz >/dev/null

Delayed allocation extending the file to 256k means we'll have
speculative prealloc of data beyond 256k - it should be somewhere
around 448k IIRC.

Now, falloc doesn't guarantee unwritten extents are allocated - it
just guarantees space is allocated. unwritten extents are an
optimisation to avoid needing to zero the extents pre-allocated
within EOF. i.e. we can quite safely allocate data extents beyond
EOF rather than unwritten extents, and this is not a bug.

In this case, the delalloc extent is completely allocated as written
data, but nothing is written beyond EOF @ 256k.  Hence it reports as
a data extent, not an unwritten extent, and this is a perfectly
valid thing to do.

> > +# Allocate another extent beyond the size of file baz.
> > +$XFS_IO_PROG -c "falloc -k 1M 2M"\
> > +-c "fsync" \
> > +$SCRATCH_MNT/baz

You also cannot assume that two separate preallocations beyond EOF
are going to be contiguous (i.e. it could be two separate extents.

What you should just be checking is that there are extents allocated
covering EOF to 3MB, not the exactly size, shape and type of extents
are allocated.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: generic test for fsync after fallocate

2018-04-09 Thread Filipe Manana
On Mon, Apr 9, 2018 at 10:51 AM, Dave Chinner  wrote:
> On Sun, Apr 08, 2018 at 10:07:54AM +0800, Eryu Guan wrote:
>> On Thu, Apr 05, 2018 at 10:56:14PM +0100, fdman...@kernel.org wrote:
>> > From: Filipe Manana 
>> >
>> > Test that fsync operations preserve extents allocated with fallocate(2)
>> > that are placed beyond a file's size.
>> >
>> > This test is motivated by a bug found in btrfs where unwritten extents
>> > beyond the inode's i_size were not preserved after a fsync and power
>> > failure. The btrfs bug is fixed by the following patch for the linux
>> > kernel:
>> >
>> >  "Btrfs: fix loss of prealloc extents past i_size after fsync log replay"
>> >
>> > Signed-off-by: Filipe Manana 
>>
>> Hmm, xfs fails this test, while ext4 passes.
>>
>> # diff -u tests/generic/483.out 
>> /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad
>> --- tests/generic/483.out   2018-04-07 23:35:00.55511 +0800
>> +++ /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad
>> 2018-04-07 23:39:48.780659707 +0800
>> @@ -6,5 +6,5 @@
>>  0: [0..511]: data
>>  1: [512..2559]: unwritten
>>  File baz fiemap:
>> -0: [0..511]: data
>> -1: [512..6143]: unwritten
>> +0: [0..895]: data
>> +1: [896..6143]: unwritten
>
> Perfectly valid result from the test.

Still very surprising without knowing specifics about xfs (as you
describe below).

>
>> > +# A file which already has a prealloc extent beyond its size.
>> > +# The fsync done on it is motivated by differences in the btrfs 
>> > implementation
>> > +# of fsync (first fsync has different logic from subsequent fsyncs).
>> > +$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \
>> > +-c "falloc -k 256K 768K" \
>> > +-c "fsync" \
>> > +$SCRATCH_MNT/baz >/dev/null
>
> Delayed allocation extending the file to 256k means we'll have
> speculative prealloc of data beyond 256k - it should be somewhere
> around 448k IIRC.
>
> Now, falloc doesn't guarantee unwritten extents are allocated - it
> just guarantees space is allocated. unwritten extents are an
> optimisation to avoid needing to zero the extents pre-allocated
> within EOF. i.e. we can quite safely allocate data extents beyond
> EOF rather than unwritten extents, and this is not a bug.
>
> In this case, the delalloc extent is completely allocated as written
> data, but nothing is written beyond EOF @ 256k.  Hence it reports as
> a data extent, not an unwritten extent, and this is a perfectly
> valid thing to do.
>
>> > +# Allocate another extent beyond the size of file baz.
>> > +$XFS_IO_PROG -c "falloc -k 1M 2M"\
>> > +-c "fsync" \
>> > +$SCRATCH_MNT/baz
>
> You also cannot assume that two separate preallocations beyond EOF
> are going to be contiguous (i.e. it could be two separate extents.

I really don't care about that.
Just want to check that allocated space is not lost, don't care if
that space is covered by 1, 2 or more extents, or whether they are
contiguous or not.
>
> What you should just be checking is that there are extents allocated
> covering EOF to 3MB, not the exactly size, shape and type of extents
> are allocated.

How to do such check, for a generic test, without using fiemap?

thanks

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: generic test for fsync after fallocate

2018-04-09 Thread Dave Chinner
On Mon, Apr 09, 2018 at 11:00:52AM +0100, Filipe Manana wrote:
> On Mon, Apr 9, 2018 at 10:51 AM, Dave Chinner  wrote:
> > On Sun, Apr 08, 2018 at 10:07:54AM +0800, Eryu Guan wrote:
> >> On Thu, Apr 05, 2018 at 10:56:14PM +0100, fdman...@kernel.org wrote:
> > You also cannot assume that two separate preallocations beyond EOF
> > are going to be contiguous (i.e. it could be two separate extents.
> 
> I really don't care about that.
> Just want to check that allocated space is not lost, don't care if
> that space is covered by 1, 2 or more extents, or whether they are
> contiguous or not.

Sure, but that's not what the test does :/

> > What you should just be checking is that there are extents allocated
> > covering EOF to 3MB, not the exactly size, shape and type of extents
> > are allocated.
> 
> How to do such check, for a generic test, without using fiemap?

We already do such checks using fiemap, yes?

i.e. there's a bunch of _filter*fiemap() functions in common/punch
which should show you exactly how to do this i.e. filter all the
unwritten/data extents to be the same name, then they coalesce into
single ranges

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace

2018-04-09 Thread David Sterba
On Sat, Apr 07, 2018 at 06:43:19PM +0800, Anand Jain wrote:
> On 04/07/2018 02:42 PM, Anand Jain wrote:
> > On 04/04/2018 02:34 AM, David Sterba wrote:
> >> The device replace is paused by unmount or read only remount, and
> >> resumed on next mount or write remount.
> >>
> >> The exclusive status should be checked properly as it's a global
> >> invariant and we must not allow 2 operations run. In this case, the
> >> balance can be also paused and resumed under same conditions. It's
> >> always checked first so dev-replace could see the EXCL_OP already taken,
> >> BUT, the ioctl would never let start both at the same time.
> >>
> >> Replace the WARN_ON with message and return 0, indicating no error as
> >> this is purely theoretical and the user will be informed. Resolving that
> >> manually should be possible by waiting for the other operation to finish
> >> or cancel the paused state.
> > 
> >   So if there is a paused balance and replace is triggered, a write
> >   remount will reverse the process? 
> 
>   Ok, I am answering my own question:
>   Even if the balance is paused that's considered as an exclusive
>   operation in progress and does not let replace to start. So there
>   is no scenario where paused-balance and replace could co-exist.

The code is there to handle a theoretical but possible scenario, when the
balance item and replace items are both on the filesytem image.

>   So an asset will be better insted of return 0.

An assert may not be compile in and we still don't want to let
dev-replace and balance run together. A WARN_ON or BUG in this case is
too intrusive, a human readable error message says what to do, compared
to a stacktrace and insight from somebody else.

> + if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> + btrfs_info(fs_info,
> + "cannot resume dev-replace, other exclusive operation running");
> + return 0;
> + }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread David Sterba
On Mon, Apr 09, 2018 at 04:39:03PM +0800, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > The volume mutex does not protect against anything in this case, the
> > comment about scrub is right but not related to locking and looks
> > confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
> > and confusing too.
> > 
> > The device_list_mutex is not held here to protect device lookup, but in
> > this case device replace cannot run in parallel with device removal (due
> > to exclusive op protection), so we don't need further locking here.
> 
> Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.
> 
> There are unfinished features in btrfs volume, such as device offline 
> while it was mounted (patches in the ML).
> 
> It's better to replace volume_mutex with device_list_mutex instead of 
> removing it, as we might need it in the context mentioned above.

The device_list_mutex will not do anything useful here. It's taken in
contexts where we need to make sure the device list is locked for writes
or we don't want to let the device disappear (superblock write).

As dev-replace requires the device to exist throughout the whole
operation (src_device), the EXCL_OP bit is the correct protection and we
can't hold device_list_mutex throughout the whole
btrfs_dev_replace_start.

> Or it's not a good idea to clean up until all the features are in place 
> otherwise we end up adding the locks again at some point.

The lock is redundant and no new code should depend on it. If you're
going to add a more fine grained locking of devices, then we can revisit
what's the best way to do it and I'm quite sure this will not require
re-adding the volume_mutex as it is used now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] Tracepoint updates

2018-04-09 Thread David Sterba
Switch the inode number type to u64, plus other cleanups and fixups.

David Sterba (7):
  btrfs: tracepoints, use correct type for inode number
  btrfs: tracepoints, use %llu instead of %Lu
  btrfs: tracepoints, drop unnecessary ULL casts
  btrfs: tracepoints, fix whitespace in strings
  btrfs: tracepoints, use extended format with UUID where possible
  btrfs: tests: pass fs_info to extent_map tests
  btrfs: use fs_info for btrfs_handle_em_exist tracepoint

 fs/btrfs/extent_map.c |   6 +-
 fs/btrfs/extent_map.h |   3 +-
 fs/btrfs/inode.c  |   2 +-
 fs/btrfs/tests/extent-map-tests.c |  60 ++
 include/trace/events/btrfs.h  | 225 +++---
 5 files changed, 158 insertions(+), 138 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] btrfs: tracepoints, use %llu instead of %Lu

2018-04-09 Thread David Sterba
For consistency, use the %llu form.

Signed-off-by: David Sterba 
---
 include/trace/events/btrfs.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index dafd58ad86ec..200c45911919 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1002,7 +1002,7 @@ TRACE_EVENT(btrfs_space_reservation,
__entry->reserve= reserve;
),
 
-   TP_printk_btrfs("%s: %Lu %s %Lu", __get_str(type), __entry->val,
+   TP_printk_btrfs("%s: %llu %s %llu", __get_str(type), __entry->val,
__entry->reserve ? "reserve" : "release",
__entry->bytes)
 );
@@ -1141,7 +1141,7 @@ TRACE_EVENT(find_free_extent,
__entry->data   = data;
),
 
-   TP_printk_btrfs("root=%Lu(%s) len=%Lu empty_size=%Lu flags=%Lu(%s)",
+   TP_printk_btrfs("root=%llu(%s) len=%llu empty_size=%llu flags=%llu(%s)",
  show_root_type(BTRFS_EXTENT_TREE_OBJECTID),
  __entry->num_bytes, __entry->empty_size, __entry->data,
  __print_flags((unsigned long)__entry->data, "|",
@@ -1170,8 +1170,8 @@ DECLARE_EVENT_CLASS(btrfs__reserve_extent,
__entry->len= len;
),
 
-   TP_printk_btrfs("root=%Lu(%s) block_group=%Lu flags=%Lu(%s) "
- "start=%Lu len=%Lu",
+   TP_printk_btrfs("root=%llu(%s) block_group=%llu flags=%llu(%s) "
+ "start=%llu len=%llu",
  show_root_type(BTRFS_EXTENT_TREE_OBJECTID),
  __entry->bg_objectid,
  __entry->flags, __print_flags((unsigned long)__entry->flags,
@@ -1222,8 +1222,8 @@ TRACE_EVENT(btrfs_find_cluster,
__entry->min_bytes  = min_bytes;
),
 
-   TP_printk_btrfs("block_group=%Lu flags=%Lu(%s) start=%Lu len=%Lu "
- "empty_size=%Lu min_bytes=%Lu", __entry->bg_objectid,
+   TP_printk_btrfs("block_group=%llu flags=%llu(%s) start=%llu len=%llu "
+ "empty_size=%llu min_bytes=%llu", __entry->bg_objectid,
  __entry->flags,
  __print_flags((unsigned long)__entry->flags, "|",
BTRFS_GROUP_FLAGS), __entry->start,
@@ -1244,7 +1244,7 @@ TRACE_EVENT(btrfs_failed_cluster_setup,
__entry->bg_objectid= block_group->key.objectid;
),
 
-   TP_printk_btrfs("block_group=%Lu", __entry->bg_objectid)
+   TP_printk_btrfs("block_group=%llu", __entry->bg_objectid)
 );
 
 TRACE_EVENT(btrfs_setup_cluster,
@@ -1273,8 +1273,8 @@ TRACE_EVENT(btrfs_setup_cluster,
__entry->bitmap = bitmap;
),
 
-   TP_printk_btrfs("block_group=%Lu flags=%Lu(%s) window_start=%Lu "
- "size=%Lu max_size=%Lu bitmap=%d",
+   TP_printk_btrfs("block_group=%llu flags=%llu(%s) window_start=%llu "
+ "size=%llu max_size=%llu bitmap=%d",
  __entry->bg_objectid,
  __entry->flags,
  __print_flags((unsigned long)__entry->flags, "|",
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] btrfs: tracepoints, drop unnecessary ULL casts

2018-04-09 Thread David Sterba
The (unsigned long long) casts are not necessary since long ago.

Signed-off-by: David Sterba 
---
 include/trace/events/btrfs.h | 124 +--
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 200c45911919..06e8b8bdfb42 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -123,7 +123,7 @@ TRACE_EVENT(btrfs_transaction_commit,
 
TP_printk_btrfs("root = %llu(%s), gen = %llu",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->generation)
+ __entry->generation)
 );
 
 DECLARE_EVENT_CLASS(btrfs__inode,
@@ -156,12 +156,12 @@ DECLARE_EVENT_CLASS(btrfs__inode,
TP_printk_btrfs("root=%llu(%s) gen=%llu ino=%llu blocks=%llu "
  "disk_i_size=%llu last_trans=%llu logged_trans=%llu",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->generation,
- (unsigned long long)__entry->ino,
+ __entry->generation,
+ __entry->ino,
  (unsigned long long)__entry->blocks,
- (unsigned long long)__entry->disk_i_size,
- (unsigned long long)__entry->last_trans,
- (unsigned long long)__entry->logged_trans)
+ __entry->disk_i_size,
+ __entry->last_trans,
+ __entry->logged_trans)
 );
 
 DEFINE_EVENT(btrfs__inode, btrfs_inode_new,
@@ -244,12 +244,12 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
  "block_len=%llu flags=%s refs=%u "
  "compress_type=%u",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->ino,
- (unsigned long long)__entry->start,
- (unsigned long long)__entry->len,
- (unsigned long long)__entry->orig_start,
+ __entry->ino,
+ __entry->start,
+ __entry->len,
+ __entry->orig_start,
  show_map_type(__entry->block_start),
- (unsigned long long)__entry->block_len,
+ __entry->block_len,
  show_map_flags(__entry->flags),
  __entry->refs, __entry->compress_type)
 );
@@ -281,12 +281,12 @@ TRACE_EVENT(btrfs_handle_em_exist,
TP_printk("start=%llu len=%llu "
  "existing(start=%llu len=%llu) "
  "em(start=%llu len=%llu)",
- (unsigned long long)__entry->start,
- (unsigned long long)__entry->len,
- (unsigned long long)__entry->e_start,
- (unsigned long long)__entry->e_len,
- (unsigned long long)__entry->map_start,
- (unsigned long long)__entry->map_len)
+ __entry->start,
+ __entry->len,
+ __entry->e_start,
+ __entry->e_len,
+ __entry->map_start,
+ __entry->map_len)
 );
 
 /* file extent item */
@@ -477,13 +477,13 @@ DECLARE_EVENT_CLASS(btrfs__ordered_extent,
  "bytes_left=%llu flags=%s compress_type=%d "
  "refs=%d",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->ino,
- (unsigned long long)__entry->file_offset,
- (unsigned long long)__entry->start,
- (unsigned long long)__entry->len,
- (unsigned long long)__entry->disk_len,
- (unsigned long long)__entry->truncated_len,
- (unsigned long long)__entry->bytes_left,
+ __entry->ino,
+ __entry->file_offset,
+ __entry->start,
+ __entry->len,
+ __entry->disk_len,
+ __entry->truncated_len,
+ __entry->bytes_left,
  show_ordered_flags(__entry->flags),
  __entry->compress_type, __entry->refs)
 );
@@ -561,7 +561,7 @@ DECLARE_EVENT_CLASS(btrfs__writepage,
  "range_end=%llu for_kupdate=%d "
  "for_reclaim=%d range_cyclic=%d writeback_index=%lu",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->ino, __entry->index,
+ __entry->ino, __entry->index,
  __entry->nr_to_write, __entry->pages_skipped,
  __entry->range_start, __entry->range_end,
  __entry->for_kupdate,
@@ -605,9 +605,9 @@ TRACE_EVENT(btrfs_writepage_end_io_hook,
TP_printk_btrfs("root=%llu(%s) ino=%llu page_index=%lu start=%llu "
  "end=%llu uptodate=%d",
  show_root_type(__entry->root_objectid),
- (unsigned long long)__entry->ino, (unsigned 
long)__entry->index,
-  

[PATCH 6/7] btrfs: tests: pass fs_info to extent_map tests

2018-04-09 Thread David Sterba
Preparatory work to pass fs_info to btrfs_add_extent_mapping so we can
get a better tracepoint message. Extent maps do not need fs_info for
anything so we only add a dummy one without any other initialization.

Signed-off-by: David Sterba 
---
 fs/btrfs/tests/extent-map-tests.c | 52 +++
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/tests/extent-map-tests.c 
b/fs/btrfs/tests/extent-map-tests.c
index c23bd00bdd92..33760a7f580d 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -60,7 +60,8 @@ static void free_extent_map_tree(struct extent_map_tree 
*em_tree)
  *->add_extent_mapping(0, 16K)
  *-> #handle -EEXIST
  */
-static void test_case_1(struct extent_map_tree *em_tree)
+static void test_case_1(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree)
 {
struct extent_map *em;
u64 start = 0;
@@ -125,7 +126,8 @@ static void test_case_1(struct extent_map_tree *em_tree)
  * Reading the inline ending up with EEXIST, ie. read an inline
  * extent and discard page cache and read it again.
  */
-static void test_case_2(struct extent_map_tree *em_tree)
+static void test_case_2(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree)
 {
struct extent_map *em;
int ret;
@@ -182,7 +184,8 @@ static void test_case_2(struct extent_map_tree *em_tree)
free_extent_map_tree(em_tree);
 }
 
-static void __test_case_3(struct extent_map_tree *em_tree, u64 start)
+static void __test_case_3(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree, u64 start)
 {
struct extent_map *em;
u64 len = SZ_4K;
@@ -248,14 +251,16 @@ static void __test_case_3(struct extent_map_tree 
*em_tree, u64 start)
  *   -> add_extent_mapping()
  *-> add_extent_mapping()
  */
-static void test_case_3(struct extent_map_tree *em_tree)
+static void test_case_3(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree)
 {
-   __test_case_3(em_tree, 0);
-   __test_case_3(em_tree, SZ_8K);
-   __test_case_3(em_tree, (12 * 1024ULL));
+   __test_case_3(fs_info, em_tree, 0);
+   __test_case_3(fs_info, em_tree, SZ_8K);
+   __test_case_3(fs_info, em_tree, (12 * 1024ULL));
 }
 
-static void __test_case_4(struct extent_map_tree *em_tree, u64 start)
+static void __test_case_4(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree, u64 start)
 {
struct extent_map *em;
u64 len = SZ_4K;
@@ -337,30 +342,45 @@ static void __test_case_4(struct extent_map_tree 
*em_tree, u64 start)
  * # handle -EEXIST when adding
  * # [0, 32K)
  */
-static void test_case_4(struct extent_map_tree *em_tree)
+static void test_case_4(struct btrfs_fs_info *fs_info,
+   struct extent_map_tree *em_tree)
 {
-   __test_case_4(em_tree, 0);
-   __test_case_4(em_tree, SZ_4K);
+   __test_case_4(fs_info, em_tree, 0);
+   __test_case_4(fs_info, em_tree, SZ_4K);
 }
 
 int btrfs_test_extent_map(void)
 {
+   struct btrfs_fs_info *fs_info = NULL;
struct extent_map_tree *em_tree;
 
test_msg("Running extent_map tests\n");
 
+   /*
+* Note: the fs_info is not set up completely, we only need
+* fs_info::fsid for the tracepoint.
+*/
+   fs_info = btrfs_alloc_dummy_fs_info(PAGE_SIZE, PAGE_SIZE);
+   if (!fs_info) {
+   test_msg("Couldn't allocate dummy fs info\n");
+   return -ENOMEM;
+   }
+
em_tree = kzalloc(sizeof(*em_tree), GFP_KERNEL);
if (!em_tree)
/* Skip the test on error. */
-   return 0;
+   goto out;
 
extent_map_tree_init(em_tree);
 
-   test_case_1(em_tree);
-   test_case_2(em_tree);
-   test_case_3(em_tree);
-   test_case_4(em_tree);
+   test_case_1(fs_info, em_tree);
+   test_case_2(fs_info, em_tree);
+   test_case_3(fs_info, em_tree);
+   test_case_4(fs_info, em_tree);
 
kfree(em_tree);
+out:
+   btrfs_free_dummy_fs_info(fs_info);
+
return 0;
 }
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] btrfs: tracepoints, use extended format with UUID where possible

2018-04-09 Thread David Sterba
Most of the strings are prefixed by the UUID of the filesystem that
generates the message, however there are a few events that still
opencode the macro magic and can be converted to the common macros.

Signed-off-by: David Sterba 
---
 include/trace/events/btrfs.h | 30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 1773355e9365..9be469706d30 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -666,8 +666,7 @@ TRACE_EVENT(btrfs_add_block_group,
 
TP_ARGS(fs_info, block_group, create),
 
-   TP_STRUCT__entry(
-   __array(u8, fsid,   BTRFS_FSID_SIZE )
+   TP_STRUCT__entry_btrfs(
__field(u64,offset  )
__field(u64,size)
__field(u64,flags   )
@@ -676,8 +675,7 @@ TRACE_EVENT(btrfs_add_block_group,
__field(int,create  )
),
 
-   TP_fast_assign(
-   memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+   TP_fast_assign_btrfs(fs_info,
__entry->offset = block_group->key.objectid;
__entry->size   = block_group->key.offset;
__entry->flags  = block_group->flags;
@@ -687,9 +685,9 @@ TRACE_EVENT(btrfs_add_block_group,
__entry->create = create;
),
 
-   TP_printk("%pU: block_group offset=%llu size=%llu "
+   TP_printk_btrfs("block_group offset=%llu size=%llu "
  "flags=%llu(%s) bytes_used=%llu bytes_super=%llu "
- "create=%d", __entry->fsid,
+ "create=%d",
  __entry->offset,
  __entry->size,
  __entry->flags,
@@ -1020,24 +1018,22 @@ TRACE_EVENT(btrfs_trigger_flush,
 
TP_ARGS(fs_info, flags, bytes, flush, reason),
 
-   TP_STRUCT__entry(
-   __array(u8, fsid,   BTRFS_FSID_SIZE )
+   TP_STRUCT__entry_btrfs(
__field(u64,flags   )
__field(u64,bytes   )
__field(int,flush   )
__string(   reason, reason  )
),
 
-   TP_fast_assign(
-   memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+   TP_fast_assign_btrfs(fs_info,
__entry->flags  = flags;
__entry->bytes  = bytes;
__entry->flush  = flush;
__assign_str(reason, reason)
),
 
-   TP_printk("%pU: %s: flush=%d(%s) flags=%llu(%s) bytes=%llu",
- __entry->fsid, __get_str(reason), __entry->flush,
+   TP_printk_btrfs("%s: flush=%d(%s) flags=%llu(%s) bytes=%llu",
+ __get_str(reason), __entry->flush,
  show_flush_action(__entry->flush),
  __entry->flags,
  __print_flags((unsigned long)__entry->flags, "|",
@@ -1061,24 +1057,22 @@ TRACE_EVENT(btrfs_flush_space,
 
TP_ARGS(fs_info, flags, num_bytes, state, ret),
 
-   TP_STRUCT__entry(
-   __array(u8, fsid,   BTRFS_FSID_SIZE )
+   TP_STRUCT__entry_btrfs(
__field(u64,flags   )
__field(u64,num_bytes   )
__field(int,state   )
__field(int,ret )
),
 
-   TP_fast_assign(
-   memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+   TP_fast_assign_btrfs(fs_info,
__entry->flags  =   flags;
__entry->num_bytes  =   num_bytes;
__entry->state  =   state;
__entry->ret=   ret;
),
 
-   TP_printk("%pU: state=%d(%s) flags=%llu(%s) num_bytes=%llu ret=%d",
- __entry->fsid, __entry->state,
+   TP_printk_btrfs("state=%d(%s) flags=%llu(%s) num_bytes=%llu ret=%d",
+ __entry->state,
  show_flush_state(__entry->state),
  __entry->flags,
  __print_flags((unsigned long)__entry->flags, "|",
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] btrfs: tracepoints, fix whitespace in strings

2018-04-09 Thread David Sterba
The preferred style is to avoid spaces between key and value and no
commas between key=values.

Signed-off-by: David Sterba 
---
 include/trace/events/btrfs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 06e8b8bdfb42..1773355e9365 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -121,7 +121,7 @@ TRACE_EVENT(btrfs_transaction_commit,
__entry->root_objectid  = root->root_key.objectid;
),
 
-   TP_printk_btrfs("root = %llu(%s), gen = %llu",
+   TP_printk_btrfs("root=%llu(%s) gen=%llu",
  show_root_type(__entry->root_objectid),
  __entry->generation)
 );
@@ -656,7 +656,7 @@ TRACE_EVENT(btrfs_sync_fs,
__entry->wait   = wait;
),
 
-   TP_printk_btrfs("wait = %d", __entry->wait)
+   TP_printk_btrfs("wait=%d", __entry->wait)
 );
 
 TRACE_EVENT(btrfs_add_block_group,
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] btrfs: use fs_info for btrfs_handle_em_exist tracepoint

2018-04-09 Thread David Sterba
We really want to know to which filesystem the extent map events belong,
but as it cannot be reached from the extent_map pointers, we need to
pass it down the callchain.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_map.c |  6 --
 fs/btrfs/extent_map.h |  3 ++-
 fs/btrfs/inode.c  |  2 +-
 fs/btrfs/tests/extent-map-tests.c |  8 
 include/trace/events/btrfs.h  | 12 +++-
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 53a0633c6ef7..581b42d23e0d 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -517,6 +517,7 @@ static noinline int merge_extent_mapping(struct 
extent_map_tree *em_tree,
 
 /**
  * btrfs_add_extent_mapping - add extent mapping into em_tree
+ * @fs_info - used for tracepoint
  * @em_tree - the extent tree into which we want to insert the extent mapping
  * @em_in   - extent we are inserting
  * @start   - start of the logical range btrfs_get_extent() is requesting
@@ -534,7 +535,8 @@ static noinline int merge_extent_mapping(struct 
extent_map_tree *em_tree,
  * Return 0 on success, otherwise -EEXIST.
  *
  */
-int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+int btrfs_add_extent_mapping(struct btrfs_fs_info *fs_info,
+struct extent_map_tree *em_tree,
 struct extent_map **em_in, u64 start, u64 len)
 {
int ret;
@@ -552,7 +554,7 @@ int btrfs_add_extent_mapping(struct extent_map_tree 
*em_tree,
 
existing = search_extent_mapping(em_tree, start, len);
 
-   trace_btrfs_handle_em_exist(existing, em, start, len);
+   trace_btrfs_handle_em_exist(fs_info, existing, em, start, len);
 
/*
 * existing will always be non-NULL, since there must be
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index f6f8ba114977..f55c8b4ef120 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -91,6 +91,7 @@ int unpin_extent_cache(struct extent_map_tree *tree, u64 
start, u64 len, u64 gen
 void clear_em_logging(struct extent_map_tree *tree, struct extent_map *em);
 struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
 u64 start, u64 len);
-int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
+int btrfs_add_extent_mapping(struct btrfs_fs_info *fs_info,
+struct extent_map_tree *em_tree,
 struct extent_map **em_in, u64 start, u64 len);
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1f091c2358a4..18c31006865f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7092,7 +7092,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode 
*inode,
 
err = 0;
write_lock(&em_tree->lock);
-   err = btrfs_add_extent_mapping(em_tree, &em, start, len);
+   err = btrfs_add_extent_mapping(fs_info, em_tree, &em, start, len);
write_unlock(&em_tree->lock);
 out:
 
diff --git a/fs/btrfs/tests/extent-map-tests.c 
b/fs/btrfs/tests/extent-map-tests.c
index 33760a7f580d..d6c687267572 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -104,7 +104,7 @@ static void test_case_1(struct btrfs_fs_info *fs_info,
em->len = len;
em->block_start = start;
em->block_len = len;
-   ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
+   ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, 
em->len);
if (ret)
test_msg("case1 [%llu %llu]: ret %d\n", start, start + len, 
ret);
if (em &&
@@ -168,7 +168,7 @@ static void test_case_2(struct btrfs_fs_info *fs_info,
em->len = SZ_1K;
em->block_start = EXTENT_MAP_INLINE;
em->block_len = (u64)-1;
-   ret = btrfs_add_extent_mapping(em_tree, &em, em->start, em->len);
+   ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, em->start, 
em->len);
if (ret)
test_msg("case2 [0 1K]: ret %d\n", ret);
if (em &&
@@ -214,7 +214,7 @@ static void __test_case_3(struct btrfs_fs_info *fs_info,
em->len = SZ_16K;
em->block_start = 0;
em->block_len = SZ_16K;
-   ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
+   ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, start, len);
if (ret)
test_msg("case3 [0x%llx 0x%llx): ret %d\n",
 start, start + len, ret);
@@ -301,7 +301,7 @@ static void __test_case_4(struct btrfs_fs_info *fs_info,
em->len = SZ_32K;
em->block_start = 0;
em->block_len = SZ_32K;
-   ret = btrfs_add_extent_mapping(em_tree, &em, start, len);
+   ret = btrfs_add_extent_mapping(fs_info, em_tree, &em, start, len);
if (ret)
test_msg("case4 [0x%llx 0x%llx): ret %d\n",
 start, len, ret);

[PATCH 1/7] btrfs: tracepoints, use correct type for inode number

2018-04-09 Thread David Sterba
The size of ino_t depends on 32/64bit architecture type. Btrfs stores
the full 64bit inode anyway so we should use it.

Signed-off-by: David Sterba 
---
 include/trace/events/btrfs.h | 47 ++--
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 965c650a5273..dafd58ad86ec 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -133,7 +133,7 @@ DECLARE_EVENT_CLASS(btrfs__inode,
TP_ARGS(inode),
 
TP_STRUCT__entry_btrfs(
-   __field(ino_t,  ino )
+   __field(u64,  ino   )
__field(blkcnt_t,  blocks   )
__field(u64,  disk_i_size   )
__field(u64,  generation)
@@ -143,7 +143,7 @@ DECLARE_EVENT_CLASS(btrfs__inode,
),
 
TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->ino= inode->i_ino;
+   __entry->ino= btrfs_ino(BTRFS_I(inode));
__entry->blocks = inode->i_blocks;
__entry->disk_i_size  = BTRFS_I(inode)->disk_i_size;
__entry->generation = BTRFS_I(inode)->generation;
@@ -153,11 +153,11 @@ DECLARE_EVENT_CLASS(btrfs__inode,
BTRFS_I(inode)->root->root_key.objectid;
),
 
-   TP_printk_btrfs("root=%llu(%s) gen=%llu ino=%lu blocks=%llu "
+   TP_printk_btrfs("root=%llu(%s) gen=%llu ino=%llu blocks=%llu "
  "disk_i_size=%llu last_trans=%llu logged_trans=%llu",
  show_root_type(__entry->root_objectid),
  (unsigned long long)__entry->generation,
- (unsigned long)__entry->ino,
+ (unsigned long long)__entry->ino,
  (unsigned long long)__entry->blocks,
  (unsigned long long)__entry->disk_i_size,
  (unsigned long long)__entry->last_trans,
@@ -443,7 +443,7 @@ DECLARE_EVENT_CLASS(btrfs__ordered_extent,
TP_ARGS(inode, ordered),
 
TP_STRUCT__entry_btrfs(
-   __field(ino_t,  ino )
+   __field(u64,  ino   )
__field(u64,  file_offset   )
__field(u64,  start )
__field(u64,  len   )
@@ -457,7 +457,7 @@ DECLARE_EVENT_CLASS(btrfs__ordered_extent,
),
 
TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->ino= inode->i_ino;
+   __entry->ino= btrfs_ino(BTRFS_I(inode));
__entry->file_offset= ordered->file_offset;
__entry->start  = ordered->start;
__entry->len= ordered->len;
@@ -528,7 +528,7 @@ DECLARE_EVENT_CLASS(btrfs__writepage,
TP_ARGS(page, inode, wbc),
 
TP_STRUCT__entry_btrfs(
-   __field(ino_t,  ino )
+   __field(u64,ino )
__field(pgoff_t,  index )
__field(long,   nr_to_write )
__field(long,   pages_skipped   )
@@ -542,7 +542,7 @@ DECLARE_EVENT_CLASS(btrfs__writepage,
),
 
TP_fast_assign_btrfs(btrfs_sb(inode->i_sb),
-   __entry->ino= inode->i_ino;
+   __entry->ino= btrfs_ino(BTRFS_I(inode));
__entry->index  = page->index;
__entry->nr_to_write= wbc->nr_to_write;
__entry->pages_skipped  = wbc->pages_skipped;
@@ -556,12 +556,12 @@ DECLARE_EVENT_CLASS(btrfs__writepage,
 BTRFS_I(inode)->root->root_key.objectid;
),
 
-   TP_printk_btrfs("root=%llu(%s) ino=%lu page_index=%lu "
+   TP_printk_btrfs("root=%llu(%s) ino=%llu page_index=%lu "
  "nr_to_write=%ld pages_skipped=%ld range_start=%llu "
  "range_end=%llu for_kupdate=%d "
  "for_reclaim=%d range_cyclic=%d writeback_index=%lu",
  show_root_type(__entry->root_objectid),
- (unsigned long)__entry->ino, __entry->index,
+ (unsigned long long)__entry->ino, __entry->index,
  __entry->nr_to_write, __entry->pages_skipped,
  __entry->range_start, __entry->range_end,
  __entry->for_kupdate,
@@ -584,7 +584,7 @@ TRACE_EVENT(btrfs_writepage_end_io_hook,
TP_ARGS(page, start, end, uptodate),
 
TP_STRUCT__entry_btrfs(
-   __field(ino_t,   ino)
+   __field(u64, ino)
__field(pgoff_t, index  )
__field(u64, start  )
 

[PATCH v2] fstests: generic test for fsync after fallocate

2018-04-09 Thread fdmanana
From: Filipe Manana 

Test that fsync operations preserve extents allocated with fallocate(2)
that are placed beyond a file's size.

This test is motivated by a bug found in btrfs where unwritten extents
beyond the inode's i_size were not preserved after a fsync and power
failure. The btrfs bug is fixed by the following patch for the linux
kernel:

 "Btrfs: fix loss of prealloc extents past i_size after fsync log replay"

Signed-off-by: Filipe Manana 
---

V2: Use different fiemap filter to not relly on written vs unwritten
differences in extents beyond eof, since has xfs a specific but
valid behaviour. Also check file sizes after the power failure
since the new fiemap filter now merges written and unwritten
extents.

 tests/generic/482 | 124 ++
 tests/generic/482.out |  13 ++
 tests/generic/group   |   1 +
 3 files changed, 138 insertions(+)
 create mode 100755 tests/generic/482
 create mode 100644 tests/generic/482.out

diff --git a/tests/generic/482 b/tests/generic/482
new file mode 100755
index ..a1947693
--- /dev/null
+++ b/tests/generic/482
@@ -0,0 +1,124 @@
+#! /bin/bash
+# FSQA Test No. 482
+#
+# Test that fsync operations preserve extents allocated with fallocate(2) that
+# are placed beyond a file's size.
+#
+#---
+#
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_target flakey
+_require_xfs_io_command "falloc" "-k"
+_require_xfs_io_command "fiemap"
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+# Create our test files.
+$XFS_IO_PROG -f -c "pwrite -S 0xea 0 256K" $SCRATCH_MNT/foo >/dev/null
+
+# Create a file with many extents. We later want to shrink truncate it and
+# add a prealloc extent beyond its new size.
+for ((i = 1; i <= 500; i++)); do
+   offset=$(((i - 1) * 4 * 1024))
+   $XFS_IO_PROG -f -s -c "pwrite -S 0xcf $offset 4K" \
+   $SCRATCH_MNT/bar >/dev/null
+done
+
+# A file which already has a prealloc extent beyond its size.
+# The fsync done on it is motivated by differences in the btrfs implementation
+# of fsync (first fsync has different logic from subsequent fsyncs).
+$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \
+-c "falloc -k 256K 768K" \
+-c "fsync" \
+$SCRATCH_MNT/baz >/dev/null
+
+# Make sure everything done so far is durably persisted.
+sync
+
+# Allocate an extent beyond the size of the first test file and fsync it.
+$XFS_IO_PROG -c "falloc -k 256K 1M"\
+-c "fsync" \
+$SCRATCH_MNT/foo
+
+# Do a shrinking truncate of our test file, add a prealloc extent to it after
+# its new size and fsync it.
+$XFS_IO_PROG -c "truncate 256K" \
+-c "falloc -k 256K 1M"\
+-c "fsync" \
+$SCRATCH_MNT/bar
+
+# Allocate another extent beyond the size of file baz.
+$XFS_IO_PROG -c "falloc -k 1M 2M"\
+-c "fsync" \
+$SCRATCH_MNT/baz
+
+# Simulate a power failure and mount the filesystem to check that the extents
+# previously allocated were not lost and the file sizes are correct.
+_flakey_drop_and_remount
+
+echo "File foo fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foo | _filter_hole_fiemap
+echo "File foo size:"
+stat --format %s $SCRATCH_MNT/foo
+
+echo "File bar fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/bar | _filter_hole_fiemap
+echo "File bar size:"
+stat --format %s $SCRATCH_MNT/bar
+
+echo "File baz fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/baz | _filter_hole_fiemap
+echo "File baz size:"
+stat --format %s $SCRATCH_MNT/baz
+
+_unmount_flakey
+_cleanup_flakey
+
+status=0
+exi

Re: [PATCH v2] fstests: test btrfs fsync after hole punching with no-holes mode

2018-04-09 Thread Eryu Guan
On Sun, Apr 08, 2018 at 09:46:24AM +0100, Filipe Manana wrote:
> On Sun, Apr 8, 2018 at 8:46 AM, Eryu Guan  wrote:
> > On Wed, Mar 28, 2018 at 12:55:30PM +0100, fdman...@kernel.org wrote:
> >> From: Filipe Manana 
> >>
> >> Test that when we have the no-holes mode enabled and a specific metadata
> >> layout, if we punch a hole and fsync the file, at replay time the whole
> >> hole was preserved.
> >>
> >> This issue is fixed by the following btrfs patch for the linux kernel:
> >>
> >>   "Btrfs: fix fsync after hole punching when using no-holes feature"
> >>
> >> Signed-off-by: Filipe Manana 
> >> ---
> >>
> >> V2: Made the test work when selinux is enabled, and made it use direct IO
> >> writes to ensure 256K extents.
> >
> > Test fails with selinux enabled now on unpatched kernel. But I found
> > that, in my release testing, test still fails when testing with current
> > Linus tree (HEAD is 642e7fd23353, without selinux this time), which
> > should contain the mentioned fix. Does that mean the bug is not fully
> > fixed?
> 
> The bug is fully fixed. But HEAD 642e7fd23353 does not contain the
> fix. The current  linus' master has it.

I'll double check.. Thanks for the heads-up!

Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: let mkfs return nozero value on thin provision device

2018-04-09 Thread David Sterba
On Tue, Apr 03, 2018 at 04:39:45PM +0800, Su Yue wrote:
> when mkfs.btrfs on a thin provision device which has very small
> backing size and big virtual size, all code works well in
> mkfs.btrfs until close_ctree() is called.
> close_ctree() fails to sync device due to small backing size
> while closing devices.
> However, mkfs returns 0 in such situation which causes failure of
> xfstests generic/405.
> 
> So, let mkfs returns nonzero value if previous steps succeeded but
> close_ctree() failed.
> Then xfstests generic/405 passes now.
> 
> Signed-off-by: Su Yue 

Applied, thanks. Can you please write a test for that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: remove unused function path_cat and path_cat3

2018-04-09 Thread David Sterba
On Mon, Apr 09, 2018 at 01:43:30PM +0800, Gu Jinxiang wrote:
> Since function path_cat and path_cat3 are not used anymore, remove them.

It's good to document when the functions where deprecated, 4.0, which
lets us check that the period was long enough for any users to adapt as
it's in the public interface.

Anyway, 16 releases should be enough. Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] btrfs-progs: Remove btrfs-debug-tree command

2018-04-09 Thread David Sterba
On Tue, Mar 27, 2018 at 10:45:43AM +0300, Nikolay Borisov wrote:
> There is already a replacement in the face of btrfs inspect-internal
> dump-tree. And this command is just a simple wrapper around it. Just
> remove it and adjust the show-blocks script to call the main btrfs
> binary to achieve the same effect.
> 
> Signed-off-by: Nikolay Borisov 
> ---
> 
> V2 (analogical changes to other patches so changelog is present only in the 
> first patch): 
>  * Remove tool reference from .gitignore and Documentation/Makefile.in
>  * Document that the standalone tools are removed in the STANDALONE section 
> in 
>  the documentation.

1-4 applied, thanks. I've added the exact versions when the tools were
deprecated as it's not the same one, references to the issue and removed
some Makefile leftovers.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore

2018-04-09 Thread David Sterba
On Fri, Apr 06, 2018 at 02:39:19PM +0800, Qu Wenruo wrote:
> Corrupted extent tree (either the root node or leaf) can normally block
> us from open the fs.
> As normally open_ctree() has the following call chain:
> __open_ctree_fd()
> |- btrfs_setup_all_roots()
>|- btrfs_read_block_groups()
>   And we will search block group items in extent tree.
> 
> And considering how block group items are scattered around the whole
> extent tree, any error would block the fs from being mounted.
> 
> Fortunately, we already have OPEN_CTREE_NO_BLOCK_GROUPS flags to disable
> block group items search, which will not only allow us to open some
> fs, but also hugely speed up open time.
> 
> Currently dump-tree and btrfs-restore is ensured that they care nothing
> about block group items. So specify OPEN_CTREE_NO_BLOCK_GROUPS flag as
> default.
> 
> Also fix a typo where dump-tree is using OPEN_CTREE_FS_PARTIAL, which
> should be OPEN_CTREE_PARTIAL.
> This makes dump-tree do more check and can sometimes fail to open
> certain filesystems.

Oh, that's too easy to get wrong, we should rename one or the other.

> Reported-by: Christoph Anton Mitterer 
> Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show specified 
> tree block even some tree roots are corrupted")
> Signed-off-by: Qu Wenruo 
> ---
>  cmds-inspect-dump-tree.c | 4 +++-
>  cmds-restore.c   | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index 7defb7164a49..8be976041543 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   int uuid_tree_only = 0;
>   int roots_only = 0;
>   int root_backups = 0;
> - unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
> + /* Speed up open_ctree() and continue if extent tree is corrupted */
> + unsigned open_ctree_flags = OPEN_CTREE_PARTIAL |
> + OPEN_CTREE_NO_BLOCK_GROUPS;

We could consider that separatelly, whether to allow dumping a partially
created filesystem, ie. OPEN_CTREE_FS_PARTIAL added by a new option.
This would be really for debugging/testing purposes, but could be useful
when the filesystem is prefilled with files or if we enable the runtime
features and the final write fails for some reason.

>   u64 block_bytenr;
>   struct btrfs_root *tree_root_scan;
>   u64 tree_id = 0;
> diff --git a/cmds-restore.c b/cmds-restore.c
> index ade35f0f880f..b43bd2ac6502 100644
> --- a/cmds-restore.c
> +++ b/cmds-restore.c
> @@ -1282,7 +1282,8 @@ static struct btrfs_root *open_fs(const char *dev, u64 
> root_location,
>   for (i = super_mirror; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>   bytenr = btrfs_sb_offset(i);
>   fs_info = open_ctree_fs_info(dev, bytenr, root_location, 0,
> -  OPEN_CTREE_PARTIAL);
> +  OPEN_CTREE_PARTIAL |
> +  OPEN_CTREE_NO_BLOCK_GROUPS);
>   if (fs_info)
>   break;
>   fprintf(stderr, "Could not open root, trying backup super\n");

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Let function find_device to be consistent with kernel

2018-04-09 Thread David Sterba
On Mon, Apr 02, 2018 at 11:30:12AM +0800, Gu Jinxiang wrote:
> Make find_device to be consistent with kernel according
> 35c70103a528 ("btrfs: refactor find_device helper")
> 
> And, modify the compare condition from both devid and uuid to
> devid or devid and uuid according
> 8f18cf13396c ("Btrfs: Make the resizer work based on shrinking and growing 
> devices")
> 
> Signed-off-by: Gu Jinxiang 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Remove duplicate value-get for data_extents_scrubbed

2018-04-09 Thread David Sterba
On Mon, Apr 02, 2018 at 06:11:17PM +0800, Gu Jinxiang wrote:
> Get data_extents_scrubbed value for twice, since there is only
> one data_extents_scrubbed in struct btrfs_scrub_progress, remove
> the duplicate one.
> 
> Signed-off-by: Gu Jinxiang 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore

2018-04-09 Thread Qu Wenruo


On 2018年04月09日 22:12, David Sterba wrote:
> On Fri, Apr 06, 2018 at 02:39:19PM +0800, Qu Wenruo wrote:
>> Corrupted extent tree (either the root node or leaf) can normally block
>> us from open the fs.
>> As normally open_ctree() has the following call chain:
>> __open_ctree_fd()
>> |- btrfs_setup_all_roots()
>>|- btrfs_read_block_groups()
>>   And we will search block group items in extent tree.
>>
>> And considering how block group items are scattered around the whole
>> extent tree, any error would block the fs from being mounted.
>>
>> Fortunately, we already have OPEN_CTREE_NO_BLOCK_GROUPS flags to disable
>> block group items search, which will not only allow us to open some
>> fs, but also hugely speed up open time.
>>
>> Currently dump-tree and btrfs-restore is ensured that they care nothing
>> about block group items. So specify OPEN_CTREE_NO_BLOCK_GROUPS flag as
>> default.
>>
>> Also fix a typo where dump-tree is using OPEN_CTREE_FS_PARTIAL, which
>> should be OPEN_CTREE_PARTIAL.
>> This makes dump-tree do more check and can sometimes fail to open
>> certain filesystems.
> 
> Oh, that's too easy to get wrong, we should rename one or the other.

Of course, I'll add extra patches for this.

> 
>> Reported-by: Christoph Anton Mitterer 
>> Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show specified 
>> tree block even some tree roots are corrupted")
>> Signed-off-by: Qu Wenruo 
>> ---
>>  cmds-inspect-dump-tree.c | 4 +++-
>>  cmds-restore.c   | 3 ++-
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
>> index 7defb7164a49..8be976041543 100644
>> --- a/cmds-inspect-dump-tree.c
>> +++ b/cmds-inspect-dump-tree.c
>> @@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>>  int uuid_tree_only = 0;
>>  int roots_only = 0;
>>  int root_backups = 0;
>> -unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
>> +/* Speed up open_ctree() and continue if extent tree is corrupted */
>> +unsigned open_ctree_flags = OPEN_CTREE_PARTIAL |
>> +OPEN_CTREE_NO_BLOCK_GROUPS;
> 
> We could consider that separatelly, whether to allow dumping a partially
> created filesystem, ie. OPEN_CTREE_FS_PARTIAL added by a new option.

For dump-tree, there is really no limitation here.
As long as chunk tree is OK, we should be OK to do "-b" dump.

For "-t" or all tree (default) dump, with chunk tree opened only, we
could try our best to print any good trees.
So I don't get the point to read all other trees.

Thanks,
Qu

> This would be really for debugging/testing purposes, but could be useful
> when the filesystem is prefilled with files or if we enable the runtime
> features and the final write fails for some reason.
> 
>>  u64 block_bytenr;
>>  struct btrfs_root *tree_root_scan;
>>  u64 tree_id = 0;
>> diff --git a/cmds-restore.c b/cmds-restore.c
>> index ade35f0f880f..b43bd2ac6502 100644
>> --- a/cmds-restore.c
>> +++ b/cmds-restore.c
>> @@ -1282,7 +1282,8 @@ static struct btrfs_root *open_fs(const char *dev, u64 
>> root_location,
>>  for (i = super_mirror; i < BTRFS_SUPER_MIRROR_MAX; i++) {
>>  bytenr = btrfs_sb_offset(i);
>>  fs_info = open_ctree_fs_info(dev, bytenr, root_location, 0,
>> - OPEN_CTREE_PARTIAL);
>> + OPEN_CTREE_PARTIAL |
>> + OPEN_CTREE_NO_BLOCK_GROUPS);
>>  if (fs_info)
>>  break;
>>  fprintf(stderr, "Could not open root, trying backup super\n");
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs-progs: tests/fsck: Add test case to check if btrfs check can skip data csum verfication for metadata dump

2018-04-09 Thread David Sterba
On Tue, Apr 03, 2018 at 01:39:47PM +0800, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 
> ---
>  .../031-metadatadump-check-data-csum/test.sh   | 32 
> ++
>  1 file changed, 32 insertions(+)
>  create mode 100755 tests/fsck-tests/031-metadatadump-check-data-csum/test.sh
> 
> diff --git a/tests/fsck-tests/031-metadatadump-check-data-csum/test.sh 
> b/tests/fsck-tests/031-metadatadump-check-data-csum/test.sh
> new file mode 100755
> index ..9f14c4122f4b
> --- /dev/null
> +++ b/tests/fsck-tests/031-metadatadump-check-data-csum/test.sh
> @@ -0,0 +1,32 @@
> +#!/bin/bash
> +# To check if "btrfs check" can detect metadata dump (restored by 
> btrfs-iamge)
> +# and ignore --check-data-csum option
> +
> +source "$TEST_TOP/common"
> +
> +check_prereq btrfs
> +check_prereq mkfs.btrfs
> +check_prereq btrfs-image
> +setup_root_helper
> +prepare_test_dev
> + 
> +tmp=$(mktemp --tmpdir btrfs-progs-fsck.XXX)

For a single temporary file I'd suggest not to use mktemp, so I'll remove
the line.

> +run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV"
> +run_check_mount_test_dev
> +
> +run_check $SUDO_HELPER dd if=/dev/urandom of="$TEST_MNT/file" bs=4k count=16
> +run_check_umount_test_dev
> +
> +run_check $SUDO_HELPER "$TOP/btrfs-image" "$TEST_DEV" "$tmp.image"
> +
> +# use prepare_test_dev() to wipe all existing data on $TEST_DEV
> +# so there is no way that restored image could have mathcing data csum
> +prepare_test_dev
> +
> +run_check $SUDO_HELPER "$TOP/btrfs-image" -r "$tmp.image" "$TEST_DEV"
> +
> +# Should not report any error
> +run_check "$TOP/btrfs" check --check-data-csum "$TEST_DEV"
> +
> +rm -rf -- "$tmp*"
> -- 
> 2.16.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: check: Skip data csum verification for metadata dump

2018-04-09 Thread David Sterba
On Tue, Apr 03, 2018 at 01:39:46PM +0800, Qu Wenruo wrote:
> For metadata dump (fs restored by btrfs-image), since no data is
> restored check sum verification will definitely report error.
> 
> Add such check in check_csums() and prompt for user.
> 
> Issue: #103
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Do not add extra slash if given path end with it

2018-04-09 Thread David Sterba
On Thu, Mar 29, 2018 at 05:11:19PM +0800, Gu Jinxiang wrote:
> When use a given path end with a slash like below,
> the output of path will have double slash.
> 
> Do not add extra slash if there is already one in the given
> path.
> 
> $ btrfs filesystem du ./test/
> output:
> Total   Exclusive  Set shared  Filename
> 0.00B   0.00B   -  /home/gujx/device/tmp/test//foo
> 
> Signed-off-by: Gu Jinxiang 

Applied, thanks.

I was wondering if we should remove all trailing slashes, but I think
that the chance of that happening often is too low. The tab completion
will append only a single tab and that's about the case I'd care.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread Anand Jain



On 04/09/2018 07:53 PM, David Sterba wrote:

On Mon, Apr 09, 2018 at 04:39:03PM +0800, Anand Jain wrote:



On 04/04/2018 02:34 AM, David Sterba wrote:

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.


Agreed, usage of device_list_mutex and volume_mutex is kind of redundant.

There are unfinished features in btrfs volume, such as device offline
while it was mounted (patches in the ML).

It's better to replace volume_mutex with device_list_mutex instead of
removing it, as we might need it in the context mentioned above.


The device_list_mutex will not do anything useful here. It's taken in
contexts where we need to make sure the device list is locked for writes
or we don't want to let the device disappear (superblock write).

As dev-replace requires the device to exist throughout the whole
operation (src_device), the EXCL_OP bit is the correct protection and we
can't hold device_list_mutex throughout the whole
btrfs_dev_replace_start.

>

Or it's not a good idea to clean up until all the features are in place
otherwise we end up adding the locks again at some point.


The lock is redundant and no new code should depend on it. If you're
going to add a more fine grained locking of devices, then we can revisit
what's the best way to do it and I'm quite sure this will not require
re-adding the volume_mutex as it is used now.


Agreed.

Thanks, Anand



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/16] btrfs: kill btrfs_fs_info::volume_mutex

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

Mutual exclusion of device add/rm and balance was done by the volume
mutex up to version 3.7. The commit 5ac00addc7ac091109 ("Btrfs: disallow
mutually exclusive admin operations from user mode") added a bit that
essentially tracked the same information.

The status bit has an advantage over a mutex that it can be set without
restrictions of function context, so it started to be used in the
mount-time resuming of balance or device replace.

But we don't really need to track the same information in two ways.

1) After the previous cleanups, the main ioctl handlers for
add/del/resize copy the EXCL_OP bit next to the volume mutex, here
it's clearly safe.

2) Resuming balance during mount or after rw remount will set only the
EXCL_OP bit and the volume_mutex is held in the kernel thread that
calls btrfs_balance.

3) Resuming device replace during mount or after rw remount is done
after balance and is excluded by the EXCL_OP bit. It does not take
the volume_mutex at all and completely relies on the EXCL_OP bit.

4) The resuming of balance and dev-replace cannot hapen at the same time
as the ioctls cannot be started in parallel. Nevertheless, a crafted
image could trigger that and a warning is printed.

5) Balance is normally excluded by EXCL_OP and also uses own mutex to
protect against concurrent access to its status data. There's some
trickery to maintain the right lock nesting in case we need to
reexamine the status in btrfs_ioctl_balance. The volume_mutex is
removed and the unlock/lock sequence is left in place as we might
expect other waiters to proceed.

6) Similar to 5, the unlock/lock sequence is kept in
btrfs_cancel_balance to allow waiters to continue. >
Signed-off-by: David Sterba 


 Thanks for cleaning volume_mutex.

 Reviewed-by: Anand Jain 

Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/16] btrfs: remove wrong use of volume_mutex from btrfs_dev_replace_start

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.

The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.

Signed-off-by: David Sterba 


 Reviewed-by: Anand Jain 

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] btrfs-progs: extent buffer related refactor and cleanup

2018-04-09 Thread David Sterba
On Fri, Mar 30, 2018 at 01:48:52PM +0800, Qu Wenruo wrote:
> The patchset can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/eb_cleanup
> 
> Just like kernel cleanup and refactors, this patchset will embed
> btrfs_fs_info structure into extent_buffer.
> 
> And fixes several possible NULL pointer dereference (although not
> utilized in btrfs-progs yet).
> 
> Changelog:
> v2:
>   Embarrassingly, I forgot to install reiserfsprogs in my development
>   machine, so the 3rd patch lacks one call site in
>   convert/source-reiserfs.c.
> 
> Qu Wenruo (5):
>   btrfs-progs: extent_io: Fix NULL pointer dereference in
> free_extent_buffer_final()
>   btrfs-progs: extent_io: Init eb->lru to avoid NULL pointer dereference
>   btrfs-progs: extent_io: Refactor alloc_extent_buffer() to follow
> kernel parameters
>   btrfs-progs: Unify btrfs_leaf_free_psace() parameter with kernel
>   btrfs-progs: print-tree: Remove btrfs_root parameter

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore

2018-04-09 Thread David Sterba
On Mon, Apr 09, 2018 at 10:29:34PM +0800, Qu Wenruo wrote:
> >> --- a/cmds-inspect-dump-tree.c
> >> +++ b/cmds-inspect-dump-tree.c
> >> @@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
> >>int uuid_tree_only = 0;
> >>int roots_only = 0;
> >>int root_backups = 0;
> >> -  unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
> >> +  /* Speed up open_ctree() and continue if extent tree is corrupted */
> >> +  unsigned open_ctree_flags = OPEN_CTREE_PARTIAL |
> >> +  OPEN_CTREE_NO_BLOCK_GROUPS;
> > 
> > We could consider that separatelly, whether to allow dumping a partially
> > created filesystem, ie. OPEN_CTREE_FS_PARTIAL added by a new option.
> 
> For dump-tree, there is really no limitation here.
> As long as chunk tree is OK, we should be OK to do "-b" dump.
> 
> For "-t" or all tree (default) dump, with chunk tree opened only, we
> could try our best to print any good trees.
> So I don't get the point to read all other trees.

Yeah, eg. a bad filesystem magic number does not stop dump-tree, so no
change is needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] btrfs: open code set_balance_control

2018-04-09 Thread Anand Jain



On 04/04/2018 02:34 AM, David Sterba wrote:

The helper is quite simple and I'd like to see the locking in the
caller.

Signed-off-by: David Sterba 


For the version at:
 git://github.com/kdave/btrfs-devel dev/remove-volume-mutex

Reviewed-by: Anand Jain 

Thanks, Anand


---
  fs/btrfs/volumes.c | 20 
  1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e93c7ba44d06..db9f72c6af85 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3178,21 +3178,6 @@ static void update_balance_args(struct 
btrfs_balance_control *bctl)
}
  }
  
-/*

- * Should be called with balance mutex held to protect against checking the
- * balance status or progress. Same goes for reset_balance_state.
- */
-static void set_balance_control(struct btrfs_balance_control *bctl)
-{
-   struct btrfs_fs_info *fs_info = bctl->fs_info;
-
-   BUG_ON(fs_info->balance_ctl);
-
-   spin_lock(&fs_info->balance_lock);
-   fs_info->balance_ctl = bctl;
-   spin_unlock(&fs_info->balance_lock);
-}
-
  /*
   * Clear the balance status in fs_info and delete the balance item from disk.
   */
@@ -3878,7 +3863,10 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
  
  	if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {

BUG_ON(ret == -EEXIST);
-   set_balance_control(bctl);
+   BUG_ON(fs_info->balance_ctl);
+   spin_lock(&fs_info->balance_lock);
+   fs_info->balance_ctl = bctl;
+   spin_unlock(&fs_info->balance_lock);
} else {
BUG_ON(ret != -EEXIST);
spin_lock(&fs_info->balance_lock);


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: let mkfs return nozero value on thin provision device

2018-04-09 Thread Damenly Su
Sorry for the previous reply in HTML format which is not delivered to
the mail list.

I mean there is already xfstests generic/405 for the situation.
Thin provision devices need to call dmsetup to setup.
I wonder If calling dmsetup in progs tests is acceptable or not.
If it is, I will do it.

Thanks,
Su

On Mon, Apr 9, 2018 at 9:25 PM, David Sterba  wrote:
> On Tue, Apr 03, 2018 at 04:39:45PM +0800, Su Yue wrote:
>> when mkfs.btrfs on a thin provision device which has very small
>> backing size and big virtual size, all code works well in
>> mkfs.btrfs until close_ctree() is called.
>> close_ctree() fails to sync device due to small backing size
>> while closing devices.
>> However, mkfs returns 0 in such situation which causes failure of
>> xfstests generic/405.
>>
>> So, let mkfs returns nonzero value if previous steps succeeded but
>> close_ctree() failed.
>> Then xfstests generic/405 passes now.
>>
>> Signed-off-by: Su Yue 
>
> Applied, thanks. Can you please write a test for that?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: let mkfs return nozero value on thin provision device

2018-04-09 Thread David Sterba
On Mon, Apr 09, 2018 at 11:46:32PM +0800, Damenly Su wrote:
> Sorry for the previous reply in HTML format which is not delivered to
> the mail list.
> 
> I mean there is already xfstests generic/405 for the situation.
> Thin provision devices need to call dmsetup to setup.
> I wonder If calling dmsetup in progs tests is acceptable or not.
> If it is, I will do it.

Yes dmsetup is already used eg in test
mkfs-tests/005-long-device-name-for-ssd .
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Goffredo Baroncelli
Hi Qu,

On 04/09/2018 11:19 AM, Qu Wenruo wrote:
> When manually patching super blocks, current validation check is pretty
> weak (limited to magic number and csum) and doesn't provide extra check
> for some obvious corruption (like invalid sectorsize).
[...]
> 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Generate tab based indent string in helper macro instead of passing spacer
>   string from outside, suggested by Nikolay.
>   (In fact, if using %*s it would be much easier, however it needs extra
>rework for existing code as they still use tab as indent)
> ---
>  cmds-inspect-dump-super.c | 206 +-
>  1 file changed, 137 insertions(+), 69 deletions(-)
> 
> diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
> index 24588c87cce6..68d6f59ad727 100644
> --- a/cmds-inspect-dump-super.c
> +++ b/cmds-inspect-dump-super.c
> @@ -312,11 +312,80 @@ static void print_readable_super_flag(u64 flag)
>super_flags_num, BTRFS_SUPER_FLAG_SUPP);
>  }
>  
> +#define INDENT_BUFFER_LEN80
> +#define TAB_LEN  8
> +#define SUPER_INDENT 24
> +
> +/* helper to generate tab based indent string */
> +static void generate_tab_indent(char *buf, unsigned int indent)
> +{
> + buf[0] = '\0';
> + for (indent = round_up(indent, TAB_LEN); indent > 0; indent -= TAB_LEN)
> + strncat(buf, "\t", INDENT_BUFFER_LEN);
> +}
> +
> +/* Helper to print member in %llu */
> +#define print_super(sb, member)  
> \
> +({   \
> + int indent = SUPER_INDENT - strlen(#member);\
> + char indent_str[INDENT_BUFFER_LEN]; \
> + \
> + generate_tab_indent(indent_str, indent);\
> + printf("%s%s%llu\n", #member, indent_str,   \
> +(u64) btrfs_super_##member(sb)); \

Why not something like:

static const char tabs[] = "\t\t\t\t";\
int i = (sizeof(#member)+6)/8;\
if (i > sizeof(tabs)-1)   \
i = sizeof(tabs-1);   \
u64 value = (u64)btrfs_super_##member(sb);\
printf("%s%s" format, \
#member, tabs+i, value);  


so to get rid  of generate_tab_indent and indent_str

> +})
> +
> +/* Helper to print member in %llx */
> +#define print_super_hex(sb, member)  \
> +({   \
> + int indent = SUPER_INDENT - strlen(#member);\
> + char indent_str[INDENT_BUFFER_LEN]; \
> + \
> + generate_tab_indent(indent_str, indent);\
> + printf("%s%s0x%llx\n", #member, indent_str, \
> +(u64) btrfs_super_##member(sb)); \
> +})
> +
> +/*
> + * Helper macro to wrap member checker
> + *
> + * Only support %llu output for member.
> + */
> +#define print_check_super(sb, member, bad_condition) \
> +({   \
> + int indent = SUPER_INDENT - strlen(#member);\
> + char indent_str[INDENT_BUFFER_LEN]; \
> + u64 value;  \
> + \
> + generate_tab_indent(indent_str, indent);\
> + value = btrfs_super_##member(sb);   \
> + printf("%s%s%llu", #member, indent_str, (u64) value);   \
> + if (bad_condition)  \
> + printf(" [INVALID]");   \

What about printing also the condition: something like

+   if (bad_condition)  \
+   printf(" [INVALID '%s']", #bad_condition);  \

it would be even better a "good_condition":

so instead of:
+   print_check_super(sb, bytenr, (!IS_ALIGNED(value, SZ_4K)));
do
+   print_check_super(sb, bytenr, (IS_ALIGNED(value, SZ_4K)));

and

+   if (!good_condition)\
+   printf(" [ERROR: required '%s']", #good_condition); \


All above functions could be written as:

  #define __print_and_check(sb, member, format, expected)   \
do{   \
static const char tabs[] = "\t\t\t\t";\
int i = (sizeof(#member)+6)/8; 

Re: [PATCH] Btrfs: do not abort transaction when failing to insert hole extent

2018-04-09 Thread Liu Bo
On Fri, Apr 6, 2018 at 10:43 AM, Liu Bo  wrote:
> On Fri, Apr 6, 2018 at 6:21 AM, David Sterba  wrote:
>> On Thu, Apr 05, 2018 at 11:58:16AM -0700, Liu Bo wrote:
>>> On Thu, Apr 5, 2018 at 9:48 AM, David Sterba  wrote:
>>> > On Sat, Mar 31, 2018 at 06:11:55AM +0800, Liu Bo wrote:
>>> >> This is running in a typical write path, not inside a critical path
>>> >> where we have to abort the running transaction, so it's OK to return
>>> >> errors to callers and eventually to userspace.
>>> >
>>> > I'm not sure this is entierly correct, several other places do not abort
>>> > after btrfs_drop_extents as there's nothing that would leave the
>>> > structres in some half-state.
>>> >
>>> >> Signed-off-by: Liu Bo 
>>> >> ---
>>> >>  fs/btrfs/inode.c | 5 +
>>> >>  1 file changed, 1 insertion(+), 4 deletions(-)
>>> >>
>>> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> >> index c7b75dd..b9310f8 100644
>>> >> --- a/fs/btrfs/inode.c
>>> >> +++ b/fs/btrfs/inode.c
>>> >> @@ -4939,16 +4939,13 @@ static int maybe_insert_hole(struct btrfs_root 
>>> >> *root, struct inode *inode,
>>> >>
>>> >>   ret = btrfs_drop_extents(trans, root, inode, offset, offset + len, 
>>> >> 1);
>>> >>   if (ret) {
>>> >> - btrfs_abort_transaction(trans, ret);
>>> >>   btrfs_end_transaction(trans);
>>> >>   return ret;
>>> >>   }
>>> >>
>>> >>   ret = btrfs_insert_file_extent(trans, root, 
>>> >> btrfs_ino(BTRFS_I(inode)),
>>> >>   offset, 0, 0, len, 0, len, 0, 0, 0);
>>> >
>>> > But here the extents have been already dropped and missing to insert the
>>> > items does not seem to lead to a consistent state.
>>> >
>>> > It's possible that I'm missing something. In a call path that can be
>>> > safely rolled back even with a started transaction, we don't need to
>>> > abort in all cases. But if the rollback requires some non-trivial
>>> > modifications, I don't see options how to avoid the abort.
>>> >
>>> > __btrfs_drop_extents does a lot of state changes and can itself fail
>>> > in the middle of dropping the range, aborting looks like the safest
>>> > option.
>>> >
>>>
>>> As maybe_insert_hole is only called by btrfs_cont_expand here, which
>>> means it's a really hole, I don't expect drop_extents would drop
>>> anything, we can remove this drop_extents and put an assert after
>>> btrfs_insert_file_extent for checking EEXIST.
>>
>> Sounds good.
>>
>
> Let me make a v2 and have a fstests run.
>

It turns out that the btrfs_drop_extents() here is quite necessary
since fallocate(2) has a FALLOC_FL_KEEP_SIZE option, and when that
happens, a hole extent would be appended between the EOF and fallocate
range's start, then a later truncate up would have to drop these hole
extents in order to expand with a new hole...

As I don't see a way to gracefully solve this except keeping
drop_extents(), lets drop this patch instead.

thanks,
liubo

> thanks,
> liubo
>
>>> It's different from punch hole where we need to explicitly drop an
>>> actual extent and replace it with a hole range.
>>
>> Right, that's what I didn't see at first.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore

2018-04-09 Thread Qu Wenruo


On 2018年04月09日 23:05, David Sterba wrote:
> On Mon, Apr 09, 2018 at 10:29:34PM +0800, Qu Wenruo wrote:
 --- a/cmds-inspect-dump-tree.c
 +++ b/cmds-inspect-dump-tree.c
 @@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
int uuid_tree_only = 0;
int roots_only = 0;
int root_backups = 0;
 -  unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
 +  /* Speed up open_ctree() and continue if extent tree is corrupted */
 +  unsigned open_ctree_flags = OPEN_CTREE_PARTIAL |
 +  OPEN_CTREE_NO_BLOCK_GROUPS;
>>>
>>> We could consider that separatelly, whether to allow dumping a partially
>>> created filesystem, ie. OPEN_CTREE_FS_PARTIAL added by a new option.
>>
>> For dump-tree, there is really no limitation here.
>> As long as chunk tree is OK, we should be OK to do "-b" dump.
>>
>> For "-t" or all tree (default) dump, with chunk tree opened only, we
>> could try our best to print any good trees.
>> So I don't get the point to read all other trees.
> 
> Yeah, eg. a bad filesystem magic number does not stop dump-tree, so no
> change is needed.

Bad magic will still stop dump-tree, super block check is not affected
by open ctree flags.
The only way to block dump-tree is corrupted chunk tree, which mostly
makes the fs garbage.

Even root tree is corrupted, we can still specify bytenr manually to
salvage what we need.

The point here is still the same, for read-only operations where we
don't care about the cross-reference, there is no need to care about
extent tree at all, and skip block group item search could save us a lot
of time.

Thanks,
Qu
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs-progs: dump-super: Refactor print function and add extra check

2018-04-09 Thread Qu Wenruo


On 2018年04月10日 05:50, Goffredo Baroncelli wrote:
> Hi Qu,
> 
> On 04/09/2018 11:19 AM, Qu Wenruo wrote:
>> When manually patching super blocks, current validation check is pretty
>> weak (limited to magic number and csum) and doesn't provide extra check
>> for some obvious corruption (like invalid sectorsize).
> [...]
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> changelog:
>> v2:
>>   Generate tab based indent string in helper macro instead of passing spacer
>>   string from outside, suggested by Nikolay.
>>   (In fact, if using %*s it would be much easier, however it needs extra
>>rework for existing code as they still use tab as indent)
>> ---
>>  cmds-inspect-dump-super.c | 206 +-
>>  1 file changed, 137 insertions(+), 69 deletions(-)
>>
>> diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
>> index 24588c87cce6..68d6f59ad727 100644
>> --- a/cmds-inspect-dump-super.c
>> +++ b/cmds-inspect-dump-super.c
>> @@ -312,11 +312,80 @@ static void print_readable_super_flag(u64 flag)
>>   super_flags_num, BTRFS_SUPER_FLAG_SUPP);
>>  }
>>  
>> +#define INDENT_BUFFER_LEN   80
>> +#define TAB_LEN 8
>> +#define SUPER_INDENT24
>> +
>> +/* helper to generate tab based indent string */
>> +static void generate_tab_indent(char *buf, unsigned int indent)
>> +{
>> +buf[0] = '\0';
>> +for (indent = round_up(indent, TAB_LEN); indent > 0; indent -= TAB_LEN)
>> +strncat(buf, "\t", INDENT_BUFFER_LEN);
>> +}
>> +
>> +/* Helper to print member in %llu */
>> +#define print_super(sb, member) 
>> \
>> +({  \
>> +int indent = SUPER_INDENT - strlen(#member);\
>> +char indent_str[INDENT_BUFFER_LEN]; \
>> +\
>> +generate_tab_indent(indent_str, indent);\
>> +printf("%s%s%llu\n", #member, indent_str,   \
>> +   (u64) btrfs_super_##member(sb)); \
> 
> Why not something like:
> 
> static const char tabs[] = "\t\t\t\t";\
> int i = (sizeof(#member)+6)/8;\
> if (i > sizeof(tabs)-1)   \
> i = sizeof(tabs-1);   \
> u64 value = (u64)btrfs_super_##member(sb);\
> printf("%s%s" format, \
> #member, tabs+i, value);  
> 
> 
> so to get rid  of generate_tab_indent and indent_str

And we need to call such functions in each helper macros, with
duplicated codes.

> 
>> +})
>> +
>> +/* Helper to print member in %llx */
>> +#define print_super_hex(sb, member) \
>> +({  \
>> +int indent = SUPER_INDENT - strlen(#member);\
>> +char indent_str[INDENT_BUFFER_LEN]; \
>> +\
>> +generate_tab_indent(indent_str, indent);\
>> +printf("%s%s0x%llx\n", #member, indent_str, \
>> +   (u64) btrfs_super_##member(sb)); \
>> +})
>> +
>> +/*
>> + * Helper macro to wrap member checker
>> + *
>> + * Only support %llu output for member.
>> + */
>> +#define print_check_super(sb, member, bad_condition)
>> \
>> +({  \
>> +int indent = SUPER_INDENT - strlen(#member);\
>> +char indent_str[INDENT_BUFFER_LEN]; \
>> +u64 value;  \
>> +\
>> +generate_tab_indent(indent_str, indent);\
>> +value = btrfs_super_##member(sb);   \
>> +printf("%s%s%llu", #member, indent_str, (u64) value);   \
>> +if (bad_condition)  \
>> +printf(" [INVALID]");   \
> 
> What about printing also the condition: something like
> 
> + if (bad_condition)  \
> + printf(" [INVALID '%s']", #bad_condition);  \
> 
> it would be even better a "good_condition":

When passing random stream to dump-super, such reason will make output
quite nasty.
So just INVALID to info the user that some of the members don't look
valid is good enough, as the tool is only to help guys who are going to
manually patching superblocks.

Thanks,
Qu

> 
> so instead of:
> + print_check_super(sb, bytenr, (!IS_ALIGNED(value, SZ

[PATCH] btrfs-progs: remove meaningless field write of scrub status file

2018-04-09 Thread Gu Jinxiang
Since ret must be 0 when goes to scrub status file's write,
so scrub_write_buf(fd, buf, ret) writes nothing.
And when I check the process for read scrub status file,
there is no process for this field.
So, remove it.

Signed-off-by: Gu Jinxiang 
---
 cmds-scrub.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index 8cf1e5df..efd7db94 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -753,7 +753,6 @@ static int scrub_write_file(int fd, const char *fsid,
scrub_write_buf(fd, ":", 1) ||
scrub_writev(fd, buf, sizeof(buf), "%lld",
use->scrub_args.devid) ||
-   scrub_write_buf(fd, buf, ret) ||
_SCRUB_KVWRITE(fd, buf, data_extents_scrubbed, use) ||
_SCRUB_KVWRITE(fd, buf, tree_extents_scrubbed, use) ||
_SCRUB_KVWRITE(fd, buf, data_bytes_scrubbed, use) ||
-- 
2.14.3



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] btrfs-progs: ctree: Don't return error if first key mismatch on leaf

2018-04-09 Thread Qu Wenruo
When running btrfs/074 with enough CPUs, it's possible (around 1%) to
hit first key mismatch warning.

Most of such warning happens in split_leaf() where it can ignore error
from push_leaf_right(), some of such problem also happens in
btrfs_search_forward().

Since it could break normal tree operations, at least keep the warning
for debug build but change its return to 0 for leaf, so we won't break
normal tree operations.

And, any suggest/advice on this tree locking problem is welcome.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bb38f4098e9c..cf58e9babe34 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -442,6 +442,7 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
btrfs_err(fs_info,
 "tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
  eb->start, level, found_level);
+   btrfs_print_tree(eb, false);
 #endif
return -EIO;
}
@@ -463,8 +464,16 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
  eb->start, first_key->objectid, first_key->type,
  first_key->offset, found_key.objectid,
  found_key.type, found_key.offset);
+   btrfs_print_tree(eb, false);
}
 #endif
+   /*
+* XXX: We seems to have locking problem for leaf, where we have race
+* updating parent nodeptr's first key.
+* So if this eb is leaf and first key doesn't match, still let it go.
+*/
+   if (level == 0 && ret)
+   ret = 0;
return ret;
 }
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html