Re: [PATCH] btrfs: Remove WARN_ON for unaligned device created before v4.13 and adds more user friendly output

2017-09-22 Thread Satoru Takeuchi
At Sat, 23 Sep 2017 10:19:26 +0900,
Satoru Takeuchi wrote:
> 
> At Wed, 20 Sep 2017 15:18:43 +0900,
> Qu Wenruo wrote:
> > 
> > Commit 7dfb8be11b5d ("btrfs: Round down values which are written for
> > total_bytes_size") is fixing the unaligned device size caused by
> > adding/shrinking device.
> > 
> > It added a new WARN_ON() when device size is unaligned.
> > This is fine for new device added to btrfs using v4.13 kernel, but not
> > existing device whose total_bytes is already unaligned.
> > 
> > And the WARN_ON() will get triggered every time a block group get
> > created/removed on the unaligned device.
> > 
> > This patch will remove the WARN_ON(), and warn user more gently what's
> > happening and how to fix it.
> > 
> > Reported-by: Rich Rauenzahn 
> > Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for
> > total_bytes_size")
> > Signed-off-by: Qu Wenruo 
> > ---
> >  fs/btrfs/ctree.h   |  1 -
> >  fs/btrfs/volumes.c | 16 
> >  2 files changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 5a8933da39a7..4de9269e435a 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1562,7 +1562,6 @@ static inline void 
> > btrfs_set_device_total_bytes(struct extent_buffer *eb,
> >  {
> > BUILD_BUG_ON(sizeof(u64) !=
> >  sizeof(((struct btrfs_dev_item *)0))->total_bytes);
> > -   WARN_ON(!IS_ALIGNED(val, eb->fs_info->sectorsize));
> > btrfs_set_64(eb, s, offsetof(struct btrfs_dev_item, total_bytes), val);
> >  }
> >  
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 0e8f16c305df..afae25df6a8c 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -6472,15 +6472,23 @@ static int read_one_chunk(struct btrfs_fs_info 
> > *fs_info, struct btrfs_key *key,
> > return 0;
> >  }
> >  
> > -static void fill_device_from_item(struct extent_buffer *leaf,
> > -struct btrfs_dev_item *dev_item,
> > -struct btrfs_device *device)
> > +static void fill_device_from_item(struct btrfs_fs_info *fs_info,
> > + struct extent_buffer *leaf,
> > + struct btrfs_dev_item *dev_item,
> > + struct btrfs_device *device)
> >  {
> > unsigned long ptr;
> >  
> > device->devid = btrfs_device_id(leaf, dev_item);
> > device->disk_total_bytes = btrfs_device_total_bytes(leaf, dev_item);
> > device->total_bytes = device->disk_total_bytes;
> > +   if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) {
> > +   btrfs_warn(fs_info,
> > +  "devid %llu has unaligned total bytes %llu",
> > +  device->devid, device->disk_total_bytes);
> > +   btrfs_warn(fs_info,
> > +  "please shrink the device a little and resize back 
> > to fix it");
> > +   }
> 
> How about telling uses to know device->total_bytes should be alligned
> to fs_info->sectorsize here?
> 
> Thanks,

I should make my comment clearer, sorry.

===
+   if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) {
+   btrfs_warn(fs_info,
+  "devid %llu: total bytes %llu should be aligned to 
%u bytes",
+  device->devid, device->disk_total_bytes, 
fs_info->sectorsize);
+   btrfs_warn(fs_info,
+  "please shrink the device a little and resize back 
to fix it");
+   }
===

Thanks,
Satoru

> Satoru
> 
> > device->commit_total_bytes = device->disk_total_bytes;
> > device->bytes_used = btrfs_device_bytes_used(leaf, dev_item);
> > device->commit_bytes_used = device->bytes_used;
> > @@ -6625,7 +6633,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
> > return -EINVAL;
> > }
> >  
> > -   fill_device_from_item(leaf, dev_item, device);
> > +   fill_device_from_item(fs_info, leaf, dev_item, device);
> > device->in_fs_metadata = 1;
> > if (device->writeable && !device->is_tgtdev_for_dev_replace) {
> > device->fs_devices->total_rw_bytes += device->total_bytes;
> > -- 
> > 2.14.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Remove WARN_ON for unaligned device created before v4.13 and adds more user friendly output

2017-09-22 Thread Satoru Takeuchi
At Wed, 20 Sep 2017 15:18:43 +0900,
Qu Wenruo wrote:
> 
> Commit 7dfb8be11b5d ("btrfs: Round down values which are written for
> total_bytes_size") is fixing the unaligned device size caused by
> adding/shrinking device.
> 
> It added a new WARN_ON() when device size is unaligned.
> This is fine for new device added to btrfs using v4.13 kernel, but not
> existing device whose total_bytes is already unaligned.
> 
> And the WARN_ON() will get triggered every time a block group get
> created/removed on the unaligned device.
> 
> This patch will remove the WARN_ON(), and warn user more gently what's
> happening and how to fix it.
> 
> Reported-by: Rich Rauenzahn 
> Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for
> total_bytes_size")
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h   |  1 -
>  fs/btrfs/volumes.c | 16 
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 5a8933da39a7..4de9269e435a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1562,7 +1562,6 @@ static inline void btrfs_set_device_total_bytes(struct 
> extent_buffer *eb,
>  {
>   BUILD_BUG_ON(sizeof(u64) !=
>sizeof(((struct btrfs_dev_item *)0))->total_bytes);
> - WARN_ON(!IS_ALIGNED(val, eb->fs_info->sectorsize));
>   btrfs_set_64(eb, s, offsetof(struct btrfs_dev_item, total_bytes), val);
>  }
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e8f16c305df..afae25df6a8c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6472,15 +6472,23 @@ static int read_one_chunk(struct btrfs_fs_info 
> *fs_info, struct btrfs_key *key,
>   return 0;
>  }
>  
> -static void fill_device_from_item(struct extent_buffer *leaf,
> -  struct btrfs_dev_item *dev_item,
> -  struct btrfs_device *device)
> +static void fill_device_from_item(struct btrfs_fs_info *fs_info,
> +   struct extent_buffer *leaf,
> +   struct btrfs_dev_item *dev_item,
> +   struct btrfs_device *device)
>  {
>   unsigned long ptr;
>  
>   device->devid = btrfs_device_id(leaf, dev_item);
>   device->disk_total_bytes = btrfs_device_total_bytes(leaf, dev_item);
>   device->total_bytes = device->disk_total_bytes;
> + if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) {
> + btrfs_warn(fs_info,
> +"devid %llu has unaligned total bytes %llu",
> +device->devid, device->disk_total_bytes);
> + btrfs_warn(fs_info,
> +"please shrink the device a little and resize back 
> to fix it");
> + }

How about telling uses to know device->total_bytes should be alligned
to fs_info->sectorsize here?

Thanks,
Satoru

>   device->commit_total_bytes = device->disk_total_bytes;
>   device->bytes_used = btrfs_device_bytes_used(leaf, dev_item);
>   device->commit_bytes_used = device->bytes_used;
> @@ -6625,7 +6633,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
>   return -EINVAL;
>   }
>  
> - fill_device_from_item(leaf, dev_item, device);
> + fill_device_from_item(fs_info, leaf, dev_item, device);
>   device->in_fs_metadata = 1;
>   if (device->writeable && !device->is_tgtdev_for_dev_replace) {
>   device->fs_devices->total_rw_bytes += device->total_bytes;
> -- 
> 2.14.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use self-explaining variable

2017-09-22 Thread Qu Wenruo



On 2017年09月23日 08:48, Liu Bo wrote:

On Sat, Sep 23, 2017 at 08:46:55AM +0800, Qu Wenruo wrote:



On 2017年09月23日 07:36, Liu Bo wrote:

This uses a bool 'do_backup' to help understand this piece of code.

Signed-off-by: Liu Bo 
---
This is based on a patch "Btrfs: do not backup tree roots when fsync".

   fs/btrfs/disk-io.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cdb7043..9811b9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
int max_errors;
int total_errors = 0;
u64 flags;
+   bool do_backup = (max_mirrors == 0);


Why not replacing @max_mirrors with @do_backup as parameter?


If I read the code correctly, max_mirrors is not just for deciding
backup.


That's strange.

write_all_supers() only uses @max_mirrors by passing it to 
write_dev_supers() and wait_dev_supers().


Both of the write/wait_dev_supers() will replace @max_mirrors to 
BTRFS_SUPER_MIRROR_MAX if it's zero.


Further more, all write_all_supers() callers just pass @max_mirrors as 
bool (either 0 or 1).


So I don't see any point not replacing the parameter as bool.

Thanks,
Qu


thanks,

-liubo



Thanks,
Qu

do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
@@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
 * not from fsync where the tree roots in fs_info have not
 * been consistent on disk.
 */
-   if (max_mirrors == 0)
+   if (do_backup)
backup_super_roots(fs_info);
sb = fs_info->super_for_commit;


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block

2017-09-22 Thread Liu Bo
On Sat, Sep 23, 2017 at 09:49:38AM +0900, Satoru Takeuchi wrote:
> At Tue, 19 Sep 2017 17:50:09 -0600,
> Liu Bo wrote:
> > 
> > This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't
> > abuse REQ_OP_* flags for btrfs_map_block").
> > 
> > It should use btrfs_op() helper to provide one of 'enum btrfs_map_op'
> > types.
> > 
> > Signed-off-by: Liu Bo 
> 
> Reviewed-by: Satoru Takeuchi 
> 
> I checked all callers of the following functions and there is no leftover.
> 
> - btrfs_map_block
> - btrfs_map_sblock
> - __btrfs_map_block

I did check them, but thanks a lot for double checking.

thanks,
-liubo

> 
> Thanks,
> Satoru
> 
> > ---
> >  fs/btrfs/volumes.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index bd679bc..bd7b75d3 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -6229,7 +6229,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info 
> > *fs_info, struct bio *bio,
> > map_length = length;
> >  
> > btrfs_bio_counter_inc_blocked(fs_info);
> > -   ret = __btrfs_map_block(fs_info, bio_op(bio), logical,
> > +   ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
> > &map_length, &bbio, mirror_num, 1);
> > if (ret) {
> > btrfs_bio_counter_dec(fs_info);
> > -- 
> > 2.9.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use self-explaining variable

2017-09-22 Thread Liu Bo
On Sat, Sep 23, 2017 at 08:46:55AM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年09月23日 07:36, Liu Bo wrote:
> > This uses a bool 'do_backup' to help understand this piece of code.
> > 
> > Signed-off-by: Liu Bo 
> > ---
> > This is based on a patch "Btrfs: do not backup tree roots when fsync".
> > 
> >   fs/btrfs/disk-io.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index cdb7043..9811b9d 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, 
> > int max_mirrors)
> > int max_errors;
> > int total_errors = 0;
> > u64 flags;
> > +   bool do_backup = (max_mirrors == 0);
> 
> Why not replacing @max_mirrors with @do_backup as parameter?

If I read the code correctly, max_mirrors is not just for deciding
backup.

thanks,

-liubo

> 
> Thanks,
> Qu
> > do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
> > @@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, 
> > int max_mirrors)
> >  * not from fsync where the tree roots in fs_info have not
> >  * been consistent on disk.
> >  */
> > -   if (max_mirrors == 0)
> > +   if (do_backup)
> > backup_super_roots(fs_info);
> > sb = fs_info->super_for_commit;
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block

2017-09-22 Thread Satoru Takeuchi
At Tue, 19 Sep 2017 17:50:09 -0600,
Liu Bo wrote:
> 
> This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't
> abuse REQ_OP_* flags for btrfs_map_block").
> 
> It should use btrfs_op() helper to provide one of 'enum btrfs_map_op'
> types.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Satoru Takeuchi 

I checked all callers of the following functions and there is no leftover.

- btrfs_map_block
- btrfs_map_sblock
- __btrfs_map_block

Thanks,
Satoru

> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bd679bc..bd7b75d3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6229,7 +6229,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info 
> *fs_info, struct bio *bio,
>   map_length = length;
>  
>   btrfs_bio_counter_inc_blocked(fs_info);
> - ret = __btrfs_map_block(fs_info, bio_op(bio), logical,
> + ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
>   &map_length, &bbio, mirror_num, 1);
>   if (ret) {
>   btrfs_bio_counter_dec(fs_info);
> -- 
> 2.9.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use self-explaining variable

2017-09-22 Thread Qu Wenruo



On 2017年09月23日 07:36, Liu Bo wrote:

This uses a bool 'do_backup' to help understand this piece of code.

Signed-off-by: Liu Bo 
---
This is based on a patch "Btrfs: do not backup tree roots when fsync".

  fs/btrfs/disk-io.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cdb7043..9811b9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
int max_errors;
int total_errors = 0;
u64 flags;
+   bool do_backup = (max_mirrors == 0);


Why not replacing @max_mirrors with @do_backup as parameter?

Thanks,
Qu
  
  	do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
  
@@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)

 * not from fsync where the tree roots in fs_info have not
 * been consistent on disk.
 */
-   if (max_mirrors == 0)
+   if (do_backup)
backup_super_roots(fs_info);
  
  	sb = fs_info->super_for_commit;



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: use self-explaining variable

2017-09-22 Thread Liu Bo
This uses a bool 'do_backup' to help understand this piece of code.

Signed-off-by: Liu Bo 
---
This is based on a patch "Btrfs: do not backup tree roots when fsync".

 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cdb7043..9811b9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
int max_errors;
int total_errors = 0;
u64 flags;
+   bool do_backup = (max_mirrors == 0);
 
do_barriers = !btrfs_test_opt(fs_info, NOBARRIER);
 
@@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors)
 * not from fsync where the tree roots in fs_info have not
 * been consistent on disk.
 */
-   if (max_mirrors == 0)
+   if (do_backup)
backup_super_roots(fs_info);
 
sb = fs_info->super_for_commit;
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] fstests: btrfs/150 regression test for reading compressed data

2017-09-22 Thread Liu Bo
We had a bug in btrfs compression code which could end up with a
kernel panic.

This is adding a regression test for the bug and I've also sent a
kernel patch to fix the bug.

The patch is "Btrfs: fix kernel oops while reading compressed data".

Signed-off-by: Liu Bo 
---
v2: - Fix ambiguous copyright.
- Use /proc/$pid/make-it-fail to specify IO failure
- Use bash -c to run test only when pid is odd.
- Add test to dangerous group.

 tests/btrfs/150 | 103 
 tests/btrfs/150.out |   3 ++
 tests/btrfs/group   |   1 +
 3 files changed, 107 insertions(+)
 create mode 100755 tests/btrfs/150
 create mode 100644 tests/btrfs/150.out

diff --git a/tests/btrfs/150 b/tests/btrfs/150
new file mode 100755
index 000..8891c38
--- /dev/null
+++ b/tests/btrfs/150
@@ -0,0 +1,103 @@
+#! /bin/bash
+# FS QA Test btrfs/150
+#
+# This is a regression test which ends up with a kernel oops in btrfs.
+# It occurs when btrfs's read repair happens while reading a compressed
+# extent.
+# The patch to fix it is
+#  Btrfs: fix kernel oops while reading compressed data
+#
+#---
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fail_make_request
+_require_scratch_dev_pool 2 
+
+SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV`
+enable_io_failure()
+{
+echo 100 > $DEBUGFS_MNT/fail_make_request/probability
+echo 1000 > $DEBUGFS_MNT/fail_make_request/times
+echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
+echo 1 > $SYSFS_BDEV/make-it-fail
+}
+
+disable_io_failure()
+{
+echo 0 > $SYSFS_BDEV/make-it-fail
+echo 0 > $DEBUGFS_MNT/fail_make_request/probability
+echo 0 > $DEBUGFS_MNT/fail_make_request/times
+}
+
+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
+
+# It doesn't matter which compression algorithm we use.
+_scratch_mount -ocompress
+
+# Create a file with all data being compressed
+$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
+
+# Raid1 consists of two copies and btrfs decides which copy to read by reader's
+# %pid.  Now we inject errors to copy #1 and copy #0 is good.  We want to read
+# the bad copy to trigger read-repair.
+while [[ -z $result ]]; do
+   # invalidate the page cache
+   $XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar
+
+   enable_io_failure
+
+   result=$(bash -c "
+   if [ \$((\$\$ % 2)) == 1 ]; then
+   echo 1 > /proc/\$\$/make-it-fail
+   exec $XFS_IO_PROG -c \"pread 0 8K\" \$SCRATCH_MNT/foobar
+   fi")
+
+   disable_io_failure
+done
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out
new file mode 100644
index 000..c492c24
--- /dev/null
+++ b/tests/btrfs/150.out
@@ -0,0 +1,3 @@
+QA output created by 150
+wrote 8192/8192 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 70c3f05..e73bb1b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -152,3 +152,4 @@
 147 auto quick send
 148 auto quick rw
 149 auto quick send compress
+150 auto quick dangerous
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-09-22 Thread Marc Joliet
Am Freitag, 22. September 2017, 13:22:52 CEST schrieb Austin S. Hemmelgarn:
> > I'm not sure where Firefox puts its cache, I only use it on very rare
> > occasions. But I think it's going to .cache/mozilla last time looked
> > at it.
> 
> I'm pretty sure that is correct.

FWIW, on my system Firefox's cache looks like this:

% du -hsc (find .cache/mozilla/firefox/ -type f) | wc -l
9008


   
% du -hsc (find .cache/mozilla/firefox/ -type f) | sort -h | tail
5,4M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
83CEC8ADA08D9A9658458AB872BE107A216E71C6
5,5M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
C60061B33D3BB91ED45951C922BAA1BB40022CB7
5,7M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
0900D9EA8E3222EB8690348C2482C69308B15A20
5,7M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
F8E90D121B884360E36BCB1735CC5A8B1B7A744B
5,8M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
903C4CD01ABD74E353C7484C6E21A053AAC5DCC2
6,1M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
3A0D4193B009700155811D14A28DBE38C37C0067
6,1M.cache/mozilla/firefox/cb236e4s.default-1464421886682/startupCache/
scriptCache-current.bin
6,5M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/
304405168662C3624D57AF98A74345464F32A0DB
8,8M.cache/mozilla/firefox/ik7qsfwb.Temp/cache2/entries/
BD7CA4125B3AA87D6B16C987741F33C65DBFFFDD
427Minsgesamt

So lots of files, many of which are (I suppose) relatively large, but do not 
look "everything in one database" large to me.

(This is with Firefox 55.0.2.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] fstests: btrfs/150 regression test for reading compressed data

2017-09-22 Thread Liu Bo
On Thu, Sep 21, 2017 at 02:39:52PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2017 at 05:52:43PM -0600, Liu Bo wrote:
> > We had a bug in btrfs compression code which could end up with a
> > kernel panic.
> > 
> > This is adding a regression test for the bug and I've also sent a
> > kernel patch to fix the bug.
> > 
> > The patch is "Btrfs: fix kernel oops while reading compressed data".
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  tests/btrfs/150 | 102 
> > 
> >  tests/btrfs/150.out |   3 ++
> >  tests/btrfs/group   |   1 +
> >  3 files changed, 106 insertions(+)
> >  create mode 100755 tests/btrfs/150
> >  create mode 100644 tests/btrfs/150.out
> > 
> > diff --git a/tests/btrfs/150 b/tests/btrfs/150
> > new file mode 100755
> > index 000..834be51
> > --- /dev/null
> > +++ b/tests/btrfs/150
> > @@ -0,0 +1,102 @@
> > +#! /bin/bash
> > +# FS QA Test btrfs/150
> > +#
> > +# This is a regression test which ends up with a kernel oops in btrfs.
> 
> group += dangerous

OK.

> 
> > +# It occurs when btrfs's read repair happens while reading a compressed
> > +# extent.
> > +# The patch for this is 
> > +# x
> 
> Incomplete?

Urr, thanks for pointing it out.

> 
> > +#
> > +#---
> > +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> 
> You're signing off this patch an Oracle employee, but claiming
> personal copyright. Please clarify who owns the copyright - if it's
> your personal copyright then please sign off with a personal email
> address, not your employer's...
> 
> Also, I note that these recently added tests from you:
> 
> tests/btrfs/140:# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> tests/btrfs/141:# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> tests/btrfs/142:# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> tests/btrfs/143:# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> tests/generic/406:# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> 
> all have this same ambiguity - personal copyright with employer
> signoff in the commit. This definitely needs clarification and
> fixing if it is wrong
>

All right, will fix all of them (in a separate one).

> 
> > +disable_io_failure()
> > +{
> > +echo 0 > $SYSFS_BDEV/make-it-fail
> > +echo 0 > $DEBUGFS_MNT/fail_make_request/probability
> > +echo 0 > $DEBUGFS_MNT/fail_make_request/times
> > +}
> > +
> > +_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
> > +
> > +# It doesn't matter which compression algorithm we use.
> > +_scratch_mount -ocompress
> > +
> > +# Create a file with all data being compressed
> > +$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
> 
> needs an fsync to reach disk.

'pwrite -W' has ensured that.

> 
> > +# Raid1 consists of two copies and btrfs decides which copy to read by 
> > reader's
> > +# %pid.  Now we inject errors to copy #1 and copy #0 is good.  We want to 
> > read
> > +# the bad copy to trigger read-repair.
> > +while true; do
> > +   disable_io_failure
> > +   # invalidate the page cache
> > +   $XFS_IO_PROG -f -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar | 
> > _filter_xfs_io
> > +
> > +   enable_io_failure
> > +   od -x $SCRATCH_MNT/foobar > /dev/null &
> 
> why are you using od to read the data when the output is piped to
> dev/null? why not just xfs_io -c "pread 0 8k" ?

Oh yes, that's better, will do.

thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: btrfs/150 regression test for reading compressed data

2017-09-22 Thread Liu Bo
On Thu, Sep 21, 2017 at 03:03:45PM +0800, Lu Fengqi wrote:
> On Wed, Sep 20, 2017 at 05:52:43PM -0600, Liu Bo wrote:
> >We had a bug in btrfs compression code which could end up with a
> >kernel panic.
> >
> >This is adding a regression test for the bug and I've also sent a
> >kernel patch to fix the bug.
> >
> >The patch is "Btrfs: fix kernel oops while reading compressed data".
> >
> >Signed-off-by: Liu Bo 
> >---
> > tests/btrfs/150 | 102 
> > 
> > tests/btrfs/150.out |   3 ++
> > tests/btrfs/group   |   1 +
> > 3 files changed, 106 insertions(+)
> > create mode 100755 tests/btrfs/150
> > create mode 100644 tests/btrfs/150.out
> >
> >diff --git a/tests/btrfs/150 b/tests/btrfs/150
> >new file mode 100755
> >index 000..834be51
> >--- /dev/null
> >+++ b/tests/btrfs/150
> >@@ -0,0 +1,102 @@
> >+#! /bin/bash
> >+# FS QA Test btrfs/150
> >+#
> >+# This is a regression test which ends up with a kernel oops in btrfs.
> >+# It occurs when btrfs's read repair happens while reading a compressed
> >+# extent.
> >+# The patch for this is 
> >+# x
> >+#
> >+#---
> >+# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> >+#
> >+# This program is free software; you can redistribute it and/or
> >+# modify it under the terms of the GNU General Public License as
> >+# published by the Free Software Foundation.
> >+#
> >+# This program is distributed in the hope that it would be useful,
> >+# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+# GNU General Public License for more details.
> >+#
> >+# You should have received a copy of the GNU General Public License
> >+# along with this program; if not, write the Free Software Foundation,
> >+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> >+#---
> >+#
> >+
> >+seq=`basename $0`
> >+seqres=$RESULT_DIR/$seq
> >+echo "QA output created by $seq"
> >+
> >+here=`pwd`
> >+tmp=/tmp/$$
> >+status=1# failure is the default!
> >+trap "_cleanup; exit \$status" 0 1 2 3 15
> >+
> >+_cleanup()
> >+{
> >+cd /
> >+rm -f $tmp.*
> >+}
> >+
> >+# get standard environment, filters and checks
> >+. ./common/rc
> >+. ./common/filter
> >+
> >+# remove previous $seqres.full before test
> >+rm -f $seqres.full
> >+
> >+# real QA test starts here
> >+
> >+# Modify as appropriate.
> >+_supported_fs btrfs
> >+_supported_os Linux
> >+_require_scratch
> >+_require_fail_make_request
> >+_require_scratch_dev_pool 2 
> >+
> >+SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV`
> >+enable_io_failure()
> >+{
> >+echo 100 > $DEBUGFS_MNT/fail_make_request/probability
> >+echo 1000 > $DEBUGFS_MNT/fail_make_request/times
> 
> What does 1000 mean? Enough failures?
> Why not set times to -1?

This was copied from another test, so I kept it as is.

As this test just submits a single 8K read after enabling fault
injection, 1000 is in fact as same as -1(no limit), I think 1000 is OK
to use.

thanks,
-liubo

> 
> >+echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
> >+echo 1 > $SYSFS_BDEV/make-it-fail
> >+}
> >+
> >+disable_io_failure()
> >+{
> >+echo 0 > $SYSFS_BDEV/make-it-fail
> >+echo 0 > $DEBUGFS_MNT/fail_make_request/probability
> >+echo 0 > $DEBUGFS_MNT/fail_make_request/times
> >+}
> >+
> >+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
> >+
> >+# It doesn't matter which compression algorithm we use.
> >+_scratch_mount -ocompress
> >+
> >+# Create a file with all data being compressed
> >+$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
> >+
> >+# Raid1 consists of two copies and btrfs decides which copy to read by 
> >reader's
> >+# %pid.  Now we inject errors to copy #1 and copy #0 is good.  We want to 
> >read
> >+# the bad copy to trigger read-repair.
> >+while true; do
> >+disable_io_failure
> >+# invalidate the page cache
> >+$XFS_IO_PROG -f -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar | 
> >_filter_xfs_io
> >+
> >+enable_io_failure
> >+od -x $SCRATCH_MNT/foobar > /dev/null &
> >+pid=$!
> >+wait
> >+[ $((pid % 2)) == 1 ] && break
> >+done
> >+
> >+disable_io_failure
> >+
> >+# success, all done
> >+status=0
> >+exit
> >diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out
> >new file mode 100644
> >index 000..c492c24
> >--- /dev/null
> >+++ b/tests/btrfs/150.out
> >@@ -0,0 +1,3 @@
> >+QA output created by 150
> >+wrote 8192/8192 bytes at offset 0
> >+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >diff --git a/tests/btrfs/group b/tests/btrfs/group
> >index 70c3f05..b70a122 100644
> >--- a/tests/btrfs/group
> >+++ b/tests/btrfs/group
> >@@ -152,3 +152,4 @@
> > 147 auto quick send
> > 148 auto quick rw
> > 149 auto quick send compress
> >+150 auto quick
> >-- 
> >2.5.0
> >
> >-

[PATCH] Btrfs: fix memory leak in raid56

2017-09-22 Thread Liu Bo
The local bio_list may have pending bios when doing cleanup, it can
end up with memory leak if they don't get free'd.

Signed-off-by: Liu Bo 
---
 fs/btrfs/raid56.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2cf6ba4..063a2a0 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1325,6 +1325,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
 
 cleanup:
rbio_orig_end_io(rbio, BLK_STS_IOERR);
+
+   while ((bio = bio_list_pop(&bio_list)))
+   bio_put(bio);
 }
 
 /*
@@ -1580,6 +1583,10 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 
 cleanup:
rbio_orig_end_io(rbio, BLK_STS_IOERR);
+
+   while ((bio = bio_list_pop(&bio_list)))
+   bio_put(bio);
+
return -EIO;
 
 finish:
@@ -2105,6 +2112,10 @@ static int __raid56_parity_recover(struct btrfs_raid_bio 
*rbio)
if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
rbio->operation == BTRFS_RBIO_REBUILD_MISSING)
rbio_orig_end_io(rbio, BLK_STS_IOERR);
+
+   while ((bio = bio_list_pop(&bio_list)))
+   bio_put(bio);
+
return -EIO;
 }
 
@@ -2452,6 +2463,9 @@ static noinline void finish_parity_scrub(struct 
btrfs_raid_bio *rbio,
 
 cleanup:
rbio_orig_end_io(rbio, BLK_STS_IOERR);
+
+   while ((bio = bio_list_pop(&bio_list)))
+   bio_put(bio);
 }
 
 static inline int is_data_stripe(struct btrfs_raid_bio *rbio, int stripe)
@@ -2561,12 +2575,12 @@ static void raid56_parity_scrub_stripe(struct 
btrfs_raid_bio *rbio)
int stripe;
struct bio *bio;
 
+   bio_list_init(&bio_list);
+
ret = alloc_rbio_essential_pages(rbio);
if (ret)
goto cleanup;
 
-   bio_list_init(&bio_list);
-
atomic_set(&rbio->error, 0);
/*
 * build a list of bios to read all the missing parts of this
@@ -2634,6 +2648,10 @@ static void raid56_parity_scrub_stripe(struct 
btrfs_raid_bio *rbio)
 
 cleanup:
rbio_orig_end_io(rbio, BLK_STS_IOERR);
+
+   while ((bio = bio_list_pop(&bio_list)))
+   bio_put(bio);
+
return;
 
 finish:
-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents

2017-09-22 Thread Zygo Blaxell
The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
offset (encoded as a single logical address) to a list of extent refs.
LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
(extent ref -> extent bytenr and offset, or logical address).  These are
useful capabilities for programs that manipulate extents and extent
references from userspace (e.g. dedup and defrag utilities).

When the extents are uncompressed (and not encrypted and not other),
check_extent_in_eb performs filtering of the extent refs to remove any
extent refs which do not contain the same extent offset as the 'logical'
parameter's extent offset.  This prevents LOGICAL_INO from returning
references to more than a single block.

To find the set of extent references to an uncompressed extent from [a,
b), userspace has to run a loop like this pseudocode:

for (i = a; i < b; ++i)
extent_ref_set += LOGICAL_INO(i);

At each iteration of the loop (up to 32768 iterations for a 128M extent),
data we are interested in is collected in the kernel, then deleted by
the filter in check_extent_in_eb.

When the extents are compressed (or encrypted or other), the 'logical'
parameter must be an extent bytenr (the 'a' parameter in the loop).
No filtering by extent offset is done (or possible?) so the result is
the complete set of extent refs for the entire extent.  This removes
the need for the loop, since we get all the extent refs in one call.

Add an 'ignore_offset' argument to iterate_inodes_from_logical,
[...several levels of function call graph...], and check_extent_in_eb, so
that we can disable the extent offset filtering for uncompressed extents.
This flag can be set by an improved version of the LOGICAL_INO ioctl to
get either behavior as desired.

There is no functional change in this patch.  The new flag is always
false.

Signed-off-by: Zygo Blaxell 
---
 fs/btrfs/backref.c| 63 ++-
 fs/btrfs/backref.h|  8 +++---
 fs/btrfs/inode.c  |  2 +-
 fs/btrfs/ioctl.c  |  2 +-
 fs/btrfs/qgroup.c |  8 +++---
 fs/btrfs/scrub.c  |  6 ++---
 fs/btrfs/send.c   |  2 +-
 fs/btrfs/tests/qgroup-tests.c | 20 +++---
 8 files changed, 63 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index b517ef1477ea..a2609786cd86 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -40,12 +40,14 @@ static int check_extent_in_eb(const struct btrfs_key *key,
  const struct extent_buffer *eb,
  const struct btrfs_file_extent_item *fi,
  u64 extent_item_pos,
- struct extent_inode_elem **eie)
+ struct extent_inode_elem **eie,
+ bool ignore_offset)
 {
u64 offset = 0;
struct extent_inode_elem *e;
 
-   if (!btrfs_file_extent_compression(eb, fi) &&
+   if (!ignore_offset &&
+   !btrfs_file_extent_compression(eb, fi) &&
!btrfs_file_extent_encryption(eb, fi) &&
!btrfs_file_extent_other_encoding(eb, fi)) {
u64 data_offset;
@@ -84,7 +86,8 @@ static void free_inode_elem_list(struct extent_inode_elem 
*eie)
 
 static int find_extent_in_eb(const struct extent_buffer *eb,
 u64 wanted_disk_byte, u64 extent_item_pos,
-struct extent_inode_elem **eie)
+struct extent_inode_elem **eie,
+bool ignore_offset)
 {
u64 disk_byte;
struct btrfs_key key;
@@ -113,7 +116,7 @@ static int find_extent_in_eb(const struct extent_buffer *eb,
if (disk_byte != wanted_disk_byte)
continue;
 
-   ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie);
+   ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie, 
ignore_offset);
if (ret < 0)
return ret;
}
@@ -419,7 +422,7 @@ static int add_indirect_ref(const struct btrfs_fs_info 
*fs_info,
 static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path,
   struct ulist *parents, struct prelim_ref *ref,
   int level, u64 time_seq, const u64 *extent_item_pos,
-  u64 total_refs)
+  u64 total_refs, bool ignore_offset)
 {
int ret = 0;
int slot;
@@ -472,7 +475,7 @@ static int add_all_parents(struct btrfs_root *root, struct 
btrfs_path *path,
if (extent_item_pos) {
ret = check_extent_in_eb(&key, eb, fi,
*extent_item_pos,
-   &eie);
+   &eie, ignore_offset);
   

[PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl

2017-09-22 Thread Zygo Blaxell
Build-server workloads have hundreds of references per file after dedup.
Multiply by a few snapshots and we quickly exhaust the limit of 2730
references per extent that can fit into a 64K buffer.

Raise the limit to 16M to be consistent with other btrfs ioctls
(e.g. TREE_SEARCH_V2, FILE_EXTENT_SAME).

To minimize surprising userspace behavior, apply this change only to
the LOGICAL_INO_V2 ioctl.

Signed-off-by: Zygo Blaxell 
---
 fs/btrfs/ioctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f4281ffd1833..1940678fc440 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4554,6 +4554,7 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
 
if (version == 1) {
ignore_offset = false;
+   size = min_t(u32, loi->size, SZ_64K);
} else {
/* All reserved bits must be 0 for now */
if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) {
@@ -4566,6 +4567,7 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
goto out_loi;
}
ignore_offset = loi->flags & 
BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET;
+   size = min_t(u32, loi->size, SZ_16M);
}
 
path = btrfs_alloc_path();
@@ -4574,7 +4576,6 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
goto out;
}
 
-   size = min_t(u32, loi->size, SZ_64K);
inodes = init_data_container(size);
if (IS_ERR(inodes)) {
ret = PTR_ERR(inodes);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2

2017-09-22 Thread Zygo Blaxell
Now that check_extent_in_eb()'s extent offset filter can be turned off,
we need a way to do it from userspace.

Add a 'flags' field to the btrfs_logical_ino_args structure to disable
extent offset filtering, taking the place of one of the existing
reserved[] fields.

Previous versions of LOGICAL_INO neglected to check whether any of the
reserved fields have non-zero values.  Assigning meaning to those fields
now may change the behavior of existing programs that left these fields
uninitialized.  The lack of a zero check also means that new programs
have no way to know whether the kernel is honoring the flags field.

To avoid these problems, define a new ioctl LOGICAL_INO_V2.  We can
use the same argument layout as LOGICAL_INO, but shorten the reserved[]
array by one element and turn it into the 'flags' field.  The V2 ioctl
explicitly checks that reserved fields and unsupported flag bits are zero
so that userspace can negotiate future feature bits as they are defined.

Since the memory layouts of the two ioctls' arguments are compatible,
there is no need for a separate function for logical_to_ino_v2 (contrast
with tree_search_v2 vs tree_search where the layout and code are quite
different).  A version parameter and an 'if' statement will suffice.

Now that we have a flags field in logical_ino_args, add a flag
BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want,
and pass it down the stack to iterate_inodes_from_logical.

Signed-off-by: Zygo Blaxell 
---
 fs/btrfs/ioctl.c   | 26 +++---
 include/uapi/linux/btrfs.h |  8 +++-
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b7de32568082..f4281ffd1833 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 
root, void *ctx)
 }
 
 static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info,
-   void __user *arg)
+   void __user *arg, int version)
 {
int ret = 0;
int size;
struct btrfs_ioctl_logical_ino_args *loi;
struct btrfs_data_container *inodes = NULL;
struct btrfs_path *path = NULL;
+   bool ignore_offset;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -4551,6 +4552,22 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
if (IS_ERR(loi))
return PTR_ERR(loi);
 
+   if (version == 1) {
+   ignore_offset = false;
+   } else {
+   /* All reserved bits must be 0 for now */
+   if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) {
+   ret = -EINVAL;
+   goto out_loi;
+   }
+   /* Only accept flags we have defined so far */
+   if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) {
+   ret = -EINVAL;
+   goto out_loi;
+   }
+   ignore_offset = loi->flags & 
BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET;
+   }
+
path = btrfs_alloc_path();
if (!path) {
ret = -ENOMEM;
@@ -4566,7 +4583,7 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
}
 
ret = iterate_inodes_from_logical(loi->logical, fs_info, path,
- build_ino_list, inodes, false);
+ build_ino_list, inodes, 
ignore_offset);
if (ret == -EINVAL)
ret = -ENOENT;
if (ret < 0)
@@ -4580,6 +4597,7 @@ static long btrfs_ioctl_logical_to_ino(struct 
btrfs_fs_info *fs_info,
 out:
btrfs_free_path(path);
kvfree(inodes);
+out_loi:
kfree(loi);
 
return ret;
@@ -5550,7 +5568,9 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_INO_PATHS:
return btrfs_ioctl_ino_to_path(root, argp);
case BTRFS_IOC_LOGICAL_INO:
-   return btrfs_ioctl_logical_to_ino(fs_info, argp);
+   return btrfs_ioctl_logical_to_ino(fs_info, argp, 1);
+   case BTRFS_IOC_LOGICAL_INO_V2:
+   return btrfs_ioctl_logical_to_ino(fs_info, argp, 2);
case BTRFS_IOC_SPACE_INFO:
return btrfs_ioctl_space_info(fs_info, argp);
case BTRFS_IOC_SYNC: {
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 378230c163d5..99bb7988e6fe 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args {
 struct btrfs_ioctl_logical_ino_args {
__u64   logical;/* in */
__u64   size;   /* in */
-   __u64   reserved[4];
+   __u64   reserved[3];/* must be 0 for now */
+   __u64

[PATCH v3] btrfs: LOGICAL_INO enhancements

2017-09-22 Thread Zygo Blaxell
Changelog:

v3-v2:

- Stricter check on reserved[] field - now must be all zero, or
userspace gets EINVAL.  This prevents userspace from setting any
of the reserved bits without the kernel providing an unambiguous
interpretation of them, and doesn't require us to burn a flag
bit for each one.

- Moved 'flags' to the end of the reserved[] array.  This allows
existing source code using version 1 of the ioctl to behave the
same way when using version 2 of the btrfs_ioctl_logical_ino_args
struct definition (i.e. reserved[3] becomes an alias for 'flags',
and the addresses of reserved[0-2] don't change).

- Clarified the reasoning in the commit message for patch 2,
"btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2".

v2:

- added patch series intro text

- rebased on 4.14-rc1.

v1:

This patch series fixes some weaknesses in the btrfs LOGICAL_INO ioctl.

Background:

Suppose we have a file with one extent:

root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a
root@tester:~# sync

Split the extent by overwriting it in the middle:

root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 
conv=notrunc of=/test/a

We should now have 3 extent refs to 2 extents, with one block unreachable.
The extent tree looks like:

root@tester:~# btrfs-debug-tree /dev/vdc -t 2
[...]
item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53
extent refs 2 gen 29 flags DATA
extent data backref root 5 objectid 261 offset 0 count 2
[...]
item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53
extent refs 1 gen 30 flags DATA
extent data backref root 5 objectid 261 offset 8192 count 1
[...]

and the ref tree looks like:

root@tester:~# btrfs-debug-tree /dev/vdc -t 5
[...]
item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53
extent data disk byte 1103101952 nr 73728
extent data offset 0 nr 8192 ram 73728
extent compression(none)
item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53
extent data disk byte 1103175680 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression(none)
item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53
extent data disk byte 1103101952 nr 73728
extent data offset 12288 nr 61440 ram 73728
extent compression(none)
[...]

There are two references to the same extent with different, non-overlapping
byte offsets:

[--72K extent at 1103101952--]
[--8K|--4K unreachable|--60K-]
^ ^
| |
[--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--]
 | 
 v
 [-4K extent-] at 1103175680

We want to find all of the references to extent bytenr 1103101952.

Without the patch (and without running btrfs-debug-tree), we have to
do it with 18 LOGICAL_INO calls:

root@tester:~# btrfs ins log 1103101952 -P /test/
Using LOGICAL_INO
inode 261 offset 0 root 5

root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 
4096)) -P /test/; done 2>&1 | grep inode
inode 261 offset 0 root 5
inode 261 offset 4096 root 5   <- same extent ref as offset 0
   (offset 8192 returns empty set, not 
reachable)
inode 261 offset 12288 root 5
inode 261 offset 16384 root 5  \
inode 261 offset 20480 root 5  |
inode 261 offset 24576 root 5  |
inode 261 offset 28672 root 5  |
inode 261 offset 32768 root 5  |
inode 261 offset 36864 root 5  \
inode 261 offset 40960 root 5   > all the same extent ref as offset 12288.
inode 261 offset 45056 root 5  /  More processing required in userspace
inode 261 offset 49152 root 5  |  to figure out these are all duplicates.
inode 261 offset 53248 root 5  |
inode 261 offset 57344 root 5  |
inode 261 offset 61440 root 5  |
inode 261 offset 65536 root 5  |
inode 261 offset 69632 root 5  /

In the worst case the extents are 128MB long, and we have to do 32768
iterations of the loop to find one 4K extent ref.

With the patch, we just use one call to map all refs to the extent at once:

root@tester:~# btrfs ins log 1103101952 -P /test/
Using LOGICAL_INO_V2
inode 261 offset 0 root 5
inode 261 offset 12288 root 5

The TREE_SEARCH ioctl allows userspace to retrieve the offset and
extent bytenr fields easily once the root, inode and offset are known.
This is sufficient information to b

Re: using fio to test btrfs compression

2017-09-22 Thread Timofey Titovets
2017-09-22 8:33 GMT+03:00 shally verma :
> On Wed, Sep 20, 2017 at 5:26 PM, Timofey Titovets  
> wrote:
>> 2017-09-20 14:10 GMT+03:00 shally verma :
>>> Interesting part is I dont see "encoded" under flags. I couldn't
>>> understand if flags are retrieved from btrfs file metadata info. As
>>> you are running on 4.14 and I am on 4.9
>>>
>>> So, am still under doubt, even with dd if files are getting compressed.
>>>
>>> What is the filesize shown if you run
>>> btrfs fi du /mnt/test0.0 .. is it less or actual size?
>>>
>>> Is there any command that i can run to confirm file has been compressed?
>>>
>>> So far, I had my prints enabled in kernel/fs/btrfs/compression.c and
>>> check in dmesg that code jumped to compress_page() func.
>>>
>>> Thanks
>>> Shally
>>>
>>
>> Okay, lets play different.
>> encoded work for last several years for kernel releases, so you must see 
>> that.
>>
>> Reproduction script:
>> #!/bin/bash -e
>>
>> FILE_NAME=$RANDOM$RANDOM
>> TMP_DIR=$(mktemp -d)
>> IMAGE_FILE="$HOME/$FILE_NAME"
>>
>> truncate -s 4G $IMAGE_FILE
>> mkfs.btrfs -m single -L COMPRESS_TEST $IMAGE_FILE
>> mount -o compress-force $IMAGE_FILE $TMP_DIR
>> dd if=/dev/zero bs=128K count=2 of=$TMP_DIR/zero
>> sync
>> filefrag -v $TMP_DIR/zero
>> umount $TMP_DIR
>> rm -v $IMAGE_FILE
>>
>> Example output:
>> ~ sudo ./btrfs_compress_test.sh
>> btrfs-progs v4.13
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:  COMPRESS_TEST
>> UUID:   abfedc39-dd94-4105-87d6-49eedb13467f
>> Node size:  16384
>> Sector size:4096
>> Filesystem size:4.00GiB
>> Block group profiles:
>>  Data: single8.00MiB
>>  Metadata: single8.00MiB
>>  System:   single4.00MiB
>> SSD detected:   no
>> Incompat features:  extref, skinny-metadata
>> Number of devices:  1
>> Devices:
>>   IDSIZE  PATH
>>1 4.00GiB  /root/322906281
>>
>> 2+0 records in
>> 2+0 records out
>> 262144 bytes (262 kB, 256 KiB) copied, 0.000197746 s, 1.3 GB/s
>> Filesystem type is: 9123683e
>> File size of /tmp/tmp.bDyt3EkEG5/zero is 262144 (64 blocks of 4096 bytes)
>> ext: logical_offset:physical_offset: length:   expected: flags:
>>   0:0..  31:   3072..  3103: 32: encoded
>>   1:   32..  63:   3073..  3104: 32:   3104:
>> last,encoded,eof
>> /tmp/tmp.bDyt3EkEG5/zero: 2 extents found
>> removed '/root/322906281'
>>
>> Good luck.
> Here's my output - Everything is same except:
>
> 1. nodesize and sector size = 64K
> 2. extent length = 2
> 3. I  don't see "encoded" in filefrag here.
>
> btrfs-progs v4.13
> See http://btrfs.wiki.kernel.org for more information.
>
> Label:  COMPRESS_TEST
> UUID:   fad6907e-d4eb-4dbb-9014-3918a822c9ce
> Node size:  65536
> Sector size:65536
> Filesystem size:4.00GiB
> Block group profiles:
>   Data: single8.00MiB
>   Metadata: single8.00MiB
>   System:   single4.00MiB
> SSD detected:   no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>IDSIZE  PATH
> 1 4.00GiB  /root/2808626087
>
> 2+0 records in
> 2+0 records out
> 262144 bytes (262 kB) copied, 0.00028777 s, 911 MB/s
> Filesystem type is: 9123683e
> File size of /tmp/tmp.346ESCdOIi/zero is 262144 (4 blocks of 65536 bytes)
>  ext: logical_offset:physical_offset: length:   expected: flags:
>0:0..   1:192..   193:  2:
>1:2..   3:193..   194:  2:194: eof
> /tmp/tmp.346ESCdOIi/zero: 2 extents found
> removed '/root/2808626087'
>
> And this is my dmesg
>
> [170127.417119] BTRFS: device label COMPRESS_TEST devid 1 transid 5 /dev/loop0
> [170127.417493] BTRFS info (device loop0): force zlib compression
> [170127.417496] BTRFS info (device loop0): disk space caching is enabled
> [170127.417499] BTRFS info (device loop0): has skinny extents
> [170127.425858] BTRFS info (device loop0): creating UUID tree
>
> This is fio --version
> fio-3.0
>
> What do we doubt here?
>
> Thanks
> Shally
>
>> --
>> Have a nice day,
>> Timofey.

Sorry, no idea.

-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[josef-btrfs:new-kill-btree-inode 20/20] ERROR: "radix_tree_iter_tag_set" [fs/btrfs/btrfs.ko] undefined!

2017-09-22 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
new-kill-btree-inode
head:   3264d40ac916ca952f0b85e39304b9199ff36d8b
commit: 3264d40ac916ca952f0b85e39304b9199ff36d8b [20/20] Btrfs: kill the 
btree_inode
config: x86_64-randconfig-i0-201738 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 3264d40ac916ca952f0b85e39304b9199ff36d8b
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "radix_tree_iter_tag_set" [fs/btrfs/btrfs.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[josef-btrfs:new-kill-btree-inode 17/20] fs/ntfs/attrib.c:2549:35: error: implicit declaration of function 'inode_to_bdi'

2017-09-22 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
new-kill-btree-inode
head:   3264d40ac916ca952f0b85e39304b9199ff36d8b
commit: a03edce0823e97312869c60933b37157c3b8866a [17/20] remove mapping from 
balance_dirty_pages*()
config: x86_64-randconfig-x007-201738 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout a03edce0823e97312869c60933b37157c3b8866a
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   fs/ntfs/attrib.c: In function 'ntfs_attr_set':
>> fs/ntfs/attrib.c:2549:35: error: implicit declaration of function 
>> 'inode_to_bdi' [-Werror=implicit-function-declaration]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^~~~
   fs/ntfs/attrib.c:2549:35: warning: passing argument 1 of 
'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
[-Wint-conversion]
   In file included from include/linux/memcontrol.h:31:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^~~
   fs/ntfs/attrib.c:2591:35: warning: passing argument 1 of 
'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
[-Wint-conversion]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^~~~
   In file included from include/linux/memcontrol.h:31:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^~~
   fs/ntfs/attrib.c:2609:35: warning: passing argument 1 of 
'balance_dirty_pages_ratelimited' makes pointer from integer without a cast 
[-Wint-conversion]
  balance_dirty_pages_ratelimited(inode_to_bdi(inode),
  ^~~~
   In file included from include/linux/memcontrol.h:31:0,
from include/linux/swap.h:8,
from fs/ntfs/attrib.c:26:
   include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' 
but argument is of type 'int'
void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi,
 ^~~
   cc1: some warnings being treated as errors

vim +/inode_to_bdi +2549 fs/ntfs/attrib.c

  2472  
  2473  /**
  2474   * ntfs_attr_set - fill (a part of) an attribute with a byte
  2475   * @ni: ntfs inode describing the attribute to fill
  2476   * @ofs:offset inside the attribute at which to start to fill
  2477   * @cnt:number of bytes to fill
  2478   * @val:the unsigned 8-bit value with which to fill the 
attribute
  2479   *
  2480   * Fill @cnt bytes of the attribute described by the ntfs inode @ni 
starting at
  2481   * byte offset @ofs inside the attribute with the constant byte @val.
  2482   *
  2483   * This function is effectively like memset() applied to an ntfs 
attribute.
  2484   * Note thie function actually only operates on the page cache pages 
belonging
  2485   * to the ntfs attribute and it marks them dirty after doing the 
memset().
  2486   * Thus it relies on the vm dirty page write code paths to cause the 
modified
  2487   * pages to be written to the mft record/disk.
  2488   *
  2489   * Return 0 on success and -errno on error.  An error code of -ESPIPE 
means
  2490   * that @ofs + @cnt were outside the end of the attribute and no write 
was
  2491   * performed.
  2492   */
  2493  int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const 
u8 val)
  2494  {
  2495  ntfs_volume *vol = ni->vol;
  2496  struct inode *inode = VFS_I(ni);
  2497  struct address_space *mapping;
  2498  struct page *page;
  2499  u8 *kaddr;
  2500  pgoff_t idx, end;
  2501  unsigned start_ofs, end_ofs, size;
  2502  
  2503  ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.",
  2504  (long long)ofs, (long long)cnt, val);
  2505  BUG_ON(ofs < 0);
  2506  BUG_ON(cnt < 0);
  2507  if (!cnt)
  2508  goto done;
  2509  /*
  2510   * FIXME: Compressed and encrypted attributes are not supported 
when
  2511   * writing and we should never have gotten here for them.
  2512   */
  2513  BUG_ON(NInoCompressed(ni));
  2514  BUG_ON(NInoEncrypted(ni));
  2515  mapping = VFS_I(ni)->i_mapping;
  2516  /* Work out the starting index and page offset. */
  2517 

Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Qu Wenruo



On 2017年09月22日 21:33, Austin S. Hemmelgarn wrote:

On 2017-09-22 08:32, Qu Wenruo wrote:

On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote:

On 2017-09-22 06:39, Qu Wenruo wrote:
As I already stated in an other thread, if you want to shrink, do it 
in another command line tool.
Do one thing and do it simple. (Although Btrfs itself is already out 
of the UNIX way)
Unless I'm reading the code wrong, the shrinking isn't happening in a 
second pass, so this _is_ doing one thing, and it appears to be doing 
it as simply as possible (although arguably not correctly because of 
the 1MB reserved area being used).


If you're referring to my V1 implementation of shrink, that's doing 
*one* thing.


But the original shrinking code? Nope, or we won't have the custom 
chunk allocator at all.


What I really mean is, if one wants to shrink, at least don't couple 
the shrink code into "mkfs.btrfs".


Do shrink in its own tool/subcommand, not in a really unrelated tool.

There are two cases for shrinking a filesystem:
1. You're resizing it to move to a smaller disk (or speed up copying to 
another disk).
2. You're generating a filesystem image that needs to be as small as 
possible.


I would argue there is no meaning of creating *smallest* image. (Of 
course it exists)


We could put tons of code to implement, and more (or less) test cases to 
verify it.


But the demand doesn't validate the effort.

All my points are clear for this patchset:
I know I removed one function, and my reason is:
1) No or little usage
   And it's anti intuition.
2) Dead code (not tested nor well documented)
3) Possible workaround

I can add several extra reasons as I stated before, but number of 
reasons won't validate anything anyway.


Building software is always trading one thing for another.
I understand there may be some need for this function, but it doesn't 
validate the cost.


And I think the fact that until recently a mail reported about the 
shrinking behavior has already backed up my point.


Thanks,
Qu



Case 1 is obviously unrelated to creating a filesystem.  Case 2 however 
is kind of integral to the creation of the filesystem image itself by 
definition, especially for a CoW filesystem because it's not possible to 
shrink to the absolute smallest size due to the GlobalReserve and other 
things.


Similarly, there are two primary use cases for pre-loading the 
filesystem with data:
1. Avoiding a copy when reprovisioning storage on a system.  For 
example, splitting a directory out to a new filesystem, you could use 
the -r option to avoid having to copy the data after mounting the 
filesystem.

2. Creating base images for systems.

The first case shouldn't need the shrinking functionality, but the 
second is a very common use case together with the second usage for 
shrinking a filesystem.




It may be offline shrink/balance. But not to further complexing the 
--rootdir option now. >
And you also said that, the shrink feature is not a popular feature 
*NOW*, then I don't think it's worthy to implment it *NOW* either.

Implement future feature in the future please.
I'm not sure about you, but I could have sworn that he meant seed 
devices weren't a popular feature right now,


Oh, sorry for my misunderstanding.

not that the shrinking is. As a general rule, the whole option of 
pre-loading a filesystem with data as you're creating it is not a 
popular feature, because most sysadmins are much more willing to 
trust adding data after the filesystem is created.


Personally, given the existence of seed devices, I would absolutely 
expect there to be a quick and easy way to generate a minimalistic 
image using a single command (because realistic usage of seed devices 
implies minimalistic images).  I agree that it shouldn't be the 
default behavior, but I don't think it needs to be removed completely.


Just like I said in cover letter, even for ext*, it's provided by 
genext2fs, not mke2fs.
Then maybe this should get split out into a separate tool instead of 
just removing it completely?  There is obviously at least some interest 
in this functionality.


I totally understand end-user really want a do-it-all solution.
But from developers' view, the old UNIX way is better to maintain code 
clean and easy to read.
What the code is doing should have near zero impact on readability.  If 
it did, then the BTRFS code in general is already way beyond most people.



In fact, you can even create your script to do the old behavior if you 
don't care that the result may not fully take use of the space, just by:


1) Calculate the size of the whole directory
    "du" command can do it easily, and it does things better than us! For
    years!
Um, no it actually doesn't do things better in all cases.  it doesn't 
account for extended attributes, or metadata usage, or any number of 
other things that factor into how much space a file or directory will 
take up on BTRFS.  It's good enough for finding what's using most of 
your space, but it'

Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Austin S. Hemmelgarn

On 2017-09-22 08:32, Qu Wenruo wrote:

On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote:

On 2017-09-22 06:39, Qu Wenruo wrote:
As I already stated in an other thread, if you want to shrink, do it 
in another command line tool.
Do one thing and do it simple. (Although Btrfs itself is already out 
of the UNIX way)
Unless I'm reading the code wrong, the shrinking isn't happening in a 
second pass, so this _is_ doing one thing, and it appears to be doing 
it as simply as possible (although arguably not correctly because of 
the 1MB reserved area being used).


If you're referring to my V1 implementation of shrink, that's doing 
*one* thing.


But the original shrinking code? Nope, or we won't have the custom chunk 
allocator at all.


What I really mean is, if one wants to shrink, at least don't couple the 
shrink code into "mkfs.btrfs".


Do shrink in its own tool/subcommand, not in a really unrelated tool.

There are two cases for shrinking a filesystem:
1. You're resizing it to move to a smaller disk (or speed up copying to 
another disk).
2. You're generating a filesystem image that needs to be as small as 
possible.


Case 1 is obviously unrelated to creating a filesystem.  Case 2 however 
is kind of integral to the creation of the filesystem image itself by 
definition, especially for a CoW filesystem because it's not possible to 
shrink to the absolute smallest size due to the GlobalReserve and other 
things.


Similarly, there are two primary use cases for pre-loading the 
filesystem with data:
1. Avoiding a copy when reprovisioning storage on a system.  For 
example, splitting a directory out to a new filesystem, you could use 
the -r option to avoid having to copy the data after mounting the 
filesystem.

2. Creating base images for systems.

The first case shouldn't need the shrinking functionality, but the 
second is a very common use case together with the second usage for 
shrinking a filesystem.




It may be offline shrink/balance. But not to further complexing the 
--rootdir option now. >
And you also said that, the shrink feature is not a popular feature 
*NOW*, then I don't think it's worthy to implment it *NOW* either.

Implement future feature in the future please.
I'm not sure about you, but I could have sworn that he meant seed 
devices weren't a popular feature right now,


Oh, sorry for my misunderstanding.

not that the shrinking is. As a general rule, the whole option of 
pre-loading a filesystem with data as you're creating it is not a 
popular feature, because most sysadmins are much more willing to trust 
adding data after the filesystem is created.


Personally, given the existence of seed devices, I would absolutely 
expect there to be a quick and easy way to generate a minimalistic 
image using a single command (because realistic usage of seed devices 
implies minimalistic images).  I agree that it shouldn't be the 
default behavior, but I don't think it needs to be removed completely.


Just like I said in cover letter, even for ext*, it's provided by 
genext2fs, not mke2fs.
Then maybe this should get split out into a separate tool instead of 
just removing it completely?  There is obviously at least some interest 
in this functionality.


I totally understand end-user really want a do-it-all solution.
But from developers' view, the old UNIX way is better to maintain code 
clean and easy to read.
What the code is doing should have near zero impact on readability.  If 
it did, then the BTRFS code in general is already way beyond most people.



In fact, you can even create your script to do the old behavior if you 
don't care that the result may not fully take use of the space, just by:


1) Calculate the size of the whole directory
    "du" command can do it easily, and it does things better than us! For
    years!
Um, no it actually doesn't do things better in all cases.  it doesn't 
account for extended attributes, or metadata usage, or any number of 
other things that factor into how much space a file or directory will 
take up on BTRFS.  It's good enough for finding what's using most of 
your space, but it's not reliable for determining how much space you 
need to store that data (especially once you throw in in-line compression).


2) Multiple the value according to the meta/data profile
    Take care of small files, which will be inlined.
    And don't forget size for data checksum.
    (BTW, there is no way to change the behavor of inlined data and data
     checksum for mkfs. unlike btrfs-convert)
This is where the issue lies.  It's not possible for a person to 
calculate this with reasonable accuracy, and you arguably can't even do 
it for certain programmatically without some serious work.


3) Create a file with size calculated by step 2)

4) Execute "mkfs.btrfs -d  "

  The main issues here are that it wasn't documented well (like many 
other things in BTRFS), and it didn't generate a filesystem that was 
properly compliant with the on-disk format (because it use

Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Qu Wenruo



On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote:

On 2017-09-22 06:39, Qu Wenruo wrote:
As I already stated in an other thread, if you want to shrink, do it 
in another command line tool.
Do one thing and do it simple. (Although Btrfs itself is already out 
of the UNIX way)
Unless I'm reading the code wrong, the shrinking isn't happening in a 
second pass, so this _is_ doing one thing, and it appears to be doing it 
as simply as possible (although arguably not correctly because of the 
1MB reserved area being used).


If you're referring to my V1 implementation of shrink, that's doing 
*one* thing.


But the original shrinking code? Nope, or we won't have the custom chunk 
allocator at all.



What I really mean is, if one wants to shrink, at least don't couple the 
shrink code into "mkfs.btrfs".


Do shrink in its own tool/subcommand, not in a really unrelated tool.



It may be offline shrink/balance. But not to further complexing the 
--rootdir option now. >
And you also said that, the shrink feature is not a popular feature 
*NOW*, then I don't think it's worthy to implment it *NOW* either.

Implement future feature in the future please.
I'm not sure about you, but I could have sworn that he meant seed 
devices weren't a popular feature right now,


Oh, sorry for my misunderstanding.

not that the shrinking is. 
As a general rule, the whole option of pre-loading a filesystem with 
data as you're creating it is not a popular feature, because most 
sysadmins are much more willing to trust adding data after the 
filesystem is created.


Personally, given the existence of seed devices, I would absolutely 
expect there to be a quick and easy way to generate a minimalistic image 
using a single command (because realistic usage of seed devices implies 
minimalistic images).  I agree that it shouldn't be the default 
behavior, but I don't think it needs to be removed completely.


Just like I said in cover letter, even for ext*, it's provided by 
genext2fs, not mke2fs.


I totally understand end-user really want a do-it-all solution.
But from developers' view, the old UNIX way is better to maintain code 
clean and easy to read.



In fact, you can even create your script to do the old behavior if you 
don't care that the result may not fully take use of the space, just by:


1) Calculate the size of the whole directory
   "du" command can do it easily, and it does things better than us! For
   years!

2) Multiple the value according to the meta/data profile
   Take care of small files, which will be inlined.
   And don't forget size for data checksum.
   (BTW, there is no way to change the behavor of inlined data and data
checksum for mkfs. unlike btrfs-convert)


3) Create a file with size calculated by step 2)

4) Execute "mkfs.btrfs -d  "

  The main 
issues here are that it wasn't documented well (like many other things 
in BTRFS), and it didn't generate a filesystem that was properly 
compliant with the on-disk format (because it used space in the 1MB 
reserved area at the beginning of the FS).  Fixing those issues in no 
way requires removing the feature.


Yes, 1MB can be fixed easily (although not properly). But the whole 
customized chunk allocator is the real problem.
The almost dead code is always bug-prone. Replace it with updated 
generic chunk allocator is the way to avoid later whac-a-mole, and 
should be done asap.




And further more, even following the existing shrink behavior, you 
still need to truncate the file all by yourself.

Which is no better than creating a good sized file and then mkfs on it.
Only if you pre-create the file.  If the file doesn't exist, it gets 
created at the appropriate size.  That's part of why the chunk 
allocations are screwed up and stuff gets put in the first 1MB, it 
generates the FS on-the-fly and writes it out as it's generating it.


Nope, even you created the file in advance, it will still occupy the 
first 1M.


BTW, you can get back the size calculation for shrink, but you will soon 
find that it's just the start of a new nightmare.

Because there is no easy way to calculate the real metadata usage.

The result (and the old calculator) will be no better than guessing it.
(Well, just multiply the dir size by 2 will never go wrong)


Thanks,
Qu



Thanks,
Qu

Sent: Friday, September 22, 2017 at 5:24 PM
From: "Anand Jain" 
To: "Qu Wenruo" , linux-btrfs@vger.kernel.org
Cc: dste...@suse.cz
Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra 
condition for rootdir option


+WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the 
filesystem,

+prevent user to make use of the remaining space.
+In v4.14 btrfs-progs, this behavior is changed, and will not shrink 
the fs.

+The result should be the same as `mkfs`, `mount` and then `cp -r`. +


Hmm well. Shrink to fit exactly to the size of the given
files-and-directory is indeed a nice feature. Which would help to create
a golden-image btrfs seed device. Its not popular as of now, but a

Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Austin S. Hemmelgarn

On 2017-09-22 06:39, Qu Wenruo wrote:

As I already stated in an other thread, if you want to shrink, do it in another 
command line tool.
Do one thing and do it simple. (Although Btrfs itself is already out of the 
UNIX way)
Unless I'm reading the code wrong, the shrinking isn't happening in a 
second pass, so this _is_ doing one thing, and it appears to be doing it 
as simply as possible (although arguably not correctly because of the 
1MB reserved area being used).


It may be offline shrink/balance. But not to further complexing the --rootdir 
option now. >
And you also said that, the shrink feature is not a popular feature *NOW*, then 
I don't think it's worthy to implment it *NOW* either.
Implement future feature in the future please.
I'm not sure about you, but I could have sworn that he meant seed 
devices weren't a popular feature right now, not that the shrinking is. 
As a general rule, the whole option of pre-loading a filesystem with 
data as you're creating it is not a popular feature, because most 
sysadmins are much more willing to trust adding data after the 
filesystem is created.


Personally, given the existence of seed devices, I would absolutely 
expect there to be a quick and easy way to generate a minimalistic image 
using a single command (because realistic usage of seed devices implies 
minimalistic images).  I agree that it shouldn't be the default 
behavior, but I don't think it needs to be removed completely.  The main 
issues here are that it wasn't documented well (like many other things 
in BTRFS), and it didn't generate a filesystem that was properly 
compliant with the on-disk format (because it used space in the 1MB 
reserved area at the beginning of the FS).  Fixing those issues in no 
way requires removing the feature.


And further more, even following the existing shrink behavior, you still need 
to truncate the file all by yourself.
Which is no better than creating a good sized file and then mkfs on it.
Only if you pre-create the file.  If the file doesn't exist, it gets 
created at the appropriate size.  That's part of why the chunk 
allocations are screwed up and stuff gets put in the first 1MB, it 
generates the FS on-the-fly and writes it out as it's generating it.


Thanks,
Qu

Sent: Friday, September 22, 2017 at 5:24 PM
From: "Anand Jain" 
To: "Qu Wenruo" , linux-btrfs@vger.kernel.org
Cc: dste...@suse.cz
Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for 
rootdir option


+WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem,
+prevent user to make use of the remaining space.
+In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs.
+The result should be the same as `mkfs`, `mount` and then `cp -r`. +


Hmm well. Shrink to fit exactly to the size of the given
files-and-directory is indeed a nice feature. Which would help to create
a golden-image btrfs seed device. Its not popular as of now, but at some
point it may in the cloud environment.

Replacing this feature instead of creating a new option is not a good
idea indeed. I missed something ?

Thanks, Anand


+Also, if destination file/block device does not exist, *--rootdir* will not
+create the image file, to make it follow the normal mkfs behavior.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-09-22 Thread Austin S. Hemmelgarn

On 2017-09-21 16:10, Kai Krakow wrote:

Am Wed, 20 Sep 2017 07:46:52 -0400
schrieb "Austin S. Hemmelgarn" :


  Fragmentation: Files with a lot of random writes can become
heavily fragmented (1+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount a RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)

Almost certainly.  Most modern web browsers are brain-dead and insist
on using SQLite databases (or traditional DB files) for everything,
including the cache, and the usage for the cache in particular kills
performance when fragmentation is an issue.


At least in Chrome, you can turn on simple cache backend, which, I
think, is using many small instead of one huge file. This suit btrfs
much better:
That's correct.  The traditional cache in Chrome and Chromium uses a 
single SQLite database for storing all the cache data and metadata (just 
like FIrefox did last time I checked).  The simple cache backend instead 
uses the filesystem to handle allocations and uses directory hashing to 
speed up look ups of items, which actually means that even without BTRFS 
involved, it will usually be faster (both because it allows concurrent 
access unlike SQLite, and because it's generally faster to parse a 
multi-level directory hash than an SQL statement).


chrome://flags/#enable-simple-cache-backend


And then I suggest also doing this (as your login user):

$ cd $HOME
$ mv .cache .cache.old
$ mkdir .cache
$ lsattr +C .cache
$ rsync -av .cache.old/ .cache/
$ rm -Rf .cache.old

This makes caches for most applications nocow. Chrome performance was
completely fixed for me by doing this.

I'm not sure where Firefox puts its cache, I only use it on very rare
occasions. But I think it's going to .cache/mozilla last time looked
at it.

I'm pretty sure that is correct.


You may want to close all apps before converting the cache directory.
At a minimum, you'll have to restart them to get them to use the new 
location.


Also, I don't see any downsides in making this nocow. That directory
could easily be also completely volatile. If something breaks due to no
longer protected by data csum, just clean it out.
Indeed, anything that is storing data here that can't be regenerated 
from some other source is asking for trouble, sane backup systems don't 
include ~/.cache, and it's quite often one of the first things 
recommended for deletion when trying to free up disk space.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Qu Wenruo
As I already stated in an other thread, if you want to shrink, do it in another 
command line tool.
Do one thing and do it simple. (Although Btrfs itself is already out of the 
UNIX way)

It may be offline shrink/balance. But not to further complexing the --rootdir 
option now.

And you also said that, the shrink feature is not a popular feature *NOW*, then 
I don't think it's worthy to implment it *NOW* either.
Implement future feature in the future please.

And further more, even following the existing shrink behavior, you still need 
to truncate the file all by yourself.
Which is no better than creating a good sized file and then mkfs on it.

Thanks,
Qu

Sent: Friday, September 22, 2017 at 5:24 PM
From: "Anand Jain" 
To: "Qu Wenruo" , linux-btrfs@vger.kernel.org
Cc: dste...@suse.cz
Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for 
rootdir option

> +WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem,
> +prevent user to make use of the remaining space.
> +In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs.
> +The result should be the same as `mkfs`, `mount` and then `cp -r`. +

Hmm well. Shrink to fit exactly to the size of the given
files-and-directory is indeed a nice feature. Which would help to create
a golden-image btrfs seed device. Its not popular as of now, but at some
point it may in the cloud environment.

Replacing this feature instead of creating a new option is not a good
idea indeed. I missed something ?

Thanks, Anand

> +Also, if destination file/block device does not exist, *--rootdir* will not
> +create the image file, to make it follow the normal mkfs behavior.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-22 Thread Anand Jain




+WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem,
+prevent user to make use of the remaining space.
+In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs.
+The result should be the same as `mkfs`, `mount` and then `cp -r`. +


 Hmm well. Shrink to fit exactly to the size of the given 
files-and-directory is indeed a nice feature. Which would help to create 
a golden-image btrfs seed device. Its not popular as of now, but at some 
point it may in the cloud environment.


 Replacing this feature instead of creating a new option is not a good 
idea indeed. I missed something ?


Thanks, Anand


+Also, if destination file/block device does not exist, *--rootdir* will not
+create the image file, to make it follow the normal mkfs behavior.



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] xfstests: Split MOUNT_OPTIONS to TEST_FS_MOUNT_OPTS and MOUNT_OPTIONS

2017-09-22 Thread Eryu Guan
On Mon, Sep 18, 2017 at 04:11:09PM +0800, Gu Jinxiang wrote:
> Resovle the inconsistent of mount option.
> Btrfs use MOUNT_OPTIONS for both scrath_dev and test_dev. Change to
> MOUNT_OPTIONS for scratch mount, and TEST_FS_MOUNT_OPTS for test dev
> mount.
> 
> Signed-off-by: Gu Jinxiang 
> ---
> As mentioned by https://patchwork.kernel.org/patch/9742039/, the usage
> of MOUNT_OPTIONS is inconsistent.

Sorry for the late reply, because I don't think this is the right fix,
and I was trying to sort out & refactor the mount option settings
through fstests, but found that it required more time than I thought..

The new _check_test_btrfs_filesystem() you introduced here is a (almost)
complete copy of _check_btrfs_filesystem, the only difference is that
it's mounting $device with $TEST_FS_MOUNT_OPTS instead of
$MOUNT_OPTIONS. At least you could factor out a common helper so you
don't have to repeat 99% of the code.

But that only fixes _check_test_btrfs_filesystems, there're similar
problems in _check_generic_filesystems and other places that do mount
operation. I'd like to see a well-defined interface to return the correct
mount options to callers that want to do mount operations.

I'll look into this and see if I can work something out. I really
appreciate if anyone has other thoughts/suggestions!

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Running compsize utility on btrfs compress mount point

2017-09-22 Thread Duncan
shally verma posted on Fri, 22 Sep 2017 12:01:32 +0530 as excerpted:

> In response to my question in another thread:
>> Is there any command that i can run to confirm file has been compressed?
> 
> I've gotten response from Duncan:
>>>There is the quite recently posted (and actively updated since then)
>>>compsize command.
> 
>>>https://github.com/kilobyte/compsize
> 
> On running compsize on btrfs mount point created, I get following:
> 
> Type   Perc Disk Usage   Uncompressed Referenced
> Data50%  128K 256K 256K
> zlib50%  128K 256K 256K
> Data   100%   10G  10G  10G
> none   100%   10G  10G  10G
> 
> but then how do I interpret above information?

Hmm... Looks like you got hit by the live-repo phenomenon.  I've not
updated for a few days (reported sync time was last Sat), but was
just running it a few hours ago, and my output was a bit more
intuitively ordered.

The way it's /supposed/ to work (unless it's changing, it /is/ a rather
new tool after all), the first line (Data) should be the aggregate
total and average compression percentage of all used compression types,
with the following lines displaying values for each compression type
compsize found, on its own.

IOW, for the above I'd have expected just three lines (plus header),
something like this:

Data100% 10G 10G 10G
zlib 50%128K256K256K
none100% 10G 10G 10G

(In this case the 128K compressed is so small it's a rounding
error in the 10G full size, so the combined data line and
the none line will appear the same.)

I'd suggest trying to pull any updates and rebuilding.  If that
doesn't fix it, try checking out a specific commit, say the 160c335e7
that was HEAD last Saturday (16th) when I built, and see if the output
comes out a bit more sane.  If that works, then the problem must be in
either 94d98028b from Tuesday, or caed4fdcd, HEAD as I'm looking at it
now, from Thursday, if you pulled after it.  If the author doesn't
popup in this thread, you might bisect to the specific commit and
file a github issue on it.

If 160c335e7 doesn't give you saner output, then there's some other
difference between our systems that must account for your output
above.  FWIW, gcc 6.4.0 (gentoo) here, amd64, kernel 4.13, glibc
2.25-r5 (the -r5 indicating gentoo patch level), btrfs-progs 4.12.
I have CFLAGS available too if you need 'em. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html