Re: [PATCH] btrfs: Remove WARN_ON for unaligned device created before v4.13 and adds more user friendly output
At Sat, 23 Sep 2017 10:19:26 +0900, Satoru Takeuchi wrote: > > At Wed, 20 Sep 2017 15:18:43 +0900, > Qu Wenruo wrote: > > > > Commit 7dfb8be11b5d ("btrfs: Round down values which are written for > > total_bytes_size") is fixing the unaligned device size caused by > > adding/shrinking device. > > > > It added a new WARN_ON() when device size is unaligned. > > This is fine for new device added to btrfs using v4.13 kernel, but not > > existing device whose total_bytes is already unaligned. > > > > And the WARN_ON() will get triggered every time a block group get > > created/removed on the unaligned device. > > > > This patch will remove the WARN_ON(), and warn user more gently what's > > happening and how to fix it. > > > > Reported-by: Rich Rauenzahn > > Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for > > total_bytes_size") > > Signed-off-by: Qu Wenruo > > --- > > fs/btrfs/ctree.h | 1 - > > fs/btrfs/volumes.c | 16 > > 2 files changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > > index 5a8933da39a7..4de9269e435a 100644 > > --- a/fs/btrfs/ctree.h > > +++ b/fs/btrfs/ctree.h > > @@ -1562,7 +1562,6 @@ static inline void > > btrfs_set_device_total_bytes(struct extent_buffer *eb, > > { > > BUILD_BUG_ON(sizeof(u64) != > > sizeof(((struct btrfs_dev_item *)0))->total_bytes); > > - WARN_ON(!IS_ALIGNED(val, eb->fs_info->sectorsize)); > > btrfs_set_64(eb, s, offsetof(struct btrfs_dev_item, total_bytes), val); > > } > > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > > index 0e8f16c305df..afae25df6a8c 100644 > > --- a/fs/btrfs/volumes.c > > +++ b/fs/btrfs/volumes.c > > @@ -6472,15 +6472,23 @@ static int read_one_chunk(struct btrfs_fs_info > > *fs_info, struct btrfs_key *key, > > return 0; > > } > > > > -static void fill_device_from_item(struct extent_buffer *leaf, > > -struct btrfs_dev_item *dev_item, > > -struct btrfs_device *device) > > +static void fill_device_from_item(struct btrfs_fs_info *fs_info, > > + struct extent_buffer *leaf, > > + struct btrfs_dev_item *dev_item, > > + struct btrfs_device *device) > > { > > unsigned long ptr; > > > > device->devid = btrfs_device_id(leaf, dev_item); > > device->disk_total_bytes = btrfs_device_total_bytes(leaf, dev_item); > > device->total_bytes = device->disk_total_bytes; > > + if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) { > > + btrfs_warn(fs_info, > > + "devid %llu has unaligned total bytes %llu", > > + device->devid, device->disk_total_bytes); > > + btrfs_warn(fs_info, > > + "please shrink the device a little and resize back > > to fix it"); > > + } > > How about telling uses to know device->total_bytes should be alligned > to fs_info->sectorsize here? > > Thanks, I should make my comment clearer, sorry. === + if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) { + btrfs_warn(fs_info, + "devid %llu: total bytes %llu should be aligned to %u bytes", + device->devid, device->disk_total_bytes, fs_info->sectorsize); + btrfs_warn(fs_info, + "please shrink the device a little and resize back to fix it"); + } === Thanks, Satoru > Satoru > > > device->commit_total_bytes = device->disk_total_bytes; > > device->bytes_used = btrfs_device_bytes_used(leaf, dev_item); > > device->commit_bytes_used = device->bytes_used; > > @@ -6625,7 +6633,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info, > > return -EINVAL; > > } > > > > - fill_device_from_item(leaf, dev_item, device); > > + fill_device_from_item(fs_info, leaf, dev_item, device); > > device->in_fs_metadata = 1; > > if (device->writeable && !device->is_tgtdev_for_dev_replace) { > > device->fs_devices->total_rw_bytes += device->total_bytes; > > -- > > 2.14.1 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Remove WARN_ON for unaligned device created before v4.13 and adds more user friendly output
At Wed, 20 Sep 2017 15:18:43 +0900, Qu Wenruo wrote: > > Commit 7dfb8be11b5d ("btrfs: Round down values which are written for > total_bytes_size") is fixing the unaligned device size caused by > adding/shrinking device. > > It added a new WARN_ON() when device size is unaligned. > This is fine for new device added to btrfs using v4.13 kernel, but not > existing device whose total_bytes is already unaligned. > > And the WARN_ON() will get triggered every time a block group get > created/removed on the unaligned device. > > This patch will remove the WARN_ON(), and warn user more gently what's > happening and how to fix it. > > Reported-by: Rich Rauenzahn > Fixes: 7dfb8be11b5d ("btrfs: Round down values which are written for > total_bytes_size") > Signed-off-by: Qu Wenruo > --- > fs/btrfs/ctree.h | 1 - > fs/btrfs/volumes.c | 16 > 2 files changed, 12 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 5a8933da39a7..4de9269e435a 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1562,7 +1562,6 @@ static inline void btrfs_set_device_total_bytes(struct > extent_buffer *eb, > { > BUILD_BUG_ON(sizeof(u64) != >sizeof(((struct btrfs_dev_item *)0))->total_bytes); > - WARN_ON(!IS_ALIGNED(val, eb->fs_info->sectorsize)); > btrfs_set_64(eb, s, offsetof(struct btrfs_dev_item, total_bytes), val); > } > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 0e8f16c305df..afae25df6a8c 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -6472,15 +6472,23 @@ static int read_one_chunk(struct btrfs_fs_info > *fs_info, struct btrfs_key *key, > return 0; > } > > -static void fill_device_from_item(struct extent_buffer *leaf, > - struct btrfs_dev_item *dev_item, > - struct btrfs_device *device) > +static void fill_device_from_item(struct btrfs_fs_info *fs_info, > + struct extent_buffer *leaf, > + struct btrfs_dev_item *dev_item, > + struct btrfs_device *device) > { > unsigned long ptr; > > device->devid = btrfs_device_id(leaf, dev_item); > device->disk_total_bytes = btrfs_device_total_bytes(leaf, dev_item); > device->total_bytes = device->disk_total_bytes; > + if (!IS_ALIGNED(device->total_bytes, fs_info->sectorsize)) { > + btrfs_warn(fs_info, > +"devid %llu has unaligned total bytes %llu", > +device->devid, device->disk_total_bytes); > + btrfs_warn(fs_info, > +"please shrink the device a little and resize back > to fix it"); > + } How about telling uses to know device->total_bytes should be alligned to fs_info->sectorsize here? Thanks, Satoru > device->commit_total_bytes = device->disk_total_bytes; > device->bytes_used = btrfs_device_bytes_used(leaf, dev_item); > device->commit_bytes_used = device->bytes_used; > @@ -6625,7 +6633,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info, > return -EINVAL; > } > > - fill_device_from_item(leaf, dev_item, device); > + fill_device_from_item(fs_info, leaf, dev_item, device); > device->in_fs_metadata = 1; > if (device->writeable && !device->is_tgtdev_for_dev_replace) { > device->fs_devices->total_rw_bytes += device->total_bytes; > -- > 2.14.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use self-explaining variable
On 2017年09月23日 08:48, Liu Bo wrote: On Sat, Sep 23, 2017 at 08:46:55AM +0800, Qu Wenruo wrote: On 2017年09月23日 07:36, Liu Bo wrote: This uses a bool 'do_backup' to help understand this piece of code. Signed-off-by: Liu Bo --- This is based on a patch "Btrfs: do not backup tree roots when fsync". fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cdb7043..9811b9d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) int max_errors; int total_errors = 0; u64 flags; + bool do_backup = (max_mirrors == 0); Why not replacing @max_mirrors with @do_backup as parameter? If I read the code correctly, max_mirrors is not just for deciding backup. That's strange. write_all_supers() only uses @max_mirrors by passing it to write_dev_supers() and wait_dev_supers(). Both of the write/wait_dev_supers() will replace @max_mirrors to BTRFS_SUPER_MIRROR_MAX if it's zero. Further more, all write_all_supers() callers just pass @max_mirrors as bool (either 0 or 1). So I don't see any point not replacing the parameter as bool. Thanks, Qu thanks, -liubo Thanks, Qu do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); @@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) * not from fsync where the tree roots in fs_info have not * been consistent on disk. */ - if (max_mirrors == 0) + if (do_backup) backup_super_roots(fs_info); sb = fs_info->super_for_commit; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
On Sat, Sep 23, 2017 at 09:49:38AM +0900, Satoru Takeuchi wrote: > At Tue, 19 Sep 2017 17:50:09 -0600, > Liu Bo wrote: > > > > This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't > > abuse REQ_OP_* flags for btrfs_map_block"). > > > > It should use btrfs_op() helper to provide one of 'enum btrfs_map_op' > > types. > > > > Signed-off-by: Liu Bo > > Reviewed-by: Satoru Takeuchi > > I checked all callers of the following functions and there is no leftover. > > - btrfs_map_block > - btrfs_map_sblock > - __btrfs_map_block I did check them, but thanks a lot for double checking. thanks, -liubo > > Thanks, > Satoru > > > --- > > fs/btrfs/volumes.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > > index bd679bc..bd7b75d3 100644 > > --- a/fs/btrfs/volumes.c > > +++ b/fs/btrfs/volumes.c > > @@ -6229,7 +6229,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info > > *fs_info, struct bio *bio, > > map_length = length; > > > > btrfs_bio_counter_inc_blocked(fs_info); > > - ret = __btrfs_map_block(fs_info, bio_op(bio), logical, > > + ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, > > &map_length, &bbio, mirror_num, 1); > > if (ret) { > > btrfs_bio_counter_dec(fs_info); > > -- > > 2.9.4 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use self-explaining variable
On Sat, Sep 23, 2017 at 08:46:55AM +0800, Qu Wenruo wrote: > > > On 2017年09月23日 07:36, Liu Bo wrote: > > This uses a bool 'do_backup' to help understand this piece of code. > > > > Signed-off-by: Liu Bo > > --- > > This is based on a patch "Btrfs: do not backup tree roots when fsync". > > > > fs/btrfs/disk-io.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index cdb7043..9811b9d 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, > > int max_mirrors) > > int max_errors; > > int total_errors = 0; > > u64 flags; > > + bool do_backup = (max_mirrors == 0); > > Why not replacing @max_mirrors with @do_backup as parameter? If I read the code correctly, max_mirrors is not just for deciding backup. thanks, -liubo > > Thanks, > Qu > > do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); > > @@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, > > int max_mirrors) > > * not from fsync where the tree roots in fs_info have not > > * been consistent on disk. > > */ > > - if (max_mirrors == 0) > > + if (do_backup) > > backup_super_roots(fs_info); > > sb = fs_info->super_for_commit; > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
At Tue, 19 Sep 2017 17:50:09 -0600, Liu Bo wrote: > > This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't > abuse REQ_OP_* flags for btrfs_map_block"). > > It should use btrfs_op() helper to provide one of 'enum btrfs_map_op' > types. > > Signed-off-by: Liu Bo Reviewed-by: Satoru Takeuchi I checked all callers of the following functions and there is no leftover. - btrfs_map_block - btrfs_map_sblock - __btrfs_map_block Thanks, Satoru > --- > fs/btrfs/volumes.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index bd679bc..bd7b75d3 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -6229,7 +6229,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info > *fs_info, struct bio *bio, > map_length = length; > > btrfs_bio_counter_inc_blocked(fs_info); > - ret = __btrfs_map_block(fs_info, bio_op(bio), logical, > + ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, > &map_length, &bbio, mirror_num, 1); > if (ret) { > btrfs_bio_counter_dec(fs_info); > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use self-explaining variable
On 2017年09月23日 07:36, Liu Bo wrote: This uses a bool 'do_backup' to help understand this piece of code. Signed-off-by: Liu Bo --- This is based on a patch "Btrfs: do not backup tree roots when fsync". fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cdb7043..9811b9d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) int max_errors; int total_errors = 0; u64 flags; + bool do_backup = (max_mirrors == 0); Why not replacing @max_mirrors with @do_backup as parameter? Thanks, Qu do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); @@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) * not from fsync where the tree roots in fs_info have not * been consistent on disk. */ - if (max_mirrors == 0) + if (do_backup) backup_super_roots(fs_info); sb = fs_info->super_for_commit; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: use self-explaining variable
This uses a bool 'do_backup' to help understand this piece of code. Signed-off-by: Liu Bo --- This is based on a patch "Btrfs: do not backup tree roots when fsync". fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cdb7043..9811b9d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3691,6 +3691,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) int max_errors; int total_errors = 0; u64 flags; + bool do_backup = (max_mirrors == 0); do_barriers = !btrfs_test_opt(fs_info, NOBARRIER); @@ -3699,7 +3700,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors) * not from fsync where the tree roots in fs_info have not * been consistent on disk. */ - if (max_mirrors == 0) + if (do_backup) backup_super_roots(fs_info); sb = fs_info->super_for_commit; -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] fstests: btrfs/150 regression test for reading compressed data
We had a bug in btrfs compression code which could end up with a kernel panic. This is adding a regression test for the bug and I've also sent a kernel patch to fix the bug. The patch is "Btrfs: fix kernel oops while reading compressed data". Signed-off-by: Liu Bo --- v2: - Fix ambiguous copyright. - Use /proc/$pid/make-it-fail to specify IO failure - Use bash -c to run test only when pid is odd. - Add test to dangerous group. tests/btrfs/150 | 103 tests/btrfs/150.out | 3 ++ tests/btrfs/group | 1 + 3 files changed, 107 insertions(+) create mode 100755 tests/btrfs/150 create mode 100644 tests/btrfs/150.out diff --git a/tests/btrfs/150 b/tests/btrfs/150 new file mode 100755 index 000..8891c38 --- /dev/null +++ b/tests/btrfs/150 @@ -0,0 +1,103 @@ +#! /bin/bash +# FS QA Test btrfs/150 +# +# This is a regression test which ends up with a kernel oops in btrfs. +# It occurs when btrfs's read repair happens while reading a compressed +# extent. +# The patch to fix it is +# Btrfs: fix kernel oops while reading compressed data +# +#--- +# Copyright (c) 2017 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_fail_make_request +_require_scratch_dev_pool 2 + +SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV` +enable_io_failure() +{ +echo 100 > $DEBUGFS_MNT/fail_make_request/probability +echo 1000 > $DEBUGFS_MNT/fail_make_request/times +echo 0 > $DEBUGFS_MNT/fail_make_request/verbose +echo 1 > $SYSFS_BDEV/make-it-fail +} + +disable_io_failure() +{ +echo 0 > $SYSFS_BDEV/make-it-fail +echo 0 > $DEBUGFS_MNT/fail_make_request/probability +echo 0 > $DEBUGFS_MNT/fail_make_request/times +} + +_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1 + +# It doesn't matter which compression algorithm we use. +_scratch_mount -ocompress + +# Create a file with all data being compressed +$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io + +# Raid1 consists of two copies and btrfs decides which copy to read by reader's +# %pid. Now we inject errors to copy #1 and copy #0 is good. We want to read +# the bad copy to trigger read-repair. +while [[ -z $result ]]; do + # invalidate the page cache + $XFS_IO_PROG -f -c "fadvise -d 0 8K" $SCRATCH_MNT/foobar + + enable_io_failure + + result=$(bash -c " + if [ \$((\$\$ % 2)) == 1 ]; then + echo 1 > /proc/\$\$/make-it-fail + exec $XFS_IO_PROG -c \"pread 0 8K\" \$SCRATCH_MNT/foobar + fi") + + disable_io_failure +done + +# success, all done +status=0 +exit diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out new file mode 100644 index 000..c492c24 --- /dev/null +++ b/tests/btrfs/150.out @@ -0,0 +1,3 @@ +QA output created by 150 +wrote 8192/8192 bytes at offset 0 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) diff --git a/tests/btrfs/group b/tests/btrfs/group index 70c3f05..e73bb1b 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -152,3 +152,4 @@ 147 auto quick send 148 auto quick rw 149 auto quick send compress +150 auto quick dangerous -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: defragmenting best practice?
Am Freitag, 22. September 2017, 13:22:52 CEST schrieb Austin S. Hemmelgarn: > > I'm not sure where Firefox puts its cache, I only use it on very rare > > occasions. But I think it's going to .cache/mozilla last time looked > > at it. > > I'm pretty sure that is correct. FWIW, on my system Firefox's cache looks like this: % du -hsc (find .cache/mozilla/firefox/ -type f) | wc -l 9008 % du -hsc (find .cache/mozilla/firefox/ -type f) | sort -h | tail 5,4M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ 83CEC8ADA08D9A9658458AB872BE107A216E71C6 5,5M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ C60061B33D3BB91ED45951C922BAA1BB40022CB7 5,7M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ 0900D9EA8E3222EB8690348C2482C69308B15A20 5,7M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ F8E90D121B884360E36BCB1735CC5A8B1B7A744B 5,8M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ 903C4CD01ABD74E353C7484C6E21A053AAC5DCC2 6,1M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ 3A0D4193B009700155811D14A28DBE38C37C0067 6,1M.cache/mozilla/firefox/cb236e4s.default-1464421886682/startupCache/ scriptCache-current.bin 6,5M.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/ 304405168662C3624D57AF98A74345464F32A0DB 8,8M.cache/mozilla/firefox/ik7qsfwb.Temp/cache2/entries/ BD7CA4125B3AA87D6B16C987741F33C65DBFFFDD 427Minsgesamt So lots of files, many of which are (I suppose) relatively large, but do not look "everything in one database" large to me. (This is with Firefox 55.0.2.) -- Marc Joliet -- "People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup signature.asc Description: This is a digitally signed message part.
Re: [PATCH] fstests: btrfs/150 regression test for reading compressed data
On Thu, Sep 21, 2017 at 02:39:52PM +1000, Dave Chinner wrote: > On Wed, Sep 20, 2017 at 05:52:43PM -0600, Liu Bo wrote: > > We had a bug in btrfs compression code which could end up with a > > kernel panic. > > > > This is adding a regression test for the bug and I've also sent a > > kernel patch to fix the bug. > > > > The patch is "Btrfs: fix kernel oops while reading compressed data". > > > > Signed-off-by: Liu Bo > > --- > > tests/btrfs/150 | 102 > > > > tests/btrfs/150.out | 3 ++ > > tests/btrfs/group | 1 + > > 3 files changed, 106 insertions(+) > > create mode 100755 tests/btrfs/150 > > create mode 100644 tests/btrfs/150.out > > > > diff --git a/tests/btrfs/150 b/tests/btrfs/150 > > new file mode 100755 > > index 000..834be51 > > --- /dev/null > > +++ b/tests/btrfs/150 > > @@ -0,0 +1,102 @@ > > +#! /bin/bash > > +# FS QA Test btrfs/150 > > +# > > +# This is a regression test which ends up with a kernel oops in btrfs. > > group += dangerous OK. > > > +# It occurs when btrfs's read repair happens while reading a compressed > > +# extent. > > +# The patch for this is > > +# x > > Incomplete? Urr, thanks for pointing it out. > > > +# > > +#--- > > +# Copyright (c) 2017 Liu Bo. All Rights Reserved. > > You're signing off this patch an Oracle employee, but claiming > personal copyright. Please clarify who owns the copyright - if it's > your personal copyright then please sign off with a personal email > address, not your employer's... > > Also, I note that these recently added tests from you: > > tests/btrfs/140:# Copyright (c) 2017 Liu Bo. All Rights Reserved. > tests/btrfs/141:# Copyright (c) 2017 Liu Bo. All Rights Reserved. > tests/btrfs/142:# Copyright (c) 2017 Liu Bo. All Rights Reserved. > tests/btrfs/143:# Copyright (c) 2017 Liu Bo. All Rights Reserved. > tests/generic/406:# Copyright (c) 2017 Liu Bo. All Rights Reserved. > > all have this same ambiguity - personal copyright with employer > signoff in the commit. This definitely needs clarification and > fixing if it is wrong > All right, will fix all of them (in a separate one). > > > +disable_io_failure() > > +{ > > +echo 0 > $SYSFS_BDEV/make-it-fail > > +echo 0 > $DEBUGFS_MNT/fail_make_request/probability > > +echo 0 > $DEBUGFS_MNT/fail_make_request/times > > +} > > + > > +_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1 > > + > > +# It doesn't matter which compression algorithm we use. > > +_scratch_mount -ocompress > > + > > +# Create a file with all data being compressed > > +$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io > > needs an fsync to reach disk. 'pwrite -W' has ensured that. > > > +# Raid1 consists of two copies and btrfs decides which copy to read by > > reader's > > +# %pid. Now we inject errors to copy #1 and copy #0 is good. We want to > > read > > +# the bad copy to trigger read-repair. > > +while true; do > > + disable_io_failure > > + # invalidate the page cache > > + $XFS_IO_PROG -f -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar | > > _filter_xfs_io > > + > > + enable_io_failure > > + od -x $SCRATCH_MNT/foobar > /dev/null & > > why are you using od to read the data when the output is piped to > dev/null? why not just xfs_io -c "pread 0 8k" ? Oh yes, that's better, will do. thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: btrfs/150 regression test for reading compressed data
On Thu, Sep 21, 2017 at 03:03:45PM +0800, Lu Fengqi wrote: > On Wed, Sep 20, 2017 at 05:52:43PM -0600, Liu Bo wrote: > >We had a bug in btrfs compression code which could end up with a > >kernel panic. > > > >This is adding a regression test for the bug and I've also sent a > >kernel patch to fix the bug. > > > >The patch is "Btrfs: fix kernel oops while reading compressed data". > > > >Signed-off-by: Liu Bo > >--- > > tests/btrfs/150 | 102 > > > > tests/btrfs/150.out | 3 ++ > > tests/btrfs/group | 1 + > > 3 files changed, 106 insertions(+) > > create mode 100755 tests/btrfs/150 > > create mode 100644 tests/btrfs/150.out > > > >diff --git a/tests/btrfs/150 b/tests/btrfs/150 > >new file mode 100755 > >index 000..834be51 > >--- /dev/null > >+++ b/tests/btrfs/150 > >@@ -0,0 +1,102 @@ > >+#! /bin/bash > >+# FS QA Test btrfs/150 > >+# > >+# This is a regression test which ends up with a kernel oops in btrfs. > >+# It occurs when btrfs's read repair happens while reading a compressed > >+# extent. > >+# The patch for this is > >+# x > >+# > >+#--- > >+# Copyright (c) 2017 Liu Bo. All Rights Reserved. > >+# > >+# This program is free software; you can redistribute it and/or > >+# modify it under the terms of the GNU General Public License as > >+# published by the Free Software Foundation. > >+# > >+# This program is distributed in the hope that it would be useful, > >+# but WITHOUT ANY WARRANTY; without even the implied warranty of > >+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >+# GNU General Public License for more details. > >+# > >+# You should have received a copy of the GNU General Public License > >+# along with this program; if not, write the Free Software Foundation, > >+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > >+#--- > >+# > >+ > >+seq=`basename $0` > >+seqres=$RESULT_DIR/$seq > >+echo "QA output created by $seq" > >+ > >+here=`pwd` > >+tmp=/tmp/$$ > >+status=1# failure is the default! > >+trap "_cleanup; exit \$status" 0 1 2 3 15 > >+ > >+_cleanup() > >+{ > >+cd / > >+rm -f $tmp.* > >+} > >+ > >+# get standard environment, filters and checks > >+. ./common/rc > >+. ./common/filter > >+ > >+# remove previous $seqres.full before test > >+rm -f $seqres.full > >+ > >+# real QA test starts here > >+ > >+# Modify as appropriate. > >+_supported_fs btrfs > >+_supported_os Linux > >+_require_scratch > >+_require_fail_make_request > >+_require_scratch_dev_pool 2 > >+ > >+SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV` > >+enable_io_failure() > >+{ > >+echo 100 > $DEBUGFS_MNT/fail_make_request/probability > >+echo 1000 > $DEBUGFS_MNT/fail_make_request/times > > What does 1000 mean? Enough failures? > Why not set times to -1? This was copied from another test, so I kept it as is. As this test just submits a single 8K read after enabling fault injection, 1000 is in fact as same as -1(no limit), I think 1000 is OK to use. thanks, -liubo > > >+echo 0 > $DEBUGFS_MNT/fail_make_request/verbose > >+echo 1 > $SYSFS_BDEV/make-it-fail > >+} > >+ > >+disable_io_failure() > >+{ > >+echo 0 > $SYSFS_BDEV/make-it-fail > >+echo 0 > $DEBUGFS_MNT/fail_make_request/probability > >+echo 0 > $DEBUGFS_MNT/fail_make_request/times > >+} > >+ > >+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1 > >+ > >+# It doesn't matter which compression algorithm we use. > >+_scratch_mount -ocompress > >+ > >+# Create a file with all data being compressed > >+$XFS_IO_PROG -f -c "pwrite -W 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io > >+ > >+# Raid1 consists of two copies and btrfs decides which copy to read by > >reader's > >+# %pid. Now we inject errors to copy #1 and copy #0 is good. We want to > >read > >+# the bad copy to trigger read-repair. > >+while true; do > >+disable_io_failure > >+# invalidate the page cache > >+$XFS_IO_PROG -f -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar | > >_filter_xfs_io > >+ > >+enable_io_failure > >+od -x $SCRATCH_MNT/foobar > /dev/null & > >+pid=$! > >+wait > >+[ $((pid % 2)) == 1 ] && break > >+done > >+ > >+disable_io_failure > >+ > >+# success, all done > >+status=0 > >+exit > >diff --git a/tests/btrfs/150.out b/tests/btrfs/150.out > >new file mode 100644 > >index 000..c492c24 > >--- /dev/null > >+++ b/tests/btrfs/150.out > >@@ -0,0 +1,3 @@ > >+QA output created by 150 > >+wrote 8192/8192 bytes at offset 0 > >+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > >diff --git a/tests/btrfs/group b/tests/btrfs/group > >index 70c3f05..b70a122 100644 > >--- a/tests/btrfs/group > >+++ b/tests/btrfs/group > >@@ -152,3 +152,4 @@ > > 147 auto quick send > > 148 auto quick rw > > 149 auto quick send compress > >+150 auto quick > >-- > >2.5.0 > > > >-
[PATCH] Btrfs: fix memory leak in raid56
The local bio_list may have pending bios when doing cleanup, it can end up with memory leak if they don't get free'd. Signed-off-by: Liu Bo --- fs/btrfs/raid56.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 2cf6ba4..063a2a0 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -1325,6 +1325,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) cleanup: rbio_orig_end_io(rbio, BLK_STS_IOERR); + + while ((bio = bio_list_pop(&bio_list))) + bio_put(bio); } /* @@ -1580,6 +1583,10 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio) cleanup: rbio_orig_end_io(rbio, BLK_STS_IOERR); + + while ((bio = bio_list_pop(&bio_list))) + bio_put(bio); + return -EIO; finish: @@ -2105,6 +2112,10 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio) if (rbio->operation == BTRFS_RBIO_READ_REBUILD || rbio->operation == BTRFS_RBIO_REBUILD_MISSING) rbio_orig_end_io(rbio, BLK_STS_IOERR); + + while ((bio = bio_list_pop(&bio_list))) + bio_put(bio); + return -EIO; } @@ -2452,6 +2463,9 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio, cleanup: rbio_orig_end_io(rbio, BLK_STS_IOERR); + + while ((bio = bio_list_pop(&bio_list))) + bio_put(bio); } static inline int is_data_stripe(struct btrfs_raid_bio *rbio, int stripe) @@ -2561,12 +2575,12 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio) int stripe; struct bio *bio; + bio_list_init(&bio_list); + ret = alloc_rbio_essential_pages(rbio); if (ret) goto cleanup; - bio_list_init(&bio_list); - atomic_set(&rbio->error, 0); /* * build a list of bios to read all the missing parts of this @@ -2634,6 +2648,10 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio) cleanup: rbio_orig_end_io(rbio, BLK_STS_IOERR); + + while ((bio = bio_list_pop(&bio_list))) + bio_put(bio); + return; finish: -- 2.9.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents
The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and offset (encoded as a single logical address) to a list of extent refs. LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping (extent ref -> extent bytenr and offset, or logical address). These are useful capabilities for programs that manipulate extents and extent references from userspace (e.g. dedup and defrag utilities). When the extents are uncompressed (and not encrypted and not other), check_extent_in_eb performs filtering of the extent refs to remove any extent refs which do not contain the same extent offset as the 'logical' parameter's extent offset. This prevents LOGICAL_INO from returning references to more than a single block. To find the set of extent references to an uncompressed extent from [a, b), userspace has to run a loop like this pseudocode: for (i = a; i < b; ++i) extent_ref_set += LOGICAL_INO(i); At each iteration of the loop (up to 32768 iterations for a 128M extent), data we are interested in is collected in the kernel, then deleted by the filter in check_extent_in_eb. When the extents are compressed (or encrypted or other), the 'logical' parameter must be an extent bytenr (the 'a' parameter in the loop). No filtering by extent offset is done (or possible?) so the result is the complete set of extent refs for the entire extent. This removes the need for the loop, since we get all the extent refs in one call. Add an 'ignore_offset' argument to iterate_inodes_from_logical, [...several levels of function call graph...], and check_extent_in_eb, so that we can disable the extent offset filtering for uncompressed extents. This flag can be set by an improved version of the LOGICAL_INO ioctl to get either behavior as desired. There is no functional change in this patch. The new flag is always false. Signed-off-by: Zygo Blaxell --- fs/btrfs/backref.c| 63 ++- fs/btrfs/backref.h| 8 +++--- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 8 +++--- fs/btrfs/scrub.c | 6 ++--- fs/btrfs/send.c | 2 +- fs/btrfs/tests/qgroup-tests.c | 20 +++--- 8 files changed, 63 insertions(+), 48 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index b517ef1477ea..a2609786cd86 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -40,12 +40,14 @@ static int check_extent_in_eb(const struct btrfs_key *key, const struct extent_buffer *eb, const struct btrfs_file_extent_item *fi, u64 extent_item_pos, - struct extent_inode_elem **eie) + struct extent_inode_elem **eie, + bool ignore_offset) { u64 offset = 0; struct extent_inode_elem *e; - if (!btrfs_file_extent_compression(eb, fi) && + if (!ignore_offset && + !btrfs_file_extent_compression(eb, fi) && !btrfs_file_extent_encryption(eb, fi) && !btrfs_file_extent_other_encoding(eb, fi)) { u64 data_offset; @@ -84,7 +86,8 @@ static void free_inode_elem_list(struct extent_inode_elem *eie) static int find_extent_in_eb(const struct extent_buffer *eb, u64 wanted_disk_byte, u64 extent_item_pos, -struct extent_inode_elem **eie) +struct extent_inode_elem **eie, +bool ignore_offset) { u64 disk_byte; struct btrfs_key key; @@ -113,7 +116,7 @@ static int find_extent_in_eb(const struct extent_buffer *eb, if (disk_byte != wanted_disk_byte) continue; - ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie); + ret = check_extent_in_eb(&key, eb, fi, extent_item_pos, eie, ignore_offset); if (ret < 0) return ret; } @@ -419,7 +422,7 @@ static int add_indirect_ref(const struct btrfs_fs_info *fs_info, static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, struct ulist *parents, struct prelim_ref *ref, int level, u64 time_seq, const u64 *extent_item_pos, - u64 total_refs) + u64 total_refs, bool ignore_offset) { int ret = 0; int slot; @@ -472,7 +475,7 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, if (extent_item_pos) { ret = check_extent_in_eb(&key, eb, fi, *extent_item_pos, - &eie); + &eie, ignore_offset);
[PATCH 3/3] btrfs: increase output size for LOGICAL_INO_V2 ioctl
Build-server workloads have hundreds of references per file after dedup. Multiply by a few snapshots and we quickly exhaust the limit of 2730 references per extent that can fit into a 64K buffer. Raise the limit to 16M to be consistent with other btrfs ioctls (e.g. TREE_SEARCH_V2, FILE_EXTENT_SAME). To minimize surprising userspace behavior, apply this change only to the LOGICAL_INO_V2 ioctl. Signed-off-by: Zygo Blaxell --- fs/btrfs/ioctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f4281ffd1833..1940678fc440 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4554,6 +4554,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (version == 1) { ignore_offset = false; + size = min_t(u32, loi->size, SZ_64K); } else { /* All reserved bits must be 0 for now */ if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { @@ -4566,6 +4567,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, goto out_loi; } ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + size = min_t(u32, loi->size, SZ_16M); } path = btrfs_alloc_path(); @@ -4574,7 +4576,6 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, goto out; } - size = min_t(u32, loi->size, SZ_64K); inodes = init_data_container(size); if (IS_ERR(inodes)) { ret = PTR_ERR(inodes); -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2
Now that check_extent_in_eb()'s extent offset filter can be turned off, we need a way to do it from userspace. Add a 'flags' field to the btrfs_logical_ino_args structure to disable extent offset filtering, taking the place of one of the existing reserved[] fields. Previous versions of LOGICAL_INO neglected to check whether any of the reserved fields have non-zero values. Assigning meaning to those fields now may change the behavior of existing programs that left these fields uninitialized. The lack of a zero check also means that new programs have no way to know whether the kernel is honoring the flags field. To avoid these problems, define a new ioctl LOGICAL_INO_V2. We can use the same argument layout as LOGICAL_INO, but shorten the reserved[] array by one element and turn it into the 'flags' field. The V2 ioctl explicitly checks that reserved fields and unsupported flag bits are zero so that userspace can negotiate future feature bits as they are defined. Since the memory layouts of the two ioctls' arguments are compatible, there is no need for a separate function for logical_to_ino_v2 (contrast with tree_search_v2 vs tree_search where the layout and code are quite different). A version parameter and an 'if' statement will suffice. Now that we have a flags field in logical_ino_args, add a flag BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want, and pass it down the stack to iterate_inodes_from_logical. Signed-off-by: Zygo Blaxell --- fs/btrfs/ioctl.c | 26 +++--- include/uapi/linux/btrfs.h | 8 +++- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index b7de32568082..f4281ffd1833 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4536,13 +4536,14 @@ static int build_ino_list(u64 inum, u64 offset, u64 root, void *ctx) } static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, - void __user *arg) + void __user *arg, int version) { int ret = 0; int size; struct btrfs_ioctl_logical_ino_args *loi; struct btrfs_data_container *inodes = NULL; struct btrfs_path *path = NULL; + bool ignore_offset; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -4551,6 +4552,22 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, if (IS_ERR(loi)) return PTR_ERR(loi); + if (version == 1) { + ignore_offset = false; + } else { + /* All reserved bits must be 0 for now */ + if (memchr_inv(loi->reserved, 0, sizeof(loi->reserved))) { + ret = -EINVAL; + goto out_loi; + } + /* Only accept flags we have defined so far */ + if (loi->flags & ~(BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET)) { + ret = -EINVAL; + goto out_loi; + } + ignore_offset = loi->flags & BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET; + } + path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; @@ -4566,7 +4583,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, } ret = iterate_inodes_from_logical(loi->logical, fs_info, path, - build_ino_list, inodes, false); + build_ino_list, inodes, ignore_offset); if (ret == -EINVAL) ret = -ENOENT; if (ret < 0) @@ -4580,6 +4597,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_fs_info *fs_info, out: btrfs_free_path(path); kvfree(inodes); +out_loi: kfree(loi); return ret; @@ -5550,7 +5568,9 @@ long btrfs_ioctl(struct file *file, unsigned int case BTRFS_IOC_INO_PATHS: return btrfs_ioctl_ino_to_path(root, argp); case BTRFS_IOC_LOGICAL_INO: - return btrfs_ioctl_logical_to_ino(fs_info, argp); + return btrfs_ioctl_logical_to_ino(fs_info, argp, 1); + case BTRFS_IOC_LOGICAL_INO_V2: + return btrfs_ioctl_logical_to_ino(fs_info, argp, 2); case BTRFS_IOC_SPACE_INFO: return btrfs_ioctl_space_info(fs_info, argp); case BTRFS_IOC_SYNC: { diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 378230c163d5..99bb7988e6fe 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -608,10 +608,14 @@ struct btrfs_ioctl_ino_path_args { struct btrfs_ioctl_logical_ino_args { __u64 logical;/* in */ __u64 size; /* in */ - __u64 reserved[4]; + __u64 reserved[3];/* must be 0 for now */ + __u64
[PATCH v3] btrfs: LOGICAL_INO enhancements
Changelog: v3-v2: - Stricter check on reserved[] field - now must be all zero, or userspace gets EINVAL. This prevents userspace from setting any of the reserved bits without the kernel providing an unambiguous interpretation of them, and doesn't require us to burn a flag bit for each one. - Moved 'flags' to the end of the reserved[] array. This allows existing source code using version 1 of the ioctl to behave the same way when using version 2 of the btrfs_ioctl_logical_ino_args struct definition (i.e. reserved[3] becomes an alias for 'flags', and the addresses of reserved[0-2] don't change). - Clarified the reasoning in the commit message for patch 2, "btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2". v2: - added patch series intro text - rebased on 4.14-rc1. v1: This patch series fixes some weaknesses in the btrfs LOGICAL_INO ioctl. Background: Suppose we have a file with one extent: root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a root@tester:~# sync Split the extent by overwriting it in the middle: root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a We should now have 3 extent refs to 2 extents, with one block unreachable. The extent tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 2 [...] item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53 extent refs 2 gen 29 flags DATA extent data backref root 5 objectid 261 offset 0 count 2 [...] item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53 extent refs 1 gen 30 flags DATA extent data backref root 5 objectid 261 offset 8192 count 1 [...] and the ref tree looks like: root@tester:~# btrfs-debug-tree /dev/vdc -t 5 [...] item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 0 nr 8192 ram 73728 extent compression(none) item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53 extent data disk byte 1103175680 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression(none) item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53 extent data disk byte 1103101952 nr 73728 extent data offset 12288 nr 61440 ram 73728 extent compression(none) [...] There are two references to the same extent with different, non-overlapping byte offsets: [--72K extent at 1103101952--] [--8K|--4K unreachable|--60K-] ^ ^ | | [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--] | v [-4K extent-] at 1103175680 We want to find all of the references to extent bytenr 1103101952. Without the patch (and without running btrfs-debug-tree), we have to do it with 18 LOGICAL_INO calls: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO inode 261 offset 0 root 5 root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode inode 261 offset 0 root 5 inode 261 offset 4096 root 5 <- same extent ref as offset 0 (offset 8192 returns empty set, not reachable) inode 261 offset 12288 root 5 inode 261 offset 16384 root 5 \ inode 261 offset 20480 root 5 | inode 261 offset 24576 root 5 | inode 261 offset 28672 root 5 | inode 261 offset 32768 root 5 | inode 261 offset 36864 root 5 \ inode 261 offset 40960 root 5 > all the same extent ref as offset 12288. inode 261 offset 45056 root 5 / More processing required in userspace inode 261 offset 49152 root 5 | to figure out these are all duplicates. inode 261 offset 53248 root 5 | inode 261 offset 57344 root 5 | inode 261 offset 61440 root 5 | inode 261 offset 65536 root 5 | inode 261 offset 69632 root 5 / In the worst case the extents are 128MB long, and we have to do 32768 iterations of the loop to find one 4K extent ref. With the patch, we just use one call to map all refs to the extent at once: root@tester:~# btrfs ins log 1103101952 -P /test/ Using LOGICAL_INO_V2 inode 261 offset 0 root 5 inode 261 offset 12288 root 5 The TREE_SEARCH ioctl allows userspace to retrieve the offset and extent bytenr fields easily once the root, inode and offset are known. This is sufficient information to b
Re: using fio to test btrfs compression
2017-09-22 8:33 GMT+03:00 shally verma : > On Wed, Sep 20, 2017 at 5:26 PM, Timofey Titovets > wrote: >> 2017-09-20 14:10 GMT+03:00 shally verma : >>> Interesting part is I dont see "encoded" under flags. I couldn't >>> understand if flags are retrieved from btrfs file metadata info. As >>> you are running on 4.14 and I am on 4.9 >>> >>> So, am still under doubt, even with dd if files are getting compressed. >>> >>> What is the filesize shown if you run >>> btrfs fi du /mnt/test0.0 .. is it less or actual size? >>> >>> Is there any command that i can run to confirm file has been compressed? >>> >>> So far, I had my prints enabled in kernel/fs/btrfs/compression.c and >>> check in dmesg that code jumped to compress_page() func. >>> >>> Thanks >>> Shally >>> >> >> Okay, lets play different. >> encoded work for last several years for kernel releases, so you must see >> that. >> >> Reproduction script: >> #!/bin/bash -e >> >> FILE_NAME=$RANDOM$RANDOM >> TMP_DIR=$(mktemp -d) >> IMAGE_FILE="$HOME/$FILE_NAME" >> >> truncate -s 4G $IMAGE_FILE >> mkfs.btrfs -m single -L COMPRESS_TEST $IMAGE_FILE >> mount -o compress-force $IMAGE_FILE $TMP_DIR >> dd if=/dev/zero bs=128K count=2 of=$TMP_DIR/zero >> sync >> filefrag -v $TMP_DIR/zero >> umount $TMP_DIR >> rm -v $IMAGE_FILE >> >> Example output: >> ~ sudo ./btrfs_compress_test.sh >> btrfs-progs v4.13 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: COMPRESS_TEST >> UUID: abfedc39-dd94-4105-87d6-49eedb13467f >> Node size: 16384 >> Sector size:4096 >> Filesystem size:4.00GiB >> Block group profiles: >> Data: single8.00MiB >> Metadata: single8.00MiB >> System: single4.00MiB >> SSD detected: no >> Incompat features: extref, skinny-metadata >> Number of devices: 1 >> Devices: >> IDSIZE PATH >>1 4.00GiB /root/322906281 >> >> 2+0 records in >> 2+0 records out >> 262144 bytes (262 kB, 256 KiB) copied, 0.000197746 s, 1.3 GB/s >> Filesystem type is: 9123683e >> File size of /tmp/tmp.bDyt3EkEG5/zero is 262144 (64 blocks of 4096 bytes) >> ext: logical_offset:physical_offset: length: expected: flags: >> 0:0.. 31: 3072.. 3103: 32: encoded >> 1: 32.. 63: 3073.. 3104: 32: 3104: >> last,encoded,eof >> /tmp/tmp.bDyt3EkEG5/zero: 2 extents found >> removed '/root/322906281' >> >> Good luck. > Here's my output - Everything is same except: > > 1. nodesize and sector size = 64K > 2. extent length = 2 > 3. I don't see "encoded" in filefrag here. > > btrfs-progs v4.13 > See http://btrfs.wiki.kernel.org for more information. > > Label: COMPRESS_TEST > UUID: fad6907e-d4eb-4dbb-9014-3918a822c9ce > Node size: 65536 > Sector size:65536 > Filesystem size:4.00GiB > Block group profiles: > Data: single8.00MiB > Metadata: single8.00MiB > System: single4.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: >IDSIZE PATH > 1 4.00GiB /root/2808626087 > > 2+0 records in > 2+0 records out > 262144 bytes (262 kB) copied, 0.00028777 s, 911 MB/s > Filesystem type is: 9123683e > File size of /tmp/tmp.346ESCdOIi/zero is 262144 (4 blocks of 65536 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0.. 1:192.. 193: 2: >1:2.. 3:193.. 194: 2:194: eof > /tmp/tmp.346ESCdOIi/zero: 2 extents found > removed '/root/2808626087' > > And this is my dmesg > > [170127.417119] BTRFS: device label COMPRESS_TEST devid 1 transid 5 /dev/loop0 > [170127.417493] BTRFS info (device loop0): force zlib compression > [170127.417496] BTRFS info (device loop0): disk space caching is enabled > [170127.417499] BTRFS info (device loop0): has skinny extents > [170127.425858] BTRFS info (device loop0): creating UUID tree > > This is fio --version > fio-3.0 > > What do we doubt here? > > Thanks > Shally > >> -- >> Have a nice day, >> Timofey. Sorry, no idea. -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[josef-btrfs:new-kill-btree-inode 20/20] ERROR: "radix_tree_iter_tag_set" [fs/btrfs/btrfs.ko] undefined!
tree: https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git new-kill-btree-inode head: 3264d40ac916ca952f0b85e39304b9199ff36d8b commit: 3264d40ac916ca952f0b85e39304b9199ff36d8b [20/20] Btrfs: kill the btree_inode config: x86_64-randconfig-i0-201738 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: git checkout 3264d40ac916ca952f0b85e39304b9199ff36d8b # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): >> ERROR: "radix_tree_iter_tag_set" [fs/btrfs/btrfs.ko] undefined! --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[josef-btrfs:new-kill-btree-inode 17/20] fs/ntfs/attrib.c:2549:35: error: implicit declaration of function 'inode_to_bdi'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git new-kill-btree-inode head: 3264d40ac916ca952f0b85e39304b9199ff36d8b commit: a03edce0823e97312869c60933b37157c3b8866a [17/20] remove mapping from balance_dirty_pages*() config: x86_64-randconfig-x007-201738 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: git checkout a03edce0823e97312869c60933b37157c3b8866a # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): fs/ntfs/attrib.c: In function 'ntfs_attr_set': >> fs/ntfs/attrib.c:2549:35: error: implicit declaration of function >> 'inode_to_bdi' [-Werror=implicit-function-declaration] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^~~~ fs/ntfs/attrib.c:2549:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion] In file included from include/linux/memcontrol.h:31:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^~~ fs/ntfs/attrib.c:2591:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^~~~ In file included from include/linux/memcontrol.h:31:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^~~ fs/ntfs/attrib.c:2609:35: warning: passing argument 1 of 'balance_dirty_pages_ratelimited' makes pointer from integer without a cast [-Wint-conversion] balance_dirty_pages_ratelimited(inode_to_bdi(inode), ^~~~ In file included from include/linux/memcontrol.h:31:0, from include/linux/swap.h:8, from fs/ntfs/attrib.c:26: include/linux/writeback.h:380:6: note: expected 'struct backing_dev_info *' but argument is of type 'int' void balance_dirty_pages_ratelimited(struct backing_dev_info *bdi, ^~~ cc1: some warnings being treated as errors vim +/inode_to_bdi +2549 fs/ntfs/attrib.c 2472 2473 /** 2474 * ntfs_attr_set - fill (a part of) an attribute with a byte 2475 * @ni: ntfs inode describing the attribute to fill 2476 * @ofs:offset inside the attribute at which to start to fill 2477 * @cnt:number of bytes to fill 2478 * @val:the unsigned 8-bit value with which to fill the attribute 2479 * 2480 * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at 2481 * byte offset @ofs inside the attribute with the constant byte @val. 2482 * 2483 * This function is effectively like memset() applied to an ntfs attribute. 2484 * Note thie function actually only operates on the page cache pages belonging 2485 * to the ntfs attribute and it marks them dirty after doing the memset(). 2486 * Thus it relies on the vm dirty page write code paths to cause the modified 2487 * pages to be written to the mft record/disk. 2488 * 2489 * Return 0 on success and -errno on error. An error code of -ESPIPE means 2490 * that @ofs + @cnt were outside the end of the attribute and no write was 2491 * performed. 2492 */ 2493 int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val) 2494 { 2495 ntfs_volume *vol = ni->vol; 2496 struct inode *inode = VFS_I(ni); 2497 struct address_space *mapping; 2498 struct page *page; 2499 u8 *kaddr; 2500 pgoff_t idx, end; 2501 unsigned start_ofs, end_ofs, size; 2502 2503 ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.", 2504 (long long)ofs, (long long)cnt, val); 2505 BUG_ON(ofs < 0); 2506 BUG_ON(cnt < 0); 2507 if (!cnt) 2508 goto done; 2509 /* 2510 * FIXME: Compressed and encrypted attributes are not supported when 2511 * writing and we should never have gotten here for them. 2512 */ 2513 BUG_ON(NInoCompressed(ni)); 2514 BUG_ON(NInoEncrypted(ni)); 2515 mapping = VFS_I(ni)->i_mapping; 2516 /* Work out the starting index and page offset. */ 2517
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
On 2017年09月22日 21:33, Austin S. Hemmelgarn wrote: On 2017-09-22 08:32, Qu Wenruo wrote: On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote: On 2017-09-22 06:39, Qu Wenruo wrote: As I already stated in an other thread, if you want to shrink, do it in another command line tool. Do one thing and do it simple. (Although Btrfs itself is already out of the UNIX way) Unless I'm reading the code wrong, the shrinking isn't happening in a second pass, so this _is_ doing one thing, and it appears to be doing it as simply as possible (although arguably not correctly because of the 1MB reserved area being used). If you're referring to my V1 implementation of shrink, that's doing *one* thing. But the original shrinking code? Nope, or we won't have the custom chunk allocator at all. What I really mean is, if one wants to shrink, at least don't couple the shrink code into "mkfs.btrfs". Do shrink in its own tool/subcommand, not in a really unrelated tool. There are two cases for shrinking a filesystem: 1. You're resizing it to move to a smaller disk (or speed up copying to another disk). 2. You're generating a filesystem image that needs to be as small as possible. I would argue there is no meaning of creating *smallest* image. (Of course it exists) We could put tons of code to implement, and more (or less) test cases to verify it. But the demand doesn't validate the effort. All my points are clear for this patchset: I know I removed one function, and my reason is: 1) No or little usage And it's anti intuition. 2) Dead code (not tested nor well documented) 3) Possible workaround I can add several extra reasons as I stated before, but number of reasons won't validate anything anyway. Building software is always trading one thing for another. I understand there may be some need for this function, but it doesn't validate the cost. And I think the fact that until recently a mail reported about the shrinking behavior has already backed up my point. Thanks, Qu Case 1 is obviously unrelated to creating a filesystem. Case 2 however is kind of integral to the creation of the filesystem image itself by definition, especially for a CoW filesystem because it's not possible to shrink to the absolute smallest size due to the GlobalReserve and other things. Similarly, there are two primary use cases for pre-loading the filesystem with data: 1. Avoiding a copy when reprovisioning storage on a system. For example, splitting a directory out to a new filesystem, you could use the -r option to avoid having to copy the data after mounting the filesystem. 2. Creating base images for systems. The first case shouldn't need the shrinking functionality, but the second is a very common use case together with the second usage for shrinking a filesystem. It may be offline shrink/balance. But not to further complexing the --rootdir option now. > And you also said that, the shrink feature is not a popular feature *NOW*, then I don't think it's worthy to implment it *NOW* either. Implement future feature in the future please. I'm not sure about you, but I could have sworn that he meant seed devices weren't a popular feature right now, Oh, sorry for my misunderstanding. not that the shrinking is. As a general rule, the whole option of pre-loading a filesystem with data as you're creating it is not a popular feature, because most sysadmins are much more willing to trust adding data after the filesystem is created. Personally, given the existence of seed devices, I would absolutely expect there to be a quick and easy way to generate a minimalistic image using a single command (because realistic usage of seed devices implies minimalistic images). I agree that it shouldn't be the default behavior, but I don't think it needs to be removed completely. Just like I said in cover letter, even for ext*, it's provided by genext2fs, not mke2fs. Then maybe this should get split out into a separate tool instead of just removing it completely? There is obviously at least some interest in this functionality. I totally understand end-user really want a do-it-all solution. But from developers' view, the old UNIX way is better to maintain code clean and easy to read. What the code is doing should have near zero impact on readability. If it did, then the BTRFS code in general is already way beyond most people. In fact, you can even create your script to do the old behavior if you don't care that the result may not fully take use of the space, just by: 1) Calculate the size of the whole directory "du" command can do it easily, and it does things better than us! For years! Um, no it actually doesn't do things better in all cases. it doesn't account for extended attributes, or metadata usage, or any number of other things that factor into how much space a file or directory will take up on BTRFS. It's good enough for finding what's using most of your space, but it'
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
On 2017-09-22 08:32, Qu Wenruo wrote: On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote: On 2017-09-22 06:39, Qu Wenruo wrote: As I already stated in an other thread, if you want to shrink, do it in another command line tool. Do one thing and do it simple. (Although Btrfs itself is already out of the UNIX way) Unless I'm reading the code wrong, the shrinking isn't happening in a second pass, so this _is_ doing one thing, and it appears to be doing it as simply as possible (although arguably not correctly because of the 1MB reserved area being used). If you're referring to my V1 implementation of shrink, that's doing *one* thing. But the original shrinking code? Nope, or we won't have the custom chunk allocator at all. What I really mean is, if one wants to shrink, at least don't couple the shrink code into "mkfs.btrfs". Do shrink in its own tool/subcommand, not in a really unrelated tool. There are two cases for shrinking a filesystem: 1. You're resizing it to move to a smaller disk (or speed up copying to another disk). 2. You're generating a filesystem image that needs to be as small as possible. Case 1 is obviously unrelated to creating a filesystem. Case 2 however is kind of integral to the creation of the filesystem image itself by definition, especially for a CoW filesystem because it's not possible to shrink to the absolute smallest size due to the GlobalReserve and other things. Similarly, there are two primary use cases for pre-loading the filesystem with data: 1. Avoiding a copy when reprovisioning storage on a system. For example, splitting a directory out to a new filesystem, you could use the -r option to avoid having to copy the data after mounting the filesystem. 2. Creating base images for systems. The first case shouldn't need the shrinking functionality, but the second is a very common use case together with the second usage for shrinking a filesystem. It may be offline shrink/balance. But not to further complexing the --rootdir option now. > And you also said that, the shrink feature is not a popular feature *NOW*, then I don't think it's worthy to implment it *NOW* either. Implement future feature in the future please. I'm not sure about you, but I could have sworn that he meant seed devices weren't a popular feature right now, Oh, sorry for my misunderstanding. not that the shrinking is. As a general rule, the whole option of pre-loading a filesystem with data as you're creating it is not a popular feature, because most sysadmins are much more willing to trust adding data after the filesystem is created. Personally, given the existence of seed devices, I would absolutely expect there to be a quick and easy way to generate a minimalistic image using a single command (because realistic usage of seed devices implies minimalistic images). I agree that it shouldn't be the default behavior, but I don't think it needs to be removed completely. Just like I said in cover letter, even for ext*, it's provided by genext2fs, not mke2fs. Then maybe this should get split out into a separate tool instead of just removing it completely? There is obviously at least some interest in this functionality. I totally understand end-user really want a do-it-all solution. But from developers' view, the old UNIX way is better to maintain code clean and easy to read. What the code is doing should have near zero impact on readability. If it did, then the BTRFS code in general is already way beyond most people. In fact, you can even create your script to do the old behavior if you don't care that the result may not fully take use of the space, just by: 1) Calculate the size of the whole directory "du" command can do it easily, and it does things better than us! For years! Um, no it actually doesn't do things better in all cases. it doesn't account for extended attributes, or metadata usage, or any number of other things that factor into how much space a file or directory will take up on BTRFS. It's good enough for finding what's using most of your space, but it's not reliable for determining how much space you need to store that data (especially once you throw in in-line compression). 2) Multiple the value according to the meta/data profile Take care of small files, which will be inlined. And don't forget size for data checksum. (BTW, there is no way to change the behavor of inlined data and data checksum for mkfs. unlike btrfs-convert) This is where the issue lies. It's not possible for a person to calculate this with reasonable accuracy, and you arguably can't even do it for certain programmatically without some serious work. 3) Create a file with size calculated by step 2) 4) Execute "mkfs.btrfs -d " The main issues here are that it wasn't documented well (like many other things in BTRFS), and it didn't generate a filesystem that was properly compliant with the on-disk format (because it use
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
On 2017年09月22日 19:38, Austin S. Hemmelgarn wrote: On 2017-09-22 06:39, Qu Wenruo wrote: As I already stated in an other thread, if you want to shrink, do it in another command line tool. Do one thing and do it simple. (Although Btrfs itself is already out of the UNIX way) Unless I'm reading the code wrong, the shrinking isn't happening in a second pass, so this _is_ doing one thing, and it appears to be doing it as simply as possible (although arguably not correctly because of the 1MB reserved area being used). If you're referring to my V1 implementation of shrink, that's doing *one* thing. But the original shrinking code? Nope, or we won't have the custom chunk allocator at all. What I really mean is, if one wants to shrink, at least don't couple the shrink code into "mkfs.btrfs". Do shrink in its own tool/subcommand, not in a really unrelated tool. It may be offline shrink/balance. But not to further complexing the --rootdir option now. > And you also said that, the shrink feature is not a popular feature *NOW*, then I don't think it's worthy to implment it *NOW* either. Implement future feature in the future please. I'm not sure about you, but I could have sworn that he meant seed devices weren't a popular feature right now, Oh, sorry for my misunderstanding. not that the shrinking is. As a general rule, the whole option of pre-loading a filesystem with data as you're creating it is not a popular feature, because most sysadmins are much more willing to trust adding data after the filesystem is created. Personally, given the existence of seed devices, I would absolutely expect there to be a quick and easy way to generate a minimalistic image using a single command (because realistic usage of seed devices implies minimalistic images). I agree that it shouldn't be the default behavior, but I don't think it needs to be removed completely. Just like I said in cover letter, even for ext*, it's provided by genext2fs, not mke2fs. I totally understand end-user really want a do-it-all solution. But from developers' view, the old UNIX way is better to maintain code clean and easy to read. In fact, you can even create your script to do the old behavior if you don't care that the result may not fully take use of the space, just by: 1) Calculate the size of the whole directory "du" command can do it easily, and it does things better than us! For years! 2) Multiple the value according to the meta/data profile Take care of small files, which will be inlined. And don't forget size for data checksum. (BTW, there is no way to change the behavor of inlined data and data checksum for mkfs. unlike btrfs-convert) 3) Create a file with size calculated by step 2) 4) Execute "mkfs.btrfs -d " The main issues here are that it wasn't documented well (like many other things in BTRFS), and it didn't generate a filesystem that was properly compliant with the on-disk format (because it used space in the 1MB reserved area at the beginning of the FS). Fixing those issues in no way requires removing the feature. Yes, 1MB can be fixed easily (although not properly). But the whole customized chunk allocator is the real problem. The almost dead code is always bug-prone. Replace it with updated generic chunk allocator is the way to avoid later whac-a-mole, and should be done asap. And further more, even following the existing shrink behavior, you still need to truncate the file all by yourself. Which is no better than creating a good sized file and then mkfs on it. Only if you pre-create the file. If the file doesn't exist, it gets created at the appropriate size. That's part of why the chunk allocations are screwed up and stuff gets put in the first 1MB, it generates the FS on-the-fly and writes it out as it's generating it. Nope, even you created the file in advance, it will still occupy the first 1M. BTW, you can get back the size calculation for shrink, but you will soon find that it's just the start of a new nightmare. Because there is no easy way to calculate the real metadata usage. The result (and the old calculator) will be no better than guessing it. (Well, just multiply the dir size by 2 will never go wrong) Thanks, Qu Thanks, Qu Sent: Friday, September 22, 2017 at 5:24 PM From: "Anand Jain" To: "Qu Wenruo" , linux-btrfs@vger.kernel.org Cc: dste...@suse.cz Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option +WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem, +prevent user to make use of the remaining space. +In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs. +The result should be the same as `mkfs`, `mount` and then `cp -r`. + Hmm well. Shrink to fit exactly to the size of the given files-and-directory is indeed a nice feature. Which would help to create a golden-image btrfs seed device. Its not popular as of now, but a
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
On 2017-09-22 06:39, Qu Wenruo wrote: As I already stated in an other thread, if you want to shrink, do it in another command line tool. Do one thing and do it simple. (Although Btrfs itself is already out of the UNIX way) Unless I'm reading the code wrong, the shrinking isn't happening in a second pass, so this _is_ doing one thing, and it appears to be doing it as simply as possible (although arguably not correctly because of the 1MB reserved area being used). It may be offline shrink/balance. But not to further complexing the --rootdir option now. > And you also said that, the shrink feature is not a popular feature *NOW*, then I don't think it's worthy to implment it *NOW* either. Implement future feature in the future please. I'm not sure about you, but I could have sworn that he meant seed devices weren't a popular feature right now, not that the shrinking is. As a general rule, the whole option of pre-loading a filesystem with data as you're creating it is not a popular feature, because most sysadmins are much more willing to trust adding data after the filesystem is created. Personally, given the existence of seed devices, I would absolutely expect there to be a quick and easy way to generate a minimalistic image using a single command (because realistic usage of seed devices implies minimalistic images). I agree that it shouldn't be the default behavior, but I don't think it needs to be removed completely. The main issues here are that it wasn't documented well (like many other things in BTRFS), and it didn't generate a filesystem that was properly compliant with the on-disk format (because it used space in the 1MB reserved area at the beginning of the FS). Fixing those issues in no way requires removing the feature. And further more, even following the existing shrink behavior, you still need to truncate the file all by yourself. Which is no better than creating a good sized file and then mkfs on it. Only if you pre-create the file. If the file doesn't exist, it gets created at the appropriate size. That's part of why the chunk allocations are screwed up and stuff gets put in the first 1MB, it generates the FS on-the-fly and writes it out as it's generating it. Thanks, Qu Sent: Friday, September 22, 2017 at 5:24 PM From: "Anand Jain" To: "Qu Wenruo" , linux-btrfs@vger.kernel.org Cc: dste...@suse.cz Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option +WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem, +prevent user to make use of the remaining space. +In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs. +The result should be the same as `mkfs`, `mount` and then `cp -r`. + Hmm well. Shrink to fit exactly to the size of the given files-and-directory is indeed a nice feature. Which would help to create a golden-image btrfs seed device. Its not popular as of now, but at some point it may in the cloud environment. Replacing this feature instead of creating a new option is not a good idea indeed. I missed something ? Thanks, Anand +Also, if destination file/block device does not exist, *--rootdir* will not +create the image file, to make it follow the normal mkfs behavior. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: defragmenting best practice?
On 2017-09-21 16:10, Kai Krakow wrote: Am Wed, 20 Sep 2017 07:46:52 -0400 schrieb "Austin S. Hemmelgarn" : Fragmentation: Files with a lot of random writes can become heavily fragmented (1+ extents) causing excessive multi-second spikes of CPU load on systems with an SSD or large amount a RAM. On desktops this primarily affects application databases (including Firefox). Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem. Upon reading that I am wondering if fragmentation in the Firefox profile is part of my issue. That's one thing I never tested previously. (BTW, this system has 256 GB of RAM and 20 cores.) Almost certainly. Most modern web browsers are brain-dead and insist on using SQLite databases (or traditional DB files) for everything, including the cache, and the usage for the cache in particular kills performance when fragmentation is an issue. At least in Chrome, you can turn on simple cache backend, which, I think, is using many small instead of one huge file. This suit btrfs much better: That's correct. The traditional cache in Chrome and Chromium uses a single SQLite database for storing all the cache data and metadata (just like FIrefox did last time I checked). The simple cache backend instead uses the filesystem to handle allocations and uses directory hashing to speed up look ups of items, which actually means that even without BTRFS involved, it will usually be faster (both because it allows concurrent access unlike SQLite, and because it's generally faster to parse a multi-level directory hash than an SQL statement). chrome://flags/#enable-simple-cache-backend And then I suggest also doing this (as your login user): $ cd $HOME $ mv .cache .cache.old $ mkdir .cache $ lsattr +C .cache $ rsync -av .cache.old/ .cache/ $ rm -Rf .cache.old This makes caches for most applications nocow. Chrome performance was completely fixed for me by doing this. I'm not sure where Firefox puts its cache, I only use it on very rare occasions. But I think it's going to .cache/mozilla last time looked at it. I'm pretty sure that is correct. You may want to close all apps before converting the cache directory. At a minimum, you'll have to restart them to get them to use the new location. Also, I don't see any downsides in making this nocow. That directory could easily be also completely volatile. If something breaks due to no longer protected by data csum, just clean it out. Indeed, anything that is storing data here that can't be regenerated from some other source is asking for trouble, sane backup systems don't include ~/.cache, and it's quite often one of the first things recommended for deletion when trying to free up disk space. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
As I already stated in an other thread, if you want to shrink, do it in another command line tool. Do one thing and do it simple. (Although Btrfs itself is already out of the UNIX way) It may be offline shrink/balance. But not to further complexing the --rootdir option now. And you also said that, the shrink feature is not a popular feature *NOW*, then I don't think it's worthy to implment it *NOW* either. Implement future feature in the future please. And further more, even following the existing shrink behavior, you still need to truncate the file all by yourself. Which is no better than creating a good sized file and then mkfs on it. Thanks, Qu Sent: Friday, September 22, 2017 at 5:24 PM From: "Anand Jain" To: "Qu Wenruo" , linux-btrfs@vger.kernel.org Cc: dste...@suse.cz Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option > +WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem, > +prevent user to make use of the remaining space. > +In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs. > +The result should be the same as `mkfs`, `mount` and then `cp -r`. + Hmm well. Shrink to fit exactly to the size of the given files-and-directory is indeed a nice feature. Which would help to create a golden-image btrfs seed device. Its not popular as of now, but at some point it may in the cloud environment. Replacing this feature instead of creating a new option is not a good idea indeed. I missed something ? Thanks, Anand > +Also, if destination file/block device does not exist, *--rootdir* will not > +create the image file, to make it follow the normal mkfs behavior. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
+WARNING: Before v4.14 btrfs-progs, *--rootdir* will shrink the filesystem, +prevent user to make use of the remaining space. +In v4.14 btrfs-progs, this behavior is changed, and will not shrink the fs. +The result should be the same as `mkfs`, `mount` and then `cp -r`. + Hmm well. Shrink to fit exactly to the size of the given files-and-directory is indeed a nice feature. Which would help to create a golden-image btrfs seed device. Its not popular as of now, but at some point it may in the cloud environment. Replacing this feature instead of creating a new option is not a good idea indeed. I missed something ? Thanks, Anand +Also, if destination file/block device does not exist, *--rootdir* will not +create the image file, to make it follow the normal mkfs behavior. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] xfstests: Split MOUNT_OPTIONS to TEST_FS_MOUNT_OPTS and MOUNT_OPTIONS
On Mon, Sep 18, 2017 at 04:11:09PM +0800, Gu Jinxiang wrote: > Resovle the inconsistent of mount option. > Btrfs use MOUNT_OPTIONS for both scrath_dev and test_dev. Change to > MOUNT_OPTIONS for scratch mount, and TEST_FS_MOUNT_OPTS for test dev > mount. > > Signed-off-by: Gu Jinxiang > --- > As mentioned by https://patchwork.kernel.org/patch/9742039/, the usage > of MOUNT_OPTIONS is inconsistent. Sorry for the late reply, because I don't think this is the right fix, and I was trying to sort out & refactor the mount option settings through fstests, but found that it required more time than I thought.. The new _check_test_btrfs_filesystem() you introduced here is a (almost) complete copy of _check_btrfs_filesystem, the only difference is that it's mounting $device with $TEST_FS_MOUNT_OPTS instead of $MOUNT_OPTIONS. At least you could factor out a common helper so you don't have to repeat 99% of the code. But that only fixes _check_test_btrfs_filesystems, there're similar problems in _check_generic_filesystems and other places that do mount operation. I'd like to see a well-defined interface to return the correct mount options to callers that want to do mount operations. I'll look into this and see if I can work something out. I really appreciate if anyone has other thoughts/suggestions! Thanks, Eryu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Running compsize utility on btrfs compress mount point
shally verma posted on Fri, 22 Sep 2017 12:01:32 +0530 as excerpted: > In response to my question in another thread: >> Is there any command that i can run to confirm file has been compressed? > > I've gotten response from Duncan: >>>There is the quite recently posted (and actively updated since then) >>>compsize command. > >>>https://github.com/kilobyte/compsize > > On running compsize on btrfs mount point created, I get following: > > Type Perc Disk Usage Uncompressed Referenced > Data50% 128K 256K 256K > zlib50% 128K 256K 256K > Data 100% 10G 10G 10G > none 100% 10G 10G 10G > > but then how do I interpret above information? Hmm... Looks like you got hit by the live-repo phenomenon. I've not updated for a few days (reported sync time was last Sat), but was just running it a few hours ago, and my output was a bit more intuitively ordered. The way it's /supposed/ to work (unless it's changing, it /is/ a rather new tool after all), the first line (Data) should be the aggregate total and average compression percentage of all used compression types, with the following lines displaying values for each compression type compsize found, on its own. IOW, for the above I'd have expected just three lines (plus header), something like this: Data100% 10G 10G 10G zlib 50%128K256K256K none100% 10G 10G 10G (In this case the 128K compressed is so small it's a rounding error in the 10G full size, so the combined data line and the none line will appear the same.) I'd suggest trying to pull any updates and rebuilding. If that doesn't fix it, try checking out a specific commit, say the 160c335e7 that was HEAD last Saturday (16th) when I built, and see if the output comes out a bit more sane. If that works, then the problem must be in either 94d98028b from Tuesday, or caed4fdcd, HEAD as I'm looking at it now, from Thursday, if you pulled after it. If the author doesn't popup in this thread, you might bisect to the specific commit and file a github issue on it. If 160c335e7 doesn't give you saner output, then there's some other difference between our systems that must account for your output above. FWIW, gcc 6.4.0 (gentoo) here, amd64, kernel 4.13, glibc 2.25-r5 (the -r5 indicating gentoo patch level), btrfs-progs 4.12. I have CFLAGS available too if you need 'em. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html