On Wed, Sep 23, 2015 at 09:40:42AM +0800, Qu Wenruo wrote:
> Thanks for all your work on this.
> Comments on the test case are inlined below.
No problem, thanks for the review Qu!

> >+# Create an fs tree of a given height at a target location. This is
> >+# done by aggressively creating inline extents to expand the number of
> >+# nodes required. We also add a traditional extent so that
> >+# drop_snapshot is forced to walk at least one extent that is not
> >+# stored in metadata.
> >+#
> >+# NOTE: The ability to vary tree height for this test is very useful
> >+# for debugging problems with drop_snapshot(). As a result we retain
> >+# that parameter even though the test below always does level 2 trees.
> >+_explode_fs_tree () {
> >+	local level=$1;
> >+	local loc="$2";
> >+	local bs=4095;
> >+	local cnt=1;
> >+	local n;
> >+
> >+	if [ -z "$loc" ]; then
> >+		echo "specify location for fileset"
> >+		exit 1;
> >+	fi
> >+
> >+	case $level in
> >+	1)# this always reproduces level 1 trees
> >+		n=10;
>
> I'd prefer a more accurate calculation based on nodesize.
> That would allow using any possible nodesize.

How do we query nodesize? 'btrfs fi show' doesn't seem to give it.

But yeah, in theory that sounds nice, I can certainly try it out. I'm not
really sure how complicated we want this whole function to be though.
Right now it should always work on all platforms.

> For example, for $nodesize larger than 4K:
>
> l1_min_n = int(($nodesize - 101) / (4095 + 21 + 17)) + 1
>
> And for 4K nodesize, two 2K inline extents will cause the fs tree to be
> 2 levels, as a 4K nodesize won't allow a 4095-byte inline extent length.
>
> 101 is sizeof(struct btrfs_header).
> So $nodesize - 101 is the available space for a leaf.
> And 4095 is the maximum inline extent length, 21 is the inline
> file_extent header length, offsetof(struct btrfs_file_extent_item,
> disk_bytenr).
> 17 is sizeof(struct btrfs_disk_key).
>
> The above is the maximum value in theory; in practice a leaf will be
> split earlier than hitting that number.
> So the max_n should be good enough to create the desired tree level, and
> make the script run a little faster.
>
> >+		;;
> >+	2)# this always reproduces level 2 trees
> >+		n=1500
>
> For level 2 and higher trees, it will be
>
> l2_min_n = l1_min_n * (($nodesize - 101) / 33 + 1) ^ ($level - 1)
>
> 101 is still the header size.
> 33 is sizeof(struct btrfs_key_ptr).
>
> >+		;;
> >+	3)# this always reproduces level 3 trees
> >+		n=1000000;
> >+		;;
> >+	*)
> >+		echo "Can't make level $level trees";
> >+		exit 1;
> >+		;;
> >+	esac
> >+
> >+	mkdir -p $loc
> >+	for i in `seq -w 1 $n`;
> >+	do
> >+		dd status=none if=/dev/zero of=$loc/file$i bs=$bs count=$cnt
> >+	done
> >+
> >+	bs=131072
> >+	cnt=1
> >+	dd status=none if=/dev/zero of=$loc/extentfile bs=$bs count=$cnt
> >+}
> >+
> >+# Force the default leaf size as the calculations for making our btree
> >+# heights are based on that.
> >+run_check _scratch_mkfs "--nodesize 16384"
>
> When using a given nodesize, it's better to consider other arches like
> AArch64 or any other arch which supports a 64K pagesize.
> (btrfs doesn't support subpage-size nodesize, which means a filesystem
> made with a nodesize smaller than the pagesize won't mount)
>
> I got informed of this several times when submitting qgroup related test
> cases. See btrfs/017 and btrfs/091.
>
> But on the other hand, using a 64K nodesize is somewhat overkill and
> will make the test take more time.
>
> If it's OK to ignore the 64K pagesize case, I'd prefer to use a 4K
> nodesize, which will be much faster for creating fs trees with 2~3
> levels.

Right, so 64K nodesize is the most 'compatible' nodesize.
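
If we do end up going the calculated route, here's roughly what I'd try,
using your formulas above. It's an untested sketch, and it assumes we can
read the nodesize back out of the superblock (e.g. with
'btrfs inspect-internal dump-super', or btrfs-show-super on older progs) --
I haven't double-checked that output format:

# Untested sketch: derive the file count from nodesize using Qu's formulas.
# Reading nodesize from dump-super output is an assumption on my part.
nodesize=$($BTRFS_UTIL_PROG inspect-internal dump-super $SCRATCH_DEV | \
	awk '/^nodesize/ { print $2 }')

# space left in a leaf after the header (101 == sizeof(struct btrfs_header))
leaf_space=$(( nodesize - 101 ))

# max inline-extent items per leaf: 4095-byte inline data + 21-byte
# file_extent header + 17-byte disk key per item
l1_min_n=$(( leaf_space / (4095 + 21 + 17) + 1 ))

# key pointers per node (33 == sizeof(struct btrfs_key_ptr))
ptrs_per_node=$(( leaf_space / 33 + 1 ))

# level 2 and up: multiply by pointers-per-node once per extra level
level=2
n=$l1_min_n
for i in $(seq 2 $level); do
	n=$(( n * ptrs_per_node ))
done

That would still need the 4K special case you mention for the level 1
count, but it would keep the table of magic numbers out of the test.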
> >+_scratch_mount
> >+
> >+# populate the default subvolume and create a snapshot ('snap1')
> >+_explode_fs_tree 1 $SCRATCH_MNT/files
> >+_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
> >+
> >+# create new btree nodes in this snapshot. They will become shared
> >+# with the next snapshot we create.
> >+_explode_fs_tree 1 $SCRATCH_MNT/snap1/files-snap1
> >+
> >+# create our final snapshot ('snap2'), populate it with
> >+# exclusively owned nodes.
> >+#
> >+# As a result of this action, snap2 will get an implied ref to the
> >+# 128K extent created in the default subvolume.
> >+_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT/snap1 $SCRATCH_MNT/snap2
> >+_explode_fs_tree 1 $SCRATCH_MNT/snap2/files-snap2
> >+
> >+# Enable qgroups now that we have our filesystem prepared. This
> >+# will kick off a scan which we will have to wait for.
> >+_run_btrfs_util_prog quota enable $SCRATCH_MNT
> >+_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
> >+
> >+# Remount to clear cache, force everything to disk
> >+_scratch_unmount
> >+_scratch_mount
>
> Is there anything special that needs to use umount/mount other than sync?

A couple of times now it's been to my advantage to force btrfs to reread
the file trees. It might not be strictly necessary any more.

> >+# Finally, delete snap1 to trigger btrfs_drop_snapshot(). This
> >+# snapshot is most interesting to delete because it will cause some
> >+# nodes to go exclusively owned for snap2, while some will stay shared
> >+# with the default subvolume. That exercises a maximum of the drop
> >+# snapshot/qgroup interactions.
> >+#
> >+# snap2's implied ref to the 128K extent in files/ can be lost by
> >+# the root finding code in qgroup accounting due to snap1 no longer
> >+# providing a path to it. This was fixed by the first two patches
> >+# referenced above.
> >+_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
> >+
> >+# There is no way from userspace to force btrfs_drop_snapshot to run
> >+# at a given time (even via mount/unmount). We must wait for it to
> >+# start and complete. This is the shortest time on my test systems I
> >+# have found which always allows drop_snapshot to run to completion.
> >+sleep 45
>
> Does "btrfs subv delete -c" help here?

Unfortunately not :( We need to wait for drop_snapshot() to get run. That
flag (from memory) just waits for the initial orphaning transaction to
finish.

> >+
> >+_scratch_unmount
> >+
> >+# generate a qgroup report and look for inconsistent groups
> >+# - don't use _run_btrfs_util_prog here as it captures the output and
> >+#   we need to grep it.
> >+$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | grep -E -q "Counts for qgroup.*are different"
> >+if [ $? -ne 0 ]; then
> >+	status=0
> >+fi
>
> Quite a nice idea to use btrfsck to check qgroup validation.
>
> But I don't see the reason not to use _run_btrfs_util_prog, as I
> don't think the grep is needed.
>
> If there is a bug in the return value of btrfsck, then I'm OK with this
> as a workaround.
>
> But if btrfsck --qgroup-report will return non-zero when it finds a
> qgroup mismatch, I think it's better to just call
> _run_btrfs_util_prog, as it already has the return value check.

btrfsck --qgroup-report returns zero unless there was an issue generating
the report, so the grep there is the only way to catch this consistently.
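
One thing the grep-only approach can miss is btrfsck itself failing
outright, which would also leave the pattern unmatched and let the test
pass. If we care about that case, something along these lines might work --
just an untested sketch using the same commands as above:

# Untested sketch: capture the report once so we can check both btrfsck's
# own exit status and the qgroup mismatch text. --qgroup-report exits 0
# even when counts differ, so the grep is still what detects a mismatch.
report=$($BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1)
ret=$?
if [ $ret -ne 0 ]; then
	echo "btrfs check failed with status $ret"
	echo "$report"
elif echo "$report" | grep -E -q "Counts for qgroup.*are different"; then
	echo "$report"
else
	status=0
fi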
Thanks again,
	--Mark

--
Mark Fasheh