On Fri, Dec 26, 2014 at 1:43 PM, Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com> wrote: > Hi Yang, > > On 2014/12/26 10:32, Dongsheng Yang wrote: >> >> Hi Satoru, >> >> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist >> with 3.16-rc5 >> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>" >> in gmane. >> I guess this patch will fix the problem you mentioned. >> Could you help to give a try? > > > Although it reached disk quota before writing whole 1.5GB, > the copied size was only about 700 MB. In this case, > expected behavior is to reach disk quota just after > writing 1 GB.
Hi Satoru-san, After some more investigation, I did meet this problem you described here. There is a window between "reserved += num_bytes" and "may_use -= num_bytes". And the similar problem exists in real space accounting. I am cooking a patch to solve this problem. Thanx for your test a lot. Yang > > * 3.19-rc1 without your two v2 patches > > =============================================================================== > + mkfs.btrfs -f /dev/vdb > Btrfs v3.17 > See http://btrfs.wiki.kernel.org for more information. > > Turning ON incompat feature 'extref': increased hardlink limit per file to > 65536 > fs created label (null) on /dev/vdb > nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB > + mount /dev/vdb /root/btrfs-auto-test/ > + ret=0 > + btrfs quota enable /root/btrfs-auto-test/ > + btrfs subvolume create /root/btrfs-auto-test//sub > Create subvolume '/root/btrfs-auto-test/sub' > + btrfs qgroup limit 1G /root/btrfs-auto-test//sub > + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000 > 1500000+0 records in > 1500000+0 records out > 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s > =============================================================================== > > * 3.19-rc1 with your two v2 patches > > =============================================================================== > + mkfs.btrfs -f /dev/vdb > Btrfs v3.17 > See http://btrfs.wiki.kernel.org for more information. > > Turning ON incompat feature 'extref': increased hardlink limit per file to > 65536 > fs created label (null) on /dev/vdb > nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB > + mount /dev/vdb /root/btrfs-auto-test/ > + ret=0 > + btrfs quota enable /root/btrfs-auto-test/ > + btrfs subvolume create /root/btrfs-auto-test//sub > Create subvolume '/root/btrfs-auto-test/sub' > + btrfs qgroup limit 1G /root/btrfs-auto-test//sub > + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000 > dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded > 681353+0 records in > 681352+0 records out > 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s > =============================================================================== > > Thanks, > Satoru > >> >> Thanx >> Yang >> >> >> On 12/12/2014 04:44 PM, Dongsheng Yang wrote: >>> >>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted >>> in space_info by the three guys. >>> space_info->bytes_may_use --- space_info->reserved --- space_info->used. >>> But on the other hand, in qgroup, there are only two counters to account >>> the >>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts >>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in >>> space_info->used. So the bytes in space_info->reserved is not accounted >>> in qgroup. If so, there is a window we can exceed the quota limit when >>> bytes is in space_info->reserved. >>> >>> Example: >>> # btrfs quota enable /mnt >>> # btrfs qgroup limit -e 10M /mnt >>> # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done >>> # sync >>> # btrfs qgroup show -pcre /mnt >>> qgroupid rfer excl max_rfer max_excl parent child >>> -------- ---- ---- -------- -------- ------ ----- >>> 0/5 20987904 20987904 0 10485760 --- --- >>> >>> qg->excl is 20987904 larger than max_excl 10485760. >>> >>> This patch introduce a new counter named may_use to qgroup, then >>> there are three counters in qgroup to account bytes in space_info >>> as below. >>> space_info->bytes_may_use --- space_info->reserved --- space_info->used. >>> qgroup->may_use --- qgroup->reserved --- qgroup->excl >>> >>> With this patch applied: >>> # btrfs quota enable /mnt >>> # btrfs qgroup limit -e 10M /mnt >>> # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done >>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded >>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded >>> # sync >>> # btrfs qgroup show -pcre /mnt >>> qgroupid rfer excl max_rfer max_excl parent child >>> -------- ---- ---- -------- -------- ------ ----- >>> 0/5 9453568 9453568 0 10485760 --- --- >>> >>> Reported-by: Cyril SCETBON<cyril.scet...@free.fr> >>> Signed-off-by: Dongsheng Yang<yangds.f...@cn.fujitsu.com> >>> --- >>> Changelog: >>> v1 -> v2: >>> Remove the redundant check for fs_info->quota_enabled. >>> >>> fs/btrfs/extent-tree.c | 20 ++++++++++++++- >>> fs/btrfs/inode.c | 18 ++++++++++++- >>> fs/btrfs/qgroup.c | 68 >>> +++++++++++++++++++++++++++++++++++++++++++++++--- >>> fs/btrfs/qgroup.h | 4 +++ >>> 4 files changed, 104 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>> index 014b7f2..f4ad737 100644 >>> --- a/fs/btrfs/extent-tree.c >>> +++ b/fs/btrfs/extent-tree.c >>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root >>> *root, >>> >>> set_extent_dirty(root->fs_info->pinned_extents, bytenr, >>> bytenr + num_bytes - 1, GFP_NOFS | >>> __GFP_NOFAIL); >>> - if (reserved) >>> + if (reserved) { >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + num_bytes, -1); >>> trace_btrfs_reserved_extent_free(root, bytenr, >>> num_bytes); >>> + } >>> return 0; >>> } >>> >>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct >>> btrfs_trans_handle *trans, >>> btrfs_update_reserved_bytes(cache, buf->len, >>> RESERVE_FREE, 0); >>> trace_btrfs_reserved_extent_free(root, buf->start, >>> buf->len); >>> pin = 0; >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + buf->len, -1); >>> } >>> out: >>> if (pin) >>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct >>> btrfs_root *root, >>> else { >>> btrfs_add_free_space(cache, start, len); >>> btrfs_update_reserved_bytes(cache, len, RESERVE_FREE, >>> delalloc); >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + len, -1); >>> } >>> + >>> btrfs_put_block_group(cache); >>> >>> trace_btrfs_reserved_extent_free(root, start, len); >>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct >>> btrfs_trans_handle *trans, >>> BUG_ON(ret); /* logic error */ >>> ret = alloc_reserved_file_extent(trans, root, 0, root_objectid, >>> 0, owner, offset, ins, 1); >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + root->root_key.objectid, >>> + ins->offset, 1); >>> btrfs_put_block_group(block_group); >>> return ret; >>> } >>> @@ -7346,6 +7360,10 @@ struct extent_buffer >>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, >>> return ERR_PTR(ret); >>> } >>> >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + root_objectid, >>> + ins.offset, 1); >>> + >>> buf = btrfs_init_new_buffer(trans, root, ins.objectid, >>> blocksize, level); >>> BUG_ON(IS_ERR(buf)); /* -ENOMEM */ >>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>> index d23362f..0c824f3 100644 >>> --- a/fs/btrfs/inode.c >>> +++ b/fs/btrfs/inode.c >>> @@ -59,6 +59,7 @@ >>> #include "backref.h" >>> #include "hash.h" >>> #include "props.h" >>> +#include "qgroup.h" >>> >>> struct btrfs_iget_args { >>> struct btrfs_key *location; >>> @@ -739,7 +740,9 @@ retry: >>> } >>> goto out_free; >>> } >>> - >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + ins.offset, 1); >>> /* >>> * here we're doing allocation and writeback of the >>> * compressed pages >>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode >>> *inode, >>> if (ret < 0) >>> goto out_unlock; >>> >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + ins.offset, 1); >>> + >>> em = alloc_extent_map(); >>> if (!em) { >>> ret = -ENOMEM; >>> @@ -6745,6 +6752,10 @@ static struct extent_map >>> *btrfs_new_extent_direct(struct inode *inode, >>> return ERR_PTR(ret); >>> } >>> >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + root->root_key.objectid, >>> + ins.offset, 1); >>> + >>> return em; >>> } >>> >>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct >>> inode *inode, int mode, >>> btrfs_end_transaction(trans, root); >>> break; >>> } >>> + >>> + btrfs_qgroup_update_reserved_bytes(root->fs_info, >>> + >>> root->root_key.objectid, >>> + ins.offset, 1); >>> + >>> btrfs_drop_extent_cache(inode, cur_offset, >>> cur_offset + ins.offset -1, 0); >>> >>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >>> index 48b60db..d147082 100644 >>> --- a/fs/btrfs/qgroup.c >>> +++ b/fs/btrfs/qgroup.c >>> @@ -72,6 +72,7 @@ struct btrfs_qgroup { >>> /* >>> * reservation tracking >>> */ >>> + u64 may_use; >>> u64 reserved; >>> >>> /* >>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct >>> btrfs_fs_info *fs_info, >>> WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes); >>> qgroup->excl += sign * oper->num_bytes; >>> qgroup->excl_cmpr += sign * oper->num_bytes; >>> + if (sign > 0) >>> + qgroup->reserved -= oper->num_bytes; >>> >>> qgroup_dirty(fs_info, qgroup); >>> >>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct >>> btrfs_fs_info *fs_info, >>> qgroup->excl += sign * oper->num_bytes; >>> if (sign < 0) >>> WARN_ON(qgroup->excl < oper->num_bytes); >>> + if (sign > 0) >>> + qgroup->reserved -= oper->num_bytes; >>> qgroup->excl_cmpr += sign * oper->num_bytes; >>> qgroup_dirty(fs_info, qgroup); >>> >>> @@ -2359,6 +2364,61 @@ out: >>> return ret; >>> } >>> >>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info, >>> + u64 ref_root, >>> + u64 num_bytes, >>> + int sign) >>> +{ >>> + struct btrfs_root *quota_root; >>> + struct btrfs_qgroup *qgroup; >>> + int ret = 0; >>> + struct ulist_node *unode; >>> + struct ulist_iterator uiter; >>> + >>> + if (!is_fstree(ref_root) || !fs_info->quota_enabled) >>> + return 0; >>> + >>> + if (num_bytes == 0) >>> + return 0; >>> + >>> + spin_lock(&fs_info->qgroup_lock); >>> + quota_root = fs_info->quota_root; >>> + if (!quota_root) >>> + goto out; >>> + >>> + qgroup = find_qgroup_rb(fs_info, ref_root); >>> + if (!qgroup) >>> + goto out; >>> + >>> + ulist_reinit(fs_info->qgroup_ulist); >>> + ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid, >>> + (uintptr_t)qgroup, GFP_ATOMIC); >>> + if (ret < 0) >>> + goto out; >>> + >>> + ULIST_ITER_INIT(&uiter); >>> + while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) { >>> + struct btrfs_qgroup *qg; >>> + struct btrfs_qgroup_list *glist; >>> + >>> + qg = u64_to_ptr(unode->aux); >>> + >>> + qg->reserved += sign * num_bytes; >>> + >>> + list_for_each_entry(glist, &qg->groups, next_group) { >>> + ret = ulist_add(fs_info->qgroup_ulist, >>> + glist->group->qgroupid, >>> + (uintptr_t)glist->group, >>> GFP_ATOMIC); >>> + if (ret < 0) >>> + goto out; >>> + } >>> + } >>> + >>> +out: >>> + spin_unlock(&fs_info->qgroup_lock); >>> + return ret; >>> +} >>> + >>> /* >>> * reserve some space for a qgroup and all its parents. The reservation >>> takes >>> * place with start_transaction or dealloc_reserve, similar to ENOSPC >>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, >>> u64 num_bytes) >>> qg = u64_to_ptr(unode->aux); >>> >>> if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>> - qg->reserved + (s64)qg->rfer + num_bytes > >>> + qg->reserved + qg->may_use + (s64)qg->rfer + >>> num_bytes > >>> qg->max_rfer) { >>> ret = -EDQUOT; >>> goto out; >>> } >>> >>> if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>> - qg->reserved + (s64)qg->excl + num_bytes > >>> + qg->reserved + qg->may_use + (s64)qg->excl + >>> num_bytes > >>> qg->max_excl) { >>> ret = -EDQUOT; >>> goto out; >>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, >>> u64 num_bytes) >>> >>> qg = u64_to_ptr(unode->aux); >>> >>> - qg->reserved += num_bytes; >>> + qg->may_use += num_bytes; >>> } >>> >>> out: >>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64 >>> num_bytes) >>> >>> qg = u64_to_ptr(unode->aux); >>> >>> - qg->reserved -= num_bytes; >>> + qg->may_use -= num_bytes; >>> >>> list_for_each_entry(glist, &qg->groups, next_group) { >>> ret = ulist_add(fs_info->qgroup_ulist, >>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h >>> index 18cc68c..31de88e 100644 >>> --- a/fs/btrfs/qgroup.h >>> +++ b/fs/btrfs/qgroup.h >>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle >>> *trans, >>> int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, >>> struct btrfs_fs_info *fs_info, u64 srcid, u64 >>> objectid, >>> struct btrfs_qgroup_inherit *inherit); >>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info, >>> + u64 ref_root, >>> + u64 num_bytes, >>> + int sign); >>> int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes); >>> void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes); >>> >> > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html