On Fri, Dec 26, 2014 at 1:43 PM, Satoru Takeuchi
<takeuchi_sat...@jp.fujitsu.com> wrote:
> Hi Yang,
>
> On 2014/12/26 10:32, Dongsheng Yang wrote:
>>
>> Hi Satoru,
>>
>> I saw your mail of "[BUG] "Quota Ignored On write" problem still exist
>> with 3.16-rc5
>> <http://news.gmane.org/find-root.php?message_id=53C8DEB0.1060404%40jp.fujitsu.com>"
>> in gmane.
>> I guess this patch will fix the problem you mentioned.
>> Could you help to give a try?
>
>
> Although it reached disk quota before writing whole 1.5GB,
> the copied size was only about 700 MB. In this case,
> expected behavior is to reach disk quota just after
> writing 1 GB.

Hi Satoru-san,

After some more investigation, I did meet this problem you described here.
There is a window between "reserved += num_bytes" and "may_use -= num_bytes".
And the similar problem exists in real space accounting. I am cooking a patch
to solve this problem.

Thanx for your test a lot.

Yang
>
> * 3.19-rc1 without your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 25.6601 s, 59.9 MB/s
> ===============================================================================
>
> * 3.19-rc1 with your two v2 patches
>
> ===============================================================================
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to
> 65536
> fs created label (null) on /dev/vdb
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 681353+0 records in
> 681352+0 records out
> 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
> ===============================================================================
>
> Thanks,
> Satoru
>
>>
>> Thanx
>> Yang
>>
>>
>> On 12/12/2014 04:44 PM, Dongsheng Yang wrote:
>>>
>>> Currently, for pre_alloc or delay_alloc, the bytes will be accounted
>>> in space_info by the three guys.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> But on the other hand, in qgroup, there are only two counters to account
>>> the
>>> bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts
>>> bytes in space_info->bytes_may_use and qg->excl accounts bytes in
>>> space_info->used. So the bytes in space_info->reserved is not accounted
>>> in qgroup. If so, there is a window we can exceed the quota limit when
>>> bytes is in space_info->reserved.
>>>
>>> Example:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer     excl     max_rfer max_excl parent  child
>>> -------- ----     ----     -------- -------- ------  -----
>>> 0/5      20987904 20987904 0        10485760 ---     ---
>>>
>>> qg->excl is 20987904 larger than max_excl 10485760.
>>>
>>> This patch introduce a new counter named may_use to qgroup, then
>>> there are three counters in qgroup to account bytes in space_info
>>> as below.
>>> space_info->bytes_may_use --- space_info->reserved --- space_info->used.
>>> qgroup->may_use           --- qgroup->reserved     --- qgroup->excl
>>>
>>> With this patch applied:
>>>         # btrfs quota enable /mnt
>>>         # btrfs qgroup limit -e 10M /mnt
>>>         # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done
>>> fallocate: /mnt/data9: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data10: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data11: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data12: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data13: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data14: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data15: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data16: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data17: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data18: fallocate failed: Disk quota exceeded
>>> fallocate: /mnt/data19: fallocate failed: Disk quota exceeded
>>>         # sync
>>>         # btrfs qgroup show -pcre /mnt
>>> qgroupid rfer    excl    max_rfer max_excl parent  child
>>> -------- ----    ----    -------- -------- ------  -----
>>> 0/5      9453568 9453568 0        10485760 ---     ---
>>>
>>> Reported-by: Cyril SCETBON<cyril.scet...@free.fr>
>>> Signed-off-by: Dongsheng Yang<yangds.f...@cn.fujitsu.com>
>>> ---
>>> Changelog:
>>>         v1 -> v2:
>>>                 Remove the redundant check for fs_info->quota_enabled.
>>>
>>>   fs/btrfs/extent-tree.c | 20 ++++++++++++++-
>>>   fs/btrfs/inode.c       | 18 ++++++++++++-
>>>   fs/btrfs/qgroup.c      | 68
>>> +++++++++++++++++++++++++++++++++++++++++++++++---
>>>   fs/btrfs/qgroup.h      |  4 +++
>>>   4 files changed, 104 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 014b7f2..f4ad737 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5500,8 +5500,12 @@ static int pin_down_extent(struct btrfs_root
>>> *root,
>>>
>>>         set_extent_dirty(root->fs_info->pinned_extents, bytenr,
>>>                          bytenr + num_bytes - 1, GFP_NOFS |
>>> __GFP_NOFAIL);
>>> -       if (reserved)
>>> +       if (reserved) {
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  num_bytes, -1);
>>>                 trace_btrfs_reserved_extent_free(root, bytenr,
>>> num_bytes);
>>> +       }
>>>         return 0;
>>>   }
>>>
>>> @@ -6230,6 +6234,9 @@ void btrfs_free_tree_block(struct
>>> btrfs_trans_handle *trans,
>>>                 btrfs_update_reserved_bytes(cache, buf->len,
>>> RESERVE_FREE, 0);
>>>                 trace_btrfs_reserved_extent_free(root, buf->start,
>>> buf->len);
>>>                 pin = 0;
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  buf->len, -1);
>>>         }
>>>   out:
>>>         if (pin)
>>> @@ -6964,7 +6971,11 @@ static int __btrfs_free_reserved_extent(struct
>>> btrfs_root *root,
>>>         else {
>>>                 btrfs_add_free_space(cache, start, len);
>>>                 btrfs_update_reserved_bytes(cache, len, RESERVE_FREE,
>>> delalloc);
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  len, -1);
>>>         }
>>> +
>>>         btrfs_put_block_group(cache);
>>>
>>>         trace_btrfs_reserved_extent_free(root, start, len);
>>> @@ -7200,6 +7211,9 @@ int btrfs_alloc_logged_file_extent(struct
>>> btrfs_trans_handle *trans,
>>>         BUG_ON(ret); /* logic error */
>>>         ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
>>>                                          0, owner, offset, ins, 1);
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins->offset, 1);
>>>         btrfs_put_block_group(block_group);
>>>         return ret;
>>>   }
>>> @@ -7346,6 +7360,10 @@ struct extent_buffer
>>> *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root_objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         buf = btrfs_init_new_buffer(trans, root, ins.objectid,
>>>                                     blocksize, level);
>>>         BUG_ON(IS_ERR(buf)); /* -ENOMEM */
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index d23362f..0c824f3 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -59,6 +59,7 @@
>>>   #include "backref.h"
>>>   #include "hash.h"
>>>   #include "props.h"
>>> +#include "qgroup.h"
>>>
>>>   struct btrfs_iget_args {
>>>         struct btrfs_key *location;
>>> @@ -739,7 +740,9 @@ retry:
>>>                         }
>>>                         goto out_free;
>>>                 }
>>> -
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>>                 /*
>>>                  * here we're doing allocation and writeback of the
>>>                  * compressed pages
>>> @@ -951,6 +954,10 @@ static noinline int cow_file_range(struct inode
>>> *inode,
>>>                 if (ret < 0)
>>>                         goto out_unlock;
>>>
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 em = alloc_extent_map();
>>>                 if (!em) {
>>>                         ret = -ENOMEM;
>>> @@ -6745,6 +6752,10 @@ static struct extent_map
>>> *btrfs_new_extent_direct(struct inode *inode,
>>>                 return ERR_PTR(ret);
>>>         }
>>>
>>> +       btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +                                          root->root_key.objectid,
>>> +                                          ins.offset, 1);
>>> +
>>>         return em;
>>>   }
>>>
>>> @@ -9266,6 +9277,11 @@ static int __btrfs_prealloc_file_range(struct
>>> inode *inode, int mode,
>>>                                 btrfs_end_transaction(trans, root);
>>>                         break;
>>>                 }
>>> +
>>> +               btrfs_qgroup_update_reserved_bytes(root->fs_info,
>>> +
>>> root->root_key.objectid,
>>> +                                                  ins.offset, 1);
>>> +
>>>                 btrfs_drop_extent_cache(inode, cur_offset,
>>>                                         cur_offset + ins.offset -1, 0);
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..d147082 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -72,6 +72,7 @@ struct btrfs_qgroup {
>>>         /*
>>>          * reservation tracking
>>>          */
>>> +       u64 may_use;
>>>         u64 reserved;
>>>
>>>         /*
>>> @@ -1414,6 +1415,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>         WARN_ON(sign < 0 && qgroup->excl < oper->num_bytes);
>>>         qgroup->excl += sign * oper->num_bytes;
>>>         qgroup->excl_cmpr += sign * oper->num_bytes;
>>> +       if (sign > 0)
>>> +               qgroup->reserved -= oper->num_bytes;
>>>
>>>         qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -1434,6 +1437,8 @@ static int qgroup_excl_accounting(struct
>>> btrfs_fs_info *fs_info,
>>>                 qgroup->excl += sign * oper->num_bytes;
>>>                 if (sign < 0)
>>>                         WARN_ON(qgroup->excl < oper->num_bytes);
>>> +               if (sign > 0)
>>> +                       qgroup->reserved -= oper->num_bytes;
>>>                 qgroup->excl_cmpr += sign * oper->num_bytes;
>>>                 qgroup_dirty(fs_info, qgroup);
>>>
>>> @@ -2359,6 +2364,61 @@ out:
>>>         return ret;
>>>   }
>>>
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                           u64 ref_root,
>>> +                                           u64 num_bytes,
>>> +                                           int sign)
>>> +{
>>> +       struct btrfs_root *quota_root;
>>> +       struct btrfs_qgroup *qgroup;
>>> +       int ret = 0;
>>> +       struct ulist_node *unode;
>>> +       struct ulist_iterator uiter;
>>> +
>>> +       if (!is_fstree(ref_root) || !fs_info->quota_enabled)
>>> +               return 0;
>>> +
>>> +       if (num_bytes == 0)
>>> +               return 0;
>>> +
>>> +       spin_lock(&fs_info->qgroup_lock);
>>> +       quota_root = fs_info->quota_root;
>>> +       if (!quota_root)
>>> +               goto out;
>>> +
>>> +       qgroup = find_qgroup_rb(fs_info, ref_root);
>>> +       if (!qgroup)
>>> +               goto out;
>>> +
>>> +       ulist_reinit(fs_info->qgroup_ulist);
>>> +       ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
>>> +                       (uintptr_t)qgroup, GFP_ATOMIC);
>>> +       if (ret < 0)
>>> +               goto out;
>>> +
>>> +       ULIST_ITER_INIT(&uiter);
>>> +       while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
>>> +               struct btrfs_qgroup *qg;
>>> +               struct btrfs_qgroup_list *glist;
>>> +
>>> +               qg = u64_to_ptr(unode->aux);
>>> +
>>> +               qg->reserved += sign * num_bytes;
>>> +
>>> +               list_for_each_entry(glist, &qg->groups, next_group) {
>>> +                       ret = ulist_add(fs_info->qgroup_ulist,
>>> +                                       glist->group->qgroupid,
>>> +                                       (uintptr_t)glist->group,
>>> GFP_ATOMIC);
>>> +                       if (ret < 0)
>>> +                               goto out;
>>> +               }
>>> +       }
>>> +
>>> +out:
>>> +       spin_unlock(&fs_info->qgroup_lock);
>>> +       return ret;
>>> +}
>>> +
>>>   /*
>>>    * reserve some space for a qgroup and all its parents. The reservation
>>> takes
>>>    * place with start_transaction or dealloc_reserve, similar to ENOSPC
>>> @@ -2407,14 +2467,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -                   qg->reserved + (s64)qg->rfer + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->rfer +
>>> num_bytes >
>>>                     qg->max_rfer) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>>                 }
>>>
>>>                 if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -                   qg->reserved + (s64)qg->excl + num_bytes >
>>> +                   qg->reserved + qg->may_use + (s64)qg->excl +
>>> num_bytes >
>>>                     qg->max_excl) {
>>>                         ret = -EDQUOT;
>>>                         goto out;
>>> @@ -2438,7 +2498,7 @@ int btrfs_qgroup_reserve(struct btrfs_root *root,
>>> u64 num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved += num_bytes;
>>> +               qg->may_use += num_bytes;
>>>         }
>>>
>>>   out:
>>> @@ -2484,7 +2544,7 @@ void btrfs_qgroup_free(struct btrfs_root *root, u64
>>> num_bytes)
>>>
>>>                 qg = u64_to_ptr(unode->aux);
>>>
>>> -               qg->reserved -= num_bytes;
>>> +               qg->may_use -= num_bytes;
>>>
>>>                 list_for_each_entry(glist, &qg->groups, next_group) {
>>>                         ret = ulist_add(fs_info->qgroup_ulist,
>>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>>> index 18cc68c..31de88e 100644
>>> --- a/fs/btrfs/qgroup.h
>>> +++ b/fs/btrfs/qgroup.h
>>> @@ -95,6 +95,10 @@ int btrfs_run_qgroups(struct btrfs_trans_handle
>>> *trans,
>>>   int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
>>>                          struct btrfs_fs_info *fs_info, u64 srcid, u64
>>> objectid,
>>>                          struct btrfs_qgroup_inherit *inherit);
>>> +int btrfs_qgroup_update_reserved_bytes(struct btrfs_fs_info *fs_info,
>>> +                                      u64 ref_root,
>>> +                                      u64 num_bytes,
>>> +                                      int sign);
>>>   int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
>>>   void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
>>>
>>
>
> --
>
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to