On Sun, Apr 23, 2017 at 8:42 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote: > > > At 04/22/2017 07:12 AM, Sargun Dhillon wrote: >> >> This patch introduces the quota override flag to btrfs_fs_info, and >> a change to quota limit checking code to temporarily allow for quota >> to be overridden for processes with cap_sys_resource. >> >> It's useful for administrative programs, such as log rotation, >> that may need to temporarily use more disk space in order to free up >> a greater amount of overall disk space without yielding more disk >> space to the rest of userland. >> >> Eventually, we may want to add the idea of an operator-specific >> quota, operator reserved space, or something else to allow for >> administrative override, but this is perhaps the simplest >> solution. > > > Indeed simplest method yet. > > But considering that reserved data space can be used by none privileged > user, I'm not sure if it's a good idea. > > For example: > > If root want to write a 64M file with new data, then it reserved 64M data > space. > > And another none privileged user also want to write that 64M file with > different data, then the user won't need to reserve data space. > (Although metadata space is still needed). > > Won't this cause some method to escaping the qgroup limit? This is more of a failure-avoidance mechanism. We run containers that don't have cap_sys_resource. The log rotator, on the other hand, has a full-set of capabilities in the root user namespace. Given that we'd only flip quota_override if the system gets into a state where the log rotator cannot run, I don't see it being particularly problematic.
At least looking at my systems, none of my users have cap_sys_resource in their capabilities set, and it seems to be the closest capability that maps to disk quota logic. I'd hate to drop this into the bucket of cap_sys_admin. Are you perhaps suggesting per-uid and per-gid qgroups? Or being able to have a quota_override_uid value? I thought about adding an extended attribute, but that would require the attr to be set at file creation time, not necessarily when I need it for an escape. Another idea was to do what ext does, in adding a special "operator" reserved space which processes with uid == 0 && cap_sys_resource can use. This would require changing the on-disk qgroup format. You're right, we would (intentionally) "escape" the qgroup limit. A process with cap_sys_resource would be able to allocate more disk space temporarily If I'm understanding you correctly, I don't think that they would be able to rewrite the file's contents because that'd be considered changed extents, no? > >> >> Signed-off-by: Sargun Dhillon <sar...@sargun.me> >> --- >> fs/btrfs/ctree.h | 3 +++ >> fs/btrfs/qgroup.c | 9 +++++++-- >> 2 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> index c411590..01a095b 100644 >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -1098,6 +1098,9 @@ struct btrfs_fs_info { >> u32 nodesize; >> u32 sectorsize; >> u32 stripesize; >> + >> + /* Allow tasks with cap_sys_resource to override the quota */ >> + bool quota_override; > > > Why not use existing fs_info->qgroup_flags? Isn't that persisted? I don't want this to surprise users across reboots. > > Thanks, > Qu > > >> }; >> static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) >> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >> index a59801d..0328492 100644 >> --- a/fs/btrfs/qgroup.c >> +++ b/fs/btrfs/qgroup.c >> @@ -2347,8 +2347,12 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle >> *trans, >> return ret; >> } >> -static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 >> num_bytes) >> +static bool qgroup_check_limits(const struct btrfs_qgroup *qg, >> + u64 num_bytes, bool quota_override) >> { >> + if (quota_override && capable(CAP_SYS_RESOURCE)) >> + return true; >> + >> if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >> qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer) >> return false; >> @@ -2366,6 +2370,7 @@ static int qgroup_reserve(struct btrfs_root *root, >> u64 num_bytes, bool enforce) >> struct btrfs_qgroup *qgroup; >> struct btrfs_fs_info *fs_info = root->fs_info; >> u64 ref_root = root->root_key.objectid; >> + bool qo = fs_info->quota_override; >> int ret = 0; >> struct ulist_node *unode; >> struct ulist_iterator uiter; >> @@ -2401,7 +2406,7 @@ static int qgroup_reserve(struct btrfs_root *root, >> u64 num_bytes, bool enforce) >> qg = unode_aux_to_qgroup(unode); >> - if (enforce && !qgroup_check_limits(qg, num_bytes)) { >> + if (enforce && !qgroup_check_limits(qg, num_bytes, qo)) { >> ret = -EDQUOT; >> goto out; >> } >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html