On Sun, Apr 23, 2017 at 09:48:34PM -0700, Sargun Dhillon wrote:
> On Sun, Apr 23, 2017 at 8:42 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> > At 04/22/2017 07:12 AM, Sargun Dhillon wrote:
> >>
> >> This patch introduces the quota override flag to btrfs_fs_info, and
> >> a change to quota limit checking code to temporarily allow for quota
> >> to be overridden for processes with cap_sys_resource.
> >>
> >> It's useful for administrative programs, such as log rotation,
> >> that may need to temporarily use more disk space in order to free up
> >> a greater amount of overall disk space without yielding more disk
> >> space to the rest of userland.
> >>
> >> Eventually, we may want to add the idea of an operator-specific
> >> quota, operator reserved space, or something else to allow for
> >> administrative override, but this is perhaps the simplest
> >> solution.
> >
> >
> > > Indeed, the simplest method yet.
> >
> > > But considering that reserved data space can be used by a non-privileged
> > > user, I'm not sure it's a good idea.
> >
> > For example:
> >
> > > If root wants to write a 64M file with new data, it reserves 64M of data
> > > space.
> >
> > > If a non-privileged user then wants to write that same 64M file with
> > > different data, that user won't need to reserve data space
> > > (although metadata space is still needed).
> >
> > > Won't this provide a way to escape the qgroup limit?
> This is more of a failure-avoidance mechanism. We run containers that
> don't have cap_sys_resource. The log rotator, on the other hand, has a
> full set of capabilities in the root user namespace. Given that we'd
> only flip quota_override if the system gets into a state where the log
> rotator cannot run, I don't see it being particularly problematic.

So this use case sounds valid to me. CAP_SYS_RESOURCE is documented to
allow quota overrides, no surprise here. The extra step to enable the
override per filesystem should put enough barriers against unintentional
behaviour.
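
To make the intended semantics concrete, here is a minimal user-space
sketch of the two-step gate described above: the limit is bypassed only
when the per-filesystem switch is on AND the task holds the capability.
The struct and function names (fs_model, qgroup_model,
qgroup_may_reserve) are hypothetical and are not the kernel's types or
the code in the patch; the capability check is passed in as a plain
boolean instead of calling the kernel's capable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-filesystem state; not struct btrfs_fs_info. */
struct fs_model {
	bool quota_override;	/* per-filesystem opt-in switch, off by default */
};

/* Hypothetical stand-in for a qgroup with a single byte limit. */
struct qgroup_model {
	uint64_t max_bytes;	/* configured limit */
	uint64_t used_bytes;	/* bytes already referenced or reserved */
};

/*
 * Return true if a reservation of num_bytes may proceed.
 * has_cap_sys_resource models the result of a capability check.
 */
static bool qgroup_may_reserve(const struct fs_model *fs,
			       const struct qgroup_model *qg,
			       uint64_t num_bytes,
			       bool has_cap_sys_resource)
{
	assert(fs != NULL && qg != NULL);

	/* Both conditions are required: the switch alone or the cap alone is not enough. */
	if (fs->quota_override && has_cap_sys_resource)
		return true;

	/* Already over the limit: nothing more can be reserved. */
	if (qg->used_bytes > qg->max_bytes)
		return false;

	/* Compare against the remaining room to avoid overflow in used + num. */
	return num_bytes <= qg->max_bytes - qg->used_bytes;
}
```

With the switch off, a privileged task is limited like everyone else;
with the switch on, unprivileged tasks are still limited.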

> At least looking at my systems, none of my users have cap_sys_resource
> in their capabilities set, and it seems to be the closest capability
> that maps to disk quota logic. I'd hate to drop this into the bucket
> of cap_sys_admin. Are you perhaps suggesting per-uid and per-gid
> qgroups? Or being able to have a quota_override_uid value? I thought
> about adding an extended attribute, but that would require the attr to
> be set at file creation time, not necessarily when I need it for an
> escape. Another idea was to do what ext does, in adding a special
> "operator" reserved space which processes with uid == 0 &&
> cap_sys_resource can use. This would require changing the on-disk
> qgroup format.
> 
> You're right, we would (intentionally) "escape" the qgroup limit. A
> process with cap_sys_resource would be able to allocate more disk
> space temporarily. If I'm understanding you correctly, I don't think
> that they would be able to rewrite the file's contents because that'd
> be considered changed extents, no?
> 
> >
> >>
> >> Signed-off-by: Sargun Dhillon <sar...@sargun.me>
> >> ---
> >>   fs/btrfs/ctree.h  | 3 +++
> >>   fs/btrfs/qgroup.c | 9 +++++++--
> >>   2 files changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> >> index c411590..01a095b 100644
> >> --- a/fs/btrfs/ctree.h
> >> +++ b/fs/btrfs/ctree.h
> >> @@ -1098,6 +1098,9 @@ struct btrfs_fs_info {
> >>         u32 nodesize;
> >>         u32 sectorsize;
> >>         u32 stripesize;
> >> +
> >> +       /* Allow tasks with cap_sys_resource to override the quota */
> >> +       bool quota_override;
> >
> >
> > Why not use existing fs_info->qgroup_flags?
> Isn't that persisted? I don't want this to surprise users across reboots.

Yes, it's persisted, but we can use a bitmask for the in-memory bits
before we sync the qgroup_flags to disk.
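
A rough sketch of what I mean, as a stand-alone user-space model. The
flag names, bit positions and helpers below are made up for illustration
and do not match the real qgroup status flags; the point is only that
runtime-only bits share the same word and are masked out on the way to
disk, so the override cannot survive a reboot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout: low bits are persisted, the top bit is runtime-only. */
#define QFLAG_PERSISTED_ON	(1ULL << 0)
#define QFLAG_PERSISTED_RESCAN	(1ULL << 1)
#define QFLAG_RUNTIME_OVERRIDE	(1ULL << 63)

/* Every runtime-only bit must be listed here so it is never written out. */
#define QFLAG_RUNTIME_MASK	(QFLAG_RUNTIME_OVERRIDE)

/* Set a runtime-only bit in the in-memory flags word. */
static uint64_t qflags_set_runtime(uint64_t flags, uint64_t bit)
{
	/* Guard against accidentally routing a persisted bit through this helper. */
	assert((bit & ~QFLAG_RUNTIME_MASK) == 0);
	return flags | bit;
}

/* Value to write to disk: the in-memory flags with runtime-only bits stripped. */
static uint64_t qflags_to_disk(uint64_t in_memory_flags)
{
	return in_memory_flags & ~QFLAG_RUNTIME_MASK;
}
```

The in-memory word keeps the override bit for limit checks, while the
value handed to the sync path never contains it.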