If we're moving from a parent to a direct descendant, the only end result (on cgroupv2 hierarchies) is that the process experiences more restrictive resource limits. Thus, there's no reason to restrict processes from moving to direct descendants based on whether or not they have cgroup.procs write access to their current cgroup.
This is important for unprivileged subtree management, as it allows unprivileged processes to move to their newly created subtrees.

Cc: d...@opencontainers.org
Signed-off-by: Aleksa Sarai <asa...@suse.de>
---
 kernel/cgroup.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4559baa7eabd..fa403357ba91 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2859,14 +2859,22 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
 		spin_unlock_irq(&css_set_lock);
 
-		while (!cgroup_is_descendant(dst_cgrp, cgrp))
-			cgrp = cgroup_parent(cgrp);
-
-		ret = -ENOMEM;
-		inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
-		if (inode) {
-			ret = inode_permission(inode, MAY_WRITE);
-			iput(inode);
+		/*
+		 * If we are moving to a descendant of our current cgroup, we
+		 * can only further restrict the cgroup limits we must follow.
+		 * Thus, it doesn't make sense to restrict the cgroup.procs
+		 * write.
+		 */
+		if (!cgroup_is_descendant(dst_cgrp, cgrp)) {
+			while (!cgroup_is_descendant(dst_cgrp, cgrp))
+				cgrp = cgroup_parent(cgrp);
+
+			ret = -ENOMEM;
+			inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
+			if (inode) {
+				ret = inode_permission(inode, MAY_WRITE);
+				iput(inode);
+			}
 		}
 	}
 
-- 
2.9.0
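
For readers who want to reason about the new check outside the kernel, here is a rough userspace model of the logic in the hunk above. It is only a sketch: struct cg, cg_is_descendant() and cg_procs_write_permission() are hypothetical names, the procs_writable flag stands in for inode_permission(inode, MAY_WRITE) succeeding on cgroup.procs, and both cgroups are assumed to live in the same hierarchy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup; not the kernel's layout. */
struct cg {
	struct cg *parent;	/* NULL for the root */
	bool procs_writable;	/* models MAY_WRITE on cgroup.procs being granted */
};

/* Like cgroup_is_descendant(): a cgroup counts as its own descendant. */
static bool cg_is_descendant(const struct cg *cgrp, const struct cg *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/*
 * Returns 0 if moving from src to dst is allowed, -EACCES otherwise.
 * Assumes src and dst share a root, so the upward walk terminates.
 */
static int cg_procs_write_permission(const struct cg *src, const struct cg *dst)
{
	assert(src && dst);

	/* Moving down into our own subtree: no write check needed. */
	if (cg_is_descendant(dst, src))
		return 0;

	/* Otherwise walk up to the closest common ancestor and check it. */
	while (!cg_is_descendant(dst, src)) {
		src = src->parent;
		assert(src);	/* same-hierarchy assumption */
	}

	return src->procs_writable ? 0 : -EACCES;
}
```

With a tree root -> a -> child and no writable cgroup.procs anywhere, this model allows a move from a to child, while a move from child back up to a, or from a to a sibling, is still refused.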