On Fri, 12 Oct 2012 16:30:42 +0400 Andrew Vagin <ava...@openvz.org> wrote:
> 'struct pid' is a "variable sized struct" - a header with an array > of upids at the end. > > A size of the array depends on a level (depth) of pid namespaces. Now > a level of pidns is not limited, so 'struct pid' can be more than one > page. > > Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL > is not calculated from PAGE_SIZE, because in this case it depends on > architectures, config options and it will be reduced, if someone adds a > new fields in struct pid or struct upid. > > I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to > expand "struct pid" and it's more than enough for all known for me > use-cases. When someone finds a reasonable use case, we can add a > config option or a sysctl parameter. > > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ > +#define MAX_PID_NS_LEVEL 32 > + > static struct pid_namespace *create_pid_namespace(struct pid_namespace > *parent_pid_ns) > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > int i, err = -ENOMEM; > > + if (level > MAX_PID_NS_LEVEL) > + goto out; > + > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; hm, OK. This will kind-of fix the free_pid_ns()-excessive-recursion issue which we already fixed by other means ;) I don't think this problem is serious enough to bother backporting into -stable but I guess we can/should do it in 3.7. Agree? I think that returning -ENOMEM in response to an excessive nesting attempt is misleading - the system *didn't* run out of memory. EINVAL is better? From: Andrew Morton <a...@linux-foundation.org> Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix return -EINVAL in response to excessive nesting, not -ENOMEM Cc: "Eric W. Biederman" <ebied...@xmission.com> Cc: Andrew Vagin <ava...@openvz.org> Cc: Cyrill Gorcunov <gorcu...@openvz.org> Cc: Oleg Nesterov <o...@redhat.com> Cc: Pavel Emelyanov <xe...@parallels.com> Signed-off-by: Andrew Morton <a...@linux-foundation.org> --- kernel/pid_namespace.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff -puN kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix kernel/pid_namespace.c --- a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix +++ a/kernel/pid_namespace.c @@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_ { struct pid_namespace *ns; unsigned int level = parent_pid_ns->level + 1; - int i, err = -ENOMEM; + int i; + int err; - if (level > MAX_PID_NS_LEVEL) + if (level > MAX_PID_NS_LEVEL) { + err = -EINVAL; goto out; + } + err = -ENOMEM; ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; _ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/