Re: [PATCH] pidns: limit the nesting depth of pid namespaces
2012/10/24 Andrew Morton : > On Wed, 24 Oct 2012 09:38:57 +0400 > Andrey Wagin wrote: > >> > >> > I think that returning -ENOMEM in response to an excessive nesting >> > attempt is misleading - the system *didn't* run out of memory. EINVAL >> > is better? >> >> I chose ENOMEM by analogy with max_pid. When a new PID can not be >> allocated, ENOMEM is returned too. > > I don't know what this means - please be carefully specific when > identifying kernel code. Sorry. > > If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is > appropriate there, because a failure *is* caused by memory allocation > failure. I'm referring to alloc_pidmap(). For example I set pid_max to 500 and try to create more than 500 processes. [pid 345] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8721716a10) = -1 ENOMEM (Cannot allocate memory) Actually I'm agree with EINVAL and a patch is attached to this message. Thanks. > > But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't > tell the user "you ran out of memory" when he didn't! -EINVAL isn't > really appropriate either ("Invalid argument") but it has become a > general you-screwed-up catchall and seems to me to be the most > appropriate errno we have available. > 0001-pidns-limit-the-nesting-depth-of-pid-namespaces-v2.patch Description: Binary data
Re: [PATCH] pidns: limit the nesting depth of pid namespaces
On Wed, 24 Oct 2012 09:38:57 +0400 Andrey Wagin wrote: > > > > I think that returning -ENOMEM in response to an excessive nesting > > attempt is misleading - the system *didn't* run out of memory. EINVAL > > is better? > > I chose ENOMEM by analogy with max_pid. When a new PID can not be > allocated, ENOMEM is returned too. I don't know what this means - please be carefully specific when identifying kernel code. If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is appropriate there, because a failure *is* caused by memory allocation failure. But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't tell the user "you ran out of memory" when he didn't! -EINVAL isn't really appropriate either ("Invalid argument") but it has become a general you-screwed-up catchall and seems to me to be the most appropriate errno we have available. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit the nesting depth of pid namespaces
2012/10/24 Andrew Morton : > On Fri, 12 Oct 2012 16:30:42 +0400 > Andrew Vagin wrote: > >> 'struct pid' is a "variable sized struct" - a header with an array >> of upids at the end. >> >> A size of the array depends on a level (depth) of pid namespaces. Now >> a level of pidns is not limited, so 'struct pid' can be more than one >> page. >> >> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL >> is not calculated from PAGE_SIZE, because in this case it depends on >> architectures, config options and it will be reduced, if someone adds a >> new fields in struct pid or struct upid. >> >> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to >> expand "struct pid" and it's more than enough for all known for me >> use-cases. When someone finds a reasonable use case, we can add a >> config option or a sysctl parameter. >> >> In addition it will reduce effect of another problem, when we have many >> nested namespaces and the oldest one starts dying. zap_pid_ns_processe >> will be called for each namespace and find_vpid will be called for each >> process in a namespace. find_vpid will be called minimum max_level^2 / 2 >> times. The reason of that is that when we found a bit in pidmap, we >> can't determine this pidns is top for this process or it isn't. >> >> vpid is a heavy operation, so a fork bomb, which create many nested >> namespace, can do a system inaccessible for a long time. >> >> --- a/kernel/pid_namespace.c >> +++ b/kernel/pid_namespace.c >> @@ -70,12 +70,18 @@ err_alloc: >> return NULL; >> } >> >> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ >> +#define MAX_PID_NS_LEVEL 32 >> + >> static struct pid_namespace *create_pid_namespace(struct pid_namespace >> *parent_pid_ns) >> { >> struct pid_namespace *ns; >> unsigned int level = parent_pid_ns->level + 1; >> int i, err = -ENOMEM; >> >> + if (level > MAX_PID_NS_LEVEL) >> + goto out; >> + >> ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); >> if (ns == NULL) >> goto out; > > hm, OK. This will kind-of fix the free_pid_ns()-excessive-recursion > issue which we already fixed by other means ;) I caught both problems with help of an one program, which create many nested namespaces. Actually this patch prevents another problem. A few thousand nested namespaces is destroyed for more than one minutes and in this period a system is inaccessible. > > I don't think this problem is serious enough to bother backporting into > -stable but I guess we can/should do it in 3.7. Agree? Yes. > > I think that returning -ENOMEM in response to an excessive nesting > attempt is misleading - the system *didn't* run out of memory. EINVAL > is better? I chose ENOMEM by analogy with max_pid. When a new PID can not be allocated, ENOMEM is returned too. > > > > From: Andrew Morton > Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix > > return -EINVAL in response to excessive nesting, not -ENOMEM > > Cc: "Eric W. Biederman" > Cc: Andrew Vagin > Cc: Cyrill Gorcunov > Cc: Oleg Nesterov > Cc: Pavel Emelyanov > Signed-off-by: Andrew Morton > --- > > kernel/pid_namespace.c |8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff -puN > kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix > kernel/pid_namespace.c > --- > a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix > +++ a/kernel/pid_namespace.c > @@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_ > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > - int i, err = -ENOMEM; > + int i; > + int err; > > - if (level > MAX_PID_NS_LEVEL) > + if (level > MAX_PID_NS_LEVEL) { > + err = -EINVAL; > goto out; > + } > > + err = -ENOMEM; > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; > _ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit the nesting depth of pid namespaces
On Fri, 12 Oct 2012 16:30:42 +0400 Andrew Vagin wrote: > 'struct pid' is a "variable sized struct" - a header with an array > of upids at the end. > > A size of the array depends on a level (depth) of pid namespaces. Now > a level of pidns is not limited, so 'struct pid' can be more than one > page. > > Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL > is not calculated from PAGE_SIZE, because in this case it depends on > architectures, config options and it will be reduced, if someone adds a > new fields in struct pid or struct upid. > > I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to > expand "struct pid" and it's more than enough for all known for me > use-cases. When someone finds a reasonable use case, we can add a > config option or a sysctl parameter. > > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ > +#define MAX_PID_NS_LEVEL 32 > + > static struct pid_namespace *create_pid_namespace(struct pid_namespace > *parent_pid_ns) > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > int i, err = -ENOMEM; > > + if (level > MAX_PID_NS_LEVEL) > + goto out; > + > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; hm, OK. This will kind-of fix the free_pid_ns()-excessive-recursion issue which we already fixed by other means ;) I don't think this problem is serious enough to bother backporting into -stable but I guess we can/should do it in 3.7. Agree? I think that returning -ENOMEM in response to an excessive nesting attempt is misleading - the system *didn't* run out of memory. EINVAL is better? From: Andrew Morton Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix return -EINVAL in response to excessive nesting, not -ENOMEM Cc: "Eric W. Biederman" Cc: Andrew Vagin Cc: Cyrill Gorcunov Cc: Oleg Nesterov Cc: Pavel Emelyanov Signed-off-by: Andrew Morton --- kernel/pid_namespace.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff -puN kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix kernel/pid_namespace.c --- a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix +++ a/kernel/pid_namespace.c @@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_ { struct pid_namespace *ns; unsigned int level = parent_pid_ns->level + 1; - int i, err = -ENOMEM; + int i; + int err; - if (level > MAX_PID_NS_LEVEL) + if (level > MAX_PID_NS_LEVEL) { + err = -EINVAL; goto out; + } + err = -ENOMEM; ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; _ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit the nesting depth of pid namespaces
On 10/12, Andrew Vagin wrote: > > 'struct pid' is a "variable sized struct" - a header with an array > of upids at the end. > > A size of the array depends on a level (depth) of pid namespaces. Now > a level of pidns is not limited, so 'struct pid' can be more than one > page. > > Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL > is not calculated from PAGE_SIZE, because in this case it depends on > architectures, config options and it will be reduced, if someone adds a > new fields in struct pid or struct upid. > > I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to > expand "struct pid" and it's more than enough for all known for me > use-cases. When someone finds a reasonable use case, we can add a > config option or a sysctl parameter. > > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > Cc: Andrew Morton > Cc: Oleg Nesterov > Cc: Cyrill Gorcunov > Cc: "Eric W. Biederman" > Cc: Pavel Emelyanov > Signed-off-by: Andrew Vagin Acked-by: Oleg Nesterov > --- > kernel/pid_namespace.c |6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index b051fa6..598bfb3 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ > +#define MAX_PID_NS_LEVEL 32 > + > static struct pid_namespace *create_pid_namespace(struct pid_namespace > *parent_pid_ns) > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > int i, err = -ENOMEM; > > + if (level > MAX_PID_NS_LEVEL) > + goto out; > + > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit the nesting depth of pid namespaces
Hello Andrew and Oleg, Andrew, what do you think about this patch? I reworked it according with your comments to the previous version. Oleg, could you send Ack in this version, if it's ok for you. Thanks. 2012/10/12 Andrew Vagin : > 'struct pid' is a "variable sized struct" - a header with an array > of upids at the end. > > A size of the array depends on a level (depth) of pid namespaces. Now > a level of pidns is not limited, so 'struct pid' can be more than one > page. > > Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL > is not calculated from PAGE_SIZE, because in this case it depends on > architectures, config options and it will be reduced, if someone adds a > new fields in struct pid or struct upid. > > I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to > expand "struct pid" and it's more than enough for all known for me > use-cases. When someone finds a reasonable use case, we can add a > config option or a sysctl parameter. > > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > Cc: Andrew Morton > Cc: Oleg Nesterov > Cc: Cyrill Gorcunov > Cc: "Eric W. Biederman" > Cc: Pavel Emelyanov > Signed-off-by: Andrew Vagin > --- > kernel/pid_namespace.c |6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index b051fa6..598bfb3 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ > +#define MAX_PID_NS_LEVEL 32 > + > static struct pid_namespace *create_pid_namespace(struct pid_namespace > *parent_pid_ns) > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > int i, err = -ENOMEM; > > + if (level > MAX_PID_NS_LEVEL) > + goto out; > + > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] pidns: limit the nesting depth of pid namespaces
'struct pid' is a "variable sized struct" - a header with an array of upids at the end. A size of the array depends on a level (depth) of pid namespaces. Now a level of pidns is not limited, so 'struct pid' can be more than one page. Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL is not calculated from PAGE_SIZE, because in this case it depends on architectures, config options and it will be reduced, if someone adds a new fields in struct pid or struct upid. I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand "struct pid" and it's more than enough for all known for me use-cases. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can do a system inaccessible for a long time. Cc: Andrew Morton Cc: Oleg Nesterov Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: Pavel Emelyanov Signed-off-by: Andrew Vagin --- kernel/pid_namespace.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index b051fa6..598bfb3 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -70,12 +70,18 @@ err_alloc: return NULL; } +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ +#define MAX_PID_NS_LEVEL 32 + static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns) { struct pid_namespace *ns; unsigned int level = parent_pid_ns->level + 1; int i, err = -ENOMEM; + if (level > MAX_PID_NS_LEVEL) + goto out; + ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/