Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-25 Thread Andrey Wagin
2012/10/24 Andrew Morton :
> On Wed, 24 Oct 2012 09:38:57 +0400
> Andrey Wagin  wrote:
>
>> >
>> > I think that returning -ENOMEM in response to an excessive nesting
>> > attempt is misleading - the system *didn't* run out of memory.  EINVAL
>> > is better?
>>
>> I chose ENOMEM by analogy with max_pid.  When a new PID can not be
>> allocated, ENOMEM is returned too.
>
> I don't know what this means - please be carefully specific when
> identifying kernel code.

Sorry.

>
> If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is
> appropriate there, because a failure *is* caused by memory allocation
> failure.

I'm referring to alloc_pidmap().
For example I set pid_max to 500 and try to create more than 500 processes.

[pid   345] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f8721716a10) = -1 ENOMEM (Cannot allocate memory)

Actually I'm agree with EINVAL and a patch is attached to this message.

Thanks.
>
> But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't
> tell the user "you ran out of memory" when he didn't!  -EINVAL isn't
> really appropriate either ("Invalid argument") but it has become a
> general you-screwed-up catchall and seems to me to be the most
> appropriate errno we have available.
>


0001-pidns-limit-the-nesting-depth-of-pid-namespaces-v2.patch
Description: Binary data


Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-24 Thread Andrew Morton
On Wed, 24 Oct 2012 09:38:57 +0400
Andrey Wagin  wrote:

> >
> > I think that returning -ENOMEM in response to an excessive nesting
> > attempt is misleading - the system *didn't* run out of memory.  EINVAL
> > is better?
> 
> I chose ENOMEM by analogy with max_pid.  When a new PID can not be
> allocated, ENOMEM is returned too.

I don't know what this means - please be carefully specific when
identifying kernel code.

If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is
appropriate there, because a failure *is* caused by memory allocation
failure.

But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't
tell the user "you ran out of memory" when he didn't!  -EINVAL isn't
really appropriate either ("Invalid argument") but it has become a
general you-screwed-up catchall and seems to me to be the most
appropriate errno we have available.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-23 Thread Andrey Wagin
2012/10/24 Andrew Morton :
> On Fri, 12 Oct 2012 16:30:42 +0400
> Andrew Vagin  wrote:
>
>> 'struct pid' is a "variable sized struct" - a header with an array
>> of upids at the end.
>>
>> A size of the array depends on a level (depth) of pid namespaces. Now
>> a level of pidns is not limited, so 'struct pid' can be more than one
>> page.
>>
>> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
>> is not calculated from PAGE_SIZE, because in this case it depends on
>> architectures, config options and it will be reduced, if someone adds a
>> new fields in struct pid or struct upid.
>>
>> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
>> expand "struct pid" and it's more than enough for all known for me
>> use-cases.  When someone finds a reasonable use case, we can add a
>> config option or a sysctl parameter.
>>
>> In addition it will reduce effect of another problem, when we have many
>> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
>> will be called for each namespace and find_vpid will be called for each
>> process in a namespace. find_vpid will be called minimum max_level^2 / 2
>> times. The reason of that is that when we found a bit in pidmap, we
>> can't determine this pidns is top for this process or it isn't.
>>
>> vpid is a heavy operation, so a fork bomb, which create many nested
>> namespace, can do a system inaccessible for a long time.
>>
>> --- a/kernel/pid_namespace.c
>> +++ b/kernel/pid_namespace.c
>> @@ -70,12 +70,18 @@ err_alloc:
>>   return NULL;
>>  }
>>
>> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
>> +#define MAX_PID_NS_LEVEL 32
>> +
>>  static struct pid_namespace *create_pid_namespace(struct pid_namespace 
>> *parent_pid_ns)
>>  {
>>   struct pid_namespace *ns;
>>   unsigned int level = parent_pid_ns->level + 1;
>>   int i, err = -ENOMEM;
>>
>> + if (level > MAX_PID_NS_LEVEL)
>> + goto out;
>> +
>>   ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>>   if (ns == NULL)
>>   goto out;
>
> hm, OK.  This will kind-of fix the free_pid_ns()-excessive-recursion
> issue which we already fixed by other means ;)

I caught both problems with help of an one program, which create many
nested namespaces. Actually this patch prevents another problem. A few
thousand nested namespaces is destroyed for more than one minutes and
in this period a system is inaccessible.

>
> I don't think this problem is serious enough to bother backporting into
> -stable but I guess we can/should do it in 3.7.  Agree?

Yes.

>
> I think that returning -ENOMEM in response to an excessive nesting
> attempt is misleading - the system *didn't* run out of memory.  EINVAL
> is better?

I chose ENOMEM by analogy with max_pid.  When a new PID can not be
allocated, ENOMEM is returned too.

>
>
>
> From: Andrew Morton 
> Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix
>
> return -EINVAL in response to excessive nesting, not -ENOMEM
>
> Cc: "Eric W. Biederman" 
> Cc: Andrew Vagin 
> Cc: Cyrill Gorcunov 
> Cc: Oleg Nesterov 
> Cc: Pavel Emelyanov 
> Signed-off-by: Andrew Morton 
> ---
>
>  kernel/pid_namespace.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff -puN 
> kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix 
> kernel/pid_namespace.c
> --- 
> a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix
> +++ a/kernel/pid_namespace.c
> @@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_
>  {
> struct pid_namespace *ns;
> unsigned int level = parent_pid_ns->level + 1;
> -   int i, err = -ENOMEM;
> +   int i;
> +   int err;
>
> -   if (level > MAX_PID_NS_LEVEL)
> +   if (level > MAX_PID_NS_LEVEL) {
> +   err = -EINVAL;
> goto out;
> +   }
>
> +   err = -ENOMEM;
> ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
> if (ns == NULL)
> goto out;
> _
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-23 Thread Andrew Morton
On Fri, 12 Oct 2012 16:30:42 +0400
Andrew Vagin  wrote:

> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
> 
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
> 
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
> 
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
> 
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
> 
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
> 
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
>   return NULL;
>  }
>  
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace 
> *parent_pid_ns)
>  {
>   struct pid_namespace *ns;
>   unsigned int level = parent_pid_ns->level + 1;
>   int i, err = -ENOMEM;
>  
> + if (level > MAX_PID_NS_LEVEL)
> + goto out;
> +
>   ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>   if (ns == NULL)
>   goto out;

hm, OK.  This will kind-of fix the free_pid_ns()-excessive-recursion
issue which we already fixed by other means ;)

I don't think this problem is serious enough to bother backporting into
-stable but I guess we can/should do it in 3.7.  Agree?

I think that returning -ENOMEM in response to an excessive nesting
attempt is misleading - the system *didn't* run out of memory.  EINVAL
is better?



From: Andrew Morton 
Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix

return -EINVAL in response to excessive nesting, not -ENOMEM

Cc: "Eric W. Biederman" 
Cc: Andrew Vagin 
Cc: Cyrill Gorcunov 
Cc: Oleg Nesterov 
Cc: Pavel Emelyanov 
Signed-off-by: Andrew Morton 
---

 kernel/pid_namespace.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff -puN 
kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix 
kernel/pid_namespace.c
--- a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix
+++ a/kernel/pid_namespace.c
@@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_
 {
struct pid_namespace *ns;
unsigned int level = parent_pid_ns->level + 1;
-   int i, err = -ENOMEM;
+   int i;
+   int err;
 
-   if (level > MAX_PID_NS_LEVEL)
+   if (level > MAX_PID_NS_LEVEL) {
+   err = -EINVAL;
goto out;
+   }
 
+   err = -ENOMEM;
ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
if (ns == NULL)
goto out;
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-20 Thread Oleg Nesterov
On 10/12, Andrew Vagin wrote:
>
> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
>
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
>
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
>
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
>
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
>
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
>
> Cc: Andrew Morton 
> Cc: Oleg Nesterov 
> Cc: Cyrill Gorcunov 
> Cc: "Eric W. Biederman" 
> Cc: Pavel Emelyanov 
> Signed-off-by: Andrew Vagin 


Acked-by: Oleg Nesterov 



> ---
>  kernel/pid_namespace.c |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index b051fa6..598bfb3 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
>   return NULL;
>  }
>
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace 
> *parent_pid_ns)
>  {
>   struct pid_namespace *ns;
>   unsigned int level = parent_pid_ns->level + 1;
>   int i, err = -ENOMEM;
>
> + if (level > MAX_PID_NS_LEVEL)
> + goto out;
> +
>   ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>   if (ns == NULL)
>   goto out;
> --
> 1.7.1
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-19 Thread Andrey Wagin
Hello Andrew and Oleg,

Andrew, what do you think about this patch? I reworked it according
with your comments to the previous version.

Oleg, could you send Ack in this version, if it's ok for you.

Thanks.

2012/10/12 Andrew Vagin :
> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
>
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
>
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
>
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
>
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
>
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
>
> Cc: Andrew Morton 
> Cc: Oleg Nesterov 
> Cc: Cyrill Gorcunov 
> Cc: "Eric W. Biederman" 
> Cc: Pavel Emelyanov 
> Signed-off-by: Andrew Vagin 
> ---
>  kernel/pid_namespace.c |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index b051fa6..598bfb3 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
> return NULL;
>  }
>
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace 
> *parent_pid_ns)
>  {
> struct pid_namespace *ns;
> unsigned int level = parent_pid_ns->level + 1;
> int i, err = -ENOMEM;
>
> +   if (level > MAX_PID_NS_LEVEL)
> +   goto out;
> +
> ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
> if (ns == NULL)
> goto out;
> --
> 1.7.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] pidns: limit the nesting depth of pid namespaces

2012-10-12 Thread Andrew Vagin
'struct pid' is a "variable sized struct" - a header with an array
of upids at the end.

A size of the array depends on a level (depth) of pid namespaces. Now
a level of pidns is not limited, so 'struct pid' can be more than one
page.

Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
is not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.

I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
expand "struct pid" and it's more than enough for all known for me
use-cases.  When someone finds a reasonable use case, we can add a
config option or a sysctl parameter.

In addition it will reduce effect of another problem, when we have many
nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
will be called for each namespace and find_vpid will be called for each
process in a namespace. find_vpid will be called minimum max_level^2 / 2
times. The reason of that is that when we found a bit in pidmap, we
can't determine this pidns is top for this process or it isn't.

vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can do a system inaccessible for a long time.

Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: Cyrill Gorcunov 
Cc: "Eric W. Biederman" 
Cc: Pavel Emelyanov 
Signed-off-by: Andrew Vagin 
---
 kernel/pid_namespace.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index b051fa6..598bfb3 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -70,12 +70,18 @@ err_alloc:
return NULL;
 }
 
+/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
+#define MAX_PID_NS_LEVEL 32
+
 static struct pid_namespace *create_pid_namespace(struct pid_namespace 
*parent_pid_ns)
 {
struct pid_namespace *ns;
unsigned int level = parent_pid_ns->level + 1;
int i, err = -ENOMEM;
 
+   if (level > MAX_PID_NS_LEVEL)
+   goto out;
+
ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
if (ns == NULL)
goto out;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/