Re: [PATCH] pidns: limit a size of pid to one page
On Wed, 10 Oct 2012 14:24:14 +0400 Andrew Vagin wrote: > A size of pid depends on a level of pidns and now a level of pidns > is not limited, so it can be more than one page. > > Looks reasonable, that it should be limited to a page size. On x86_64 > it will allow to create 125 nested pid namespaces. I don't know a > usecase for which, it will be not enough. When someone finds a > reasonable use case, we can add a config option or a sysctl parameter. > > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > For example my system becomes inaccessible for a few minutes with > 4000 processes. This changelog is a bit hard to follow. After reading the code, I see that `struct pid' is a "variable sized struct" - a header with an array of upids at the end. I suggest this be explained in the changelog so others don't scratch their heads in the way I did. Really, what the patch actually does is to limit the nesting depth of pid namespaces, yes? If so I think that is the way the patch should be presented - the restriction of the storage size of a struct pid is an implementation detail. > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* Limit a size of pid to one page */ > +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct > upid)) I don't it was a good choice to use PAGE_SIZE here. PAGE_SIZE can vary from 4k to 64k, which is a large range. This choice introduces the risk that an application will pass testing on one architecture/config but will later fail when deployed on other archs/configs. And it means that large PAGE_SIZE setups might still experience the problems which you have identified. And the use of sizeof(struct upid) and sizeof(struct pid) mean that if we later change the size of those structures, the kernel's userspace interface will be accidentally changed in subtle and surprising ways. So I suggest you do #define MAX_PID_NS_LEVEL 128 and leave it at that? And the comment for MAX_PID_NS_LEVEL should tell the reader *why* we did this, not what we did (which is pretty obvious from reading the definition). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit a size of pid to one page
On 10/10, Andrew Vagin wrote: > > A size of pid depends on a level of pidns and now a level of pidns > is not limited, so it can be more than one page. > > Looks reasonable, that it should be limited to a page size. On x86_64 > it will allow to create 125 nested pid namespaces. I don't know a > usecase for which, it will be not enough. When someone finds a > reasonable use case, we can add a config option or a sysctl parameter. I think this makes sense. Acked-by: Oleg Nesterov > In addition it will reduce effect of another problem, when we have many > nested namespaces and the oldest one starts dying. zap_pid_ns_processe > will be called for each namespace and find_vpid will be called for each > process in a namespace. find_vpid will be called minimum max_level^2 / 2 > times. The reason of that is that when we found a bit in pidmap, we > can't determine this pidns is top for this process or it isn't. > > vpid is a heavy operation, so a fork bomb, which create many nested > namespace, can do a system inaccessible for a long time. > > For example my system becomes inaccessible for a few minutes with > 4000 processes. > > Cc: Andrew Morton > Cc: Oleg Nesterov > Cc: Cyrill Gorcunov > Cc: "Eric W. Biederman" > Cc: Pavel Emelyanov > Signed-off-by: Andrew Vagin > --- > kernel/pid_namespace.c |6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index b051fa6..bc822e6 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -70,12 +70,18 @@ err_alloc: > return NULL; > } > > +/* Limit a size of pid to one page */ > +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct > upid)) > + > static struct pid_namespace *create_pid_namespace(struct pid_namespace > *parent_pid_ns) > { > struct pid_namespace *ns; > unsigned int level = parent_pid_ns->level + 1; > int i, err = -ENOMEM; > > + if (level > MAX_PID_NS_LEVEL) > + goto out; > + > ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); > if (ns == NULL) > goto out; > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit a size of pid to one page
On 10/10, Andrew Vagin wrote: A size of pid depends on a level of pidns and now a level of pidns is not limited, so it can be more than one page. Looks reasonable, that it should be limited to a page size. On x86_64 it will allow to create 125 nested pid namespaces. I don't know a usecase for which, it will be not enough. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. I think this makes sense. Acked-by: Oleg Nesterov o...@redhat.com In addition it will reduce effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can do a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. Cc: Andrew Morton a...@linux-foundation.org Cc: Oleg Nesterov o...@redhat.com Cc: Cyrill Gorcunov gorcu...@openvz.org Cc: Eric W. Biederman ebied...@xmission.com Cc: Pavel Emelyanov xe...@parallels.com Signed-off-by: Andrew Vagin ava...@openvz.org --- kernel/pid_namespace.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index b051fa6..bc822e6 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -70,12 +70,18 @@ err_alloc: return NULL; } +/* Limit a size of pid to one page */ +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct upid)) + static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns) { struct pid_namespace *ns; unsigned int level = parent_pid_ns-level + 1; int i, err = -ENOMEM; + if (level MAX_PID_NS_LEVEL) + goto out; + ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pidns: limit a size of pid to one page
On Wed, 10 Oct 2012 14:24:14 +0400 Andrew Vagin ava...@openvz.org wrote: A size of pid depends on a level of pidns and now a level of pidns is not limited, so it can be more than one page. Looks reasonable, that it should be limited to a page size. On x86_64 it will allow to create 125 nested pid namespaces. I don't know a usecase for which, it will be not enough. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can do a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. This changelog is a bit hard to follow. After reading the code, I see that `struct pid' is a variable sized struct - a header with an array of upids at the end. I suggest this be explained in the changelog so others don't scratch their heads in the way I did. Really, what the patch actually does is to limit the nesting depth of pid namespaces, yes? If so I think that is the way the patch should be presented - the restriction of the storage size of a struct pid is an implementation detail. --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -70,12 +70,18 @@ err_alloc: return NULL; } +/* Limit a size of pid to one page */ +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct upid)) I don't it was a good choice to use PAGE_SIZE here. PAGE_SIZE can vary from 4k to 64k, which is a large range. This choice introduces the risk that an application will pass testing on one architecture/config but will later fail when deployed on other archs/configs. And it means that large PAGE_SIZE setups might still experience the problems which you have identified. And the use of sizeof(struct upid) and sizeof(struct pid) mean that if we later change the size of those structures, the kernel's userspace interface will be accidentally changed in subtle and surprising ways. So I suggest you do #define MAX_PID_NS_LEVEL 128 and leave it at that? And the comment for MAX_PID_NS_LEVEL should tell the reader *why* we did this, not what we did (which is pretty obvious from reading the definition). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] pidns: limit a size of pid to one page
A size of pid depends on a level of pidns and now a level of pidns is not limited, so it can be more than one page. Looks reasonable, that it should be limited to a page size. On x86_64 it will allow to create 125 nested pid namespaces. I don't know a usecase for which, it will be not enough. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can do a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. Cc: Andrew Morton Cc: Oleg Nesterov Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: Pavel Emelyanov Signed-off-by: Andrew Vagin --- kernel/pid_namespace.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index b051fa6..bc822e6 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -70,12 +70,18 @@ err_alloc: return NULL; } +/* Limit a size of pid to one page */ +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct upid)) + static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns) { struct pid_namespace *ns; unsigned int level = parent_pid_ns->level + 1; int i, err = -ENOMEM; + if (level > MAX_PID_NS_LEVEL) + goto out; + ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] pidns: limit a size of pid to one page
A size of pid depends on a level of pidns and now a level of pidns is not limited, so it can be more than one page. Looks reasonable, that it should be limited to a page size. On x86_64 it will allow to create 125 nested pid namespaces. I don't know a usecase for which, it will be not enough. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can do a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. Cc: Andrew Morton a...@linux-foundation.org Cc: Oleg Nesterov o...@redhat.com Cc: Cyrill Gorcunov gorcu...@openvz.org Cc: Eric W. Biederman ebied...@xmission.com Cc: Pavel Emelyanov xe...@parallels.com Signed-off-by: Andrew Vagin ava...@openvz.org --- kernel/pid_namespace.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index b051fa6..bc822e6 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -70,12 +70,18 @@ err_alloc: return NULL; } +/* Limit a size of pid to one page */ +#define MAX_PID_NS_LEVEL ((PAGE_SIZE - sizeof(struct pid)) / sizeof(struct upid)) + static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns) { struct pid_namespace *ns; unsigned int level = parent_pid_ns-level + 1; int i, err = -ENOMEM; + if (level MAX_PID_NS_LEVEL) + goto out; + ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL); if (ns == NULL) goto out; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/