On Wed, Oct 10, 2012 at 3:49 PM, Greg KH <g...@kroah.com> wrote: > On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote: >> On Tue, 9 Oct 2012 12:03:00 -0700 >> Greg KH <g...@kroah.com> wrote: >> >> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote: >> > > On Sat, 6 Oct 2012 23:56:33 +0400 >> > > Andrew Vagin <ava...@openvz.org> wrote: >> > > >> > > > Here is a stack trace of recursion: >> > > > free_pid_ns(parent) >> > > > put_pid_ns(parent) >> > > > kref_put(&ns->kref, free_pid_ns); >> > > > free_pid_ns >> > > > >> > > > This patch turns recursion into loops. >> > > > >> > > > pidns can be nested many times, so in case of recursion >> > > > a simple user space program can provoke a kernel panic >> > > > due to exceed of a kernel stack. >> > > >> > > So we should backport this into earlier kernels. >> > > >> > > > --- a/include/linux/kref.h >> > > > +++ b/include/linux/kref.h >> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void >> > > > (*release)(struct kref *kref) >> > > > return kref_sub(kref, 1, release); >> > > > } >> > > > >> > > > +/** >> > > > + * kref_put - decrement refcount for object. >> > > > + * @kref: object. >> > > > + * >> > > > + * Decrement the refcount. >> > > > + * Return 1 if refcount is zero. >> > > > + */ >> > > > +static inline int __kref_put(struct kref *kref) >> > > > +{ >> > > > + return atomic_dec_and_test(&kref->refcount); >> > > > +} >> > > >> > > Greg might be interested in this. >> > > >> > > It's a pretty specialised thing and perhaps it needs some stern words >> > > in the description explaining when and why it should and shouldn't be >> > > used. >> > > >> > > I wonder if people might (ab)use this to avoid the "doesn't >> > > have a release function" warning. >> > >> > Yes they would, please don't do this at all. >> > >> > In fact, why is it needed? It doesn't solve anything (if it does, >> > something in the way the kref is being used is wrong.) >> > >> >> It's right there in the changelog. The patch fixes deep >> kref_put->release->kref_put recursion by turning the operation for >> pidns into a loop. > > But why would a kref release function ever decrement the same kref > again causing a loop in the first place? >
This should be kref_put ns->parent recursionly, put_pid_ns(ns) -> free_pid_ns(ns) -> put_pid_ns(ns->parent)->....... If users create many nested namespaces, the recursion put_pid_ns() will exausted the kernel stack. > That's what I was referring to. This strongly sounds like a problem in > how the kref is being used, not in the kref code itself. > > Is a kref even the correct thing here? Can we fix this by this way? free_pid_ns just release ns itself, we check the return value of kref_put, if kref_put returns 1, means ns->kref is removed, then we kref_put(ns->parent). diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 00474b0..2168535 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd); static inline void put_pid_ns(struct pid_namespace *ns) { - if (ns != &init_pid_ns) - kref_put(&ns->kref, free_pid_ns); + int ret; + struct pid_namespace *parent + + while (ns != &init_pid_ns) { + parent = ns->parent; + ret = kref_put(&ns->kref, free_pid_ns); + if (!ret) + break; + else ns = parent; + } } #else /* !CONFIG_PID_NS */ diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 478bad2..85ca23a 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long flags, struct pid_namespace *old void free_pid_ns(struct kref *kref) { - struct pid_namespace *ns, *parent; + struct pid_namespace *ns; ns = container_of(kref, struct pid_namespace, kref); - parent = ns->parent; destroy_pid_namespace(ns); - - if (parent != NULL) - put_pid_ns(parent); } EXPORT_SYMBOL_GPL(free_pid_ns); > > greg k-h > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/