Oleg Nesterov <o...@redhat.com> writes:

> On 11/24, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <o...@redhat.com> writes:
>>
>> > --- a/kernel/pid.c
>> > +++ b/kernel/pid.c
>> > @@ -320,7 +320,6 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> >                      goto out_free;
>> >      }
>> >
>> > -    get_pid_ns(ns);
>> >      atomic_set(&pid->count, 1);
>> >      for (type = 0; type < PIDTYPE_MAX; ++type)
>> >              INIT_HLIST_HEAD(&pid->tasks[type]);
>> > @@ -336,7 +335,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> >      }
>> >      spin_unlock_irq(&pidmap_lock);
>> >
>> > -out:
>> > +    get_pid_ns(ns);
>>
>> Moving the label and changing the goto out logic is gratuitous confusing
>> and I think it probably even generates worse code.
>>
>> Furthermore multiple exits make adding debugging code more difficult.
>
> Oh, I strongly disagree but I am not going to argue ;) cleanups are
> always subjective, and I do believe in "maintainer is always right"
> mantra. I can make v2 without this change.

Fair enough.  My primary complaint was that you were changing the logic
and fixing a bug at the same time.  That added noise and made analysis
of what was really going on much more difficult.

>> Moving get_pid_ns down does close a leak in the error handling path.
>
> OK, good.
>
>> However at the moment I can't figure out if it is safe to move
>> get_pid_ns below hlist_add_head_rcu.  Because once we are on the rcu list
>> the pid is findable, and being publicly visible with a bad refcount could
>> cause problems.
>
> The caller has a reference, this ns can't go away. Obviously, otherwise
> get_pid_ns(ns) is not safe.
>
> We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
> won't be called until we return this pid, otherwise everything is wrong.
>
> So I think this should be safe?

My concern is exposing a half-initialized struct pid to the world via
an rcu data structure.  In particular, could one of the rcu users get
into trouble because we haven't called get_pid_ns yet?  That is unclear
to me.

That is one of those weird nasty races I would rather not have to
consider, and moving the get_pid_ns after hlist_add requires that we
think about it.

To fix the error handling and avoid thinking about the races we have
two choices:
- In the error path that is currently called out_unlock we can drop
  the extra reference.
- Immediately after we perform the test that on error jumps to
  out_unlock we call get_pid_ns.

My preference would be the first, as it is a trivially correct one-line
change.  In other words, I think this is the obviously correct trivial
fix:

 out_unlock:
        spin_unlock_irq(&pidmap_lock);
+       put_pid_ns(ns);
 out_free:
        while (++i <= ns->level)
                free_pidmap(pid->numbers + i);

Eric
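
For anyone who wants to check the reference balance of the first option outside the kernel, here is a minimal userspace sketch. It is not kernel code: model_ns, model_pid, model_get_ns, model_put_ns and model_alloc_pid are hypothetical stand-ins, the refcount is a plain int with no locking or RCU, and the fail_late flag simply simulates the test that jumps to out_unlock. It only illustrates that taking the reference early and dropping it in the out_unlock path leaves the count unchanged on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace: just a reference count. */
struct model_ns {
	int count;
};

/* Hypothetical stand-in for struct pid: just remembers its namespace. */
struct model_pid {
	struct model_ns *ns;
};

static void model_get_ns(struct model_ns *ns)
{
	ns->count++;
}

static void model_put_ns(struct model_ns *ns)
{
	assert(ns->count > 0);	/* dropping a reference we never held is a bug */
	ns->count--;
}

/*
 * Models the shape of alloc_pid() with the one-line fix applied:
 * the namespace reference is taken before the pid would become
 * visible, and the late failure path gives it back.
 */
static bool model_alloc_pid(struct model_ns *ns, struct model_pid *pid,
			    bool fail_late)
{
	model_get_ns(ns);	/* taken early, before publication */
	pid->ns = ns;

	if (fail_late)		/* simulated test that jumps to out_unlock */
		goto out_unlock;

	/* The real code would add the pid to the hash list here. */
	return true;

out_unlock:
	model_put_ns(ns);	/* the added line: balance the early get */
	pid->ns = NULL;
	return false;
}
```

With the put in place, a failed allocation returns the namespace count to its starting value, and a successful one leaves exactly one extra reference for the eventual put_pid() to drop.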