On Sun, Jun 23, 2019 at 08:29:21PM -0700, Alexei Starovoitov wrote: > On 6/23/19 7:30 PM, Roman Gushchin wrote: > > Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf > > from cgroup itself"), cgroup_bpf release occurs asynchronously > > (from a worker context), and before the release of the cgroup itself. > > > > This introduced a previously non-existing race between the release > > and update paths. E.g. if a leaf's cgroup_bpf is released and a new > > bpf program is attached to the one of ancestor cgroups at the same > > time. The race may result in double-free and other memory corruptions. > > > > To fix the problem, let's protect the body of cgroup_bpf_release() > > with cgroup_mutex, as it was effectively previously, when all this > > code was called from the cgroup release path with cgroup mutex held. > > > > Also make sure, that we don't leave already freed pointers to the > > effective prog arrays. Otherwise, they can be released again by > > the update path. It wasn't necessary before, because previously > > the update path couldn't see such a cgroup, as cgroup_bpf and cgroup > > itself were released together. > > I thought dying cgroup won't have any children cgroups ?
It's not completely true, a dying cgroup can't have living children. > It should have been empty with no tasks inside it? Right. > Only some resources are still held? Right. > mutex and zero init are highly suspicious. > It feels that cgroup_bpf_release is called too early. An alternative solution is to bump the refcounter on every update path, and explicitly skip de-bpf'ed cgroups. > > Thinking from another angle... if child cgroups can still attach then > this bpf_release is broken. Hm, what do you mean under attach? It's not possible to attach a new prog, but if a prog is attached to a parent cgroup, a pointer can spill through "effective" array. But I agree, it's broken. Update path should ignore such cgroups (cgroups, which cgroup_bpf was released). I'll take a look. > The code should be > calling __cgroup_bpf_detach() one by one to make sure > update_effective_progs() is called, since descendant are still > sort-of alive and can attach? Not sure I get you. Dying cgroup is a leaf cgroup. > > My money is on 'too early'. > May be cgroup is not dying ? > Just cgroup_sk_free() is called on the last socket and > this auto-detach logic got triggered incorrectly? So, once again, what's my picture: A A/B A/B/C cpu1: cpu2: rmdir C attach new prog to A C got dying update A, update B, update C... C's cgroup_bpf is released C's effective progs is replaced with new one old is double freed It looks like it can be reproduced without any sockets. Thanks!