On 6/23/19 7:30 PM, Roman Gushchin wrote:
> Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
> from cgroup itself"), cgroup_bpf release occurs asynchronously
> (from a worker context), and before the release of the cgroup itself.
>
> This introduced a previously non-existing race between the release
> and update paths. E.g. if a leaf's cgroup_bpf is released and a new
> bpf program is attached to one of the ancestor cgroups at the same
> time. The race may result in double-free and other memory corruptions.
>
> To fix the problem, let's protect the body of cgroup_bpf_release()
> with cgroup_mutex, as it effectively was previously, when all this
> code was called from the cgroup release path with the cgroup mutex held.
>
> Also make sure that we don't leave already freed pointers to the
> effective prog arrays. Otherwise, they can be released again by
> the update path. It wasn't necessary before, because previously
> the update path couldn't see such a cgroup, as cgroup_bpf and the
> cgroup itself were released together.
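If I read the description right, the fix makes the release path look roughly like the sketch below. This is only a sketch, not the actual patch: the per-program cleanup is elided, and the details (the bpf.release_work field, bpf_prog_array_free(), RCU_INIT_POINTER()) are my assumption of what kernel/bpf/cgroup.c does here.

static void cgroup_bpf_release(struct work_struct *work)
{
	struct cgroup *cgrp = container_of(work, struct cgroup,
					   bpf.release_work);
	unsigned int type;

	/* Serialize against the update (attach/detach) path, which
	 * also runs under cgroup_mutex. */
	mutex_lock(&cgroup_mutex);

	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
		/* ... put each attached prog and its local storage ... */

		/* Free the effective array and clear the pointer, so a
		 * concurrent update on an ancestor can't free it again. */
		bpf_prog_array_free(rcu_dereference_protected(
					cgrp->bpf.effective[type],
					lockdep_is_held(&cgroup_mutex)));
		RCU_INIT_POINTER(cgrp->bpf.effective[type], NULL);
	}

	mutex_unlock(&cgroup_mutex);

	/* ... drop the cgroup_bpf refcount and the cgroup reference ... */
}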
I thought a dying cgroup won't have any child cgroups? It should already be empty, with no tasks inside it, and only some resources still held? The mutex and the zero init are highly suspicious. It feels like cgroup_bpf_release() is called too early.

Thinking from another angle... if child cgroups can still attach, then this bpf_release is broken. The code should be calling __cgroup_bpf_detach() one by one, so that update_effective_progs() gets called, since the descendants are still sort-of alive and can attach? (Something like the sketch at the end of this mail.)

My money is on 'too early'. Maybe the cgroup is not dying? Just cgroup_sk_free() being called on the last socket, and this auto-detach logic got triggered incorrectly?
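The detach-one-by-one idea would be something like the sketch below. Again a sketch only: it assumes the __cgroup_bpf_detach(cgrp, prog, type) signature of this kernel, ignores its return value, and assumes detaching removes and frees the bpf_prog_list entry (hence the _safe iterator).

static void cgroup_bpf_release_by_detach(struct cgroup *cgrp)
{
	unsigned int type;

	mutex_lock(&cgroup_mutex);
	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
		struct bpf_prog_list *pl, *tmp;

		/* Go through the regular detach path, so
		 * update_effective_progs() runs and still-visible
		 * descendants end up with consistent effective arrays. */
		list_for_each_entry_safe(pl, tmp, &cgrp->bpf.progs[type],
					 node)
			__cgroup_bpf_detach(cgrp, pl->prog, type);
	}
	mutex_unlock(&cgroup_mutex);
}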