On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote:
> Phil Auld <pa...@redhat.com> writes:
>
> > Interestingly, if I limit the number of child cgroups to the number of
> > them I'm actually putting processes into (16 down from 2500) the problem
> > does not reproduce.
>
> That is indeed interesting, and definitely not something we'd want to
> matter. (Particularly if it's not root->a->b->c...->throttled_cgroup or
> root->throttled->a->...->thread vs root->throttled_cgroup, which is what
> I was originally thinking of)
>

The locking may be a red herring.

The setup is root->throttled->a, where a is 1-2500. There are 4 threads in
each of the first 16 a groups. The parent, throttled, is where
cfs_period_us and cfs_quota_us are set.

I wonder if the problem is the walk_tg_tree_from() call in
unthrottle_cfs_rq().

distribute_cfs_runtime() looks to be O(n * m), where n is the number of
throttled cfs_rqs and m is the number of child cgroups. But I'm not
completely clear on how the hierarchical cgroups play together here.

I'll pull on this thread some.

Thanks for your input.

Cheers,
Phil

--
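
To put rough numbers on the O(n * m) suspicion above, here is a toy userspace model. It is not kernel code: walk_subtree() and distribute_cost() are hypothetical names, and the model assumes that each unthrottle visits the throttled group plus every child group, populated or not. It only counts visits and says nothing about locking or the real cost per visit.

```c
/*
 * Toy model only, not kernel code. Assumption: unthrottling one
 * throttled cfs_rq walks the whole subtree below the throttled group.
 */

/* Visits made by one subtree walk: the group itself plus each child. */
static unsigned long walk_subtree(unsigned long nr_children)
{
	unsigned long visits = 1;	/* the throttled group itself */
	unsigned long i;

	for (i = 0; i < nr_children; i++)
		visits++;		/* one visit per child group */

	return visits;
}

/* Total visits when each of n throttled cfs_rqs triggers one walk. */
static unsigned long distribute_cost(unsigned long nr_throttled_rqs,
				     unsigned long nr_children)
{
	unsigned long total = 0;
	unsigned long i;

	for (i = 0; i < nr_throttled_rqs; i++)
		total += walk_subtree(nr_children);

	return total;
}
```

Assuming 16 throttled cfs_rqs (a made-up figure, the real n depends on how many CPUs are throttled), distribute_cost(16, 2500) counts 40016 visits per distribution pass, against 272 for distribute_cost(16, 16). If the model is anywhere near right, that ratio would fit the problem going away with only 16 child cgroups.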