On Thu, Dec 27, 2018 at 5:15 PM Tejun Heo <t...@kernel.org> wrote:
>
> I'm pretty sure enqueue_entity() *has* to be called with rq lock.
> unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
> distribute_cfs_runtime() and unthrottle_offline_cfs_rqs. The first
> two grabs the rq_lock just around the calls and the last one has a
> lockdep assert on the rq_lock. What am I missing?

No, I think you're right, and I just didn't follow things deep enough,
didn't see any rq locking in the loop in unthrottle_offline_cfs_rqs(),
and didn't realize that the rq is locked by the caller.

> > But that still makes me go "how come is this only noticed 18 months
> > after the fact"?
>
> Unless I'm totally confused, which is definitely possible, I don't
> think there's a race condition and the only bug is the
> tmp_alone_branch pointer getting dangled, which maybe doesn't happen
> all that much?

Ahh. That would explain the list corruption. The next
list_add_leaf_cfs_rq() could try to add to a removed entry.

How would you reset it? Do something like

    rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

for every removal, or make it conditional on it matching the removed
entry?

                 Linus
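For illustration, the second option (reset only when the cursor matches the removed entry) can be sketched as a small user-space model in C. This is a hypothetical simplification, not kernel code: `struct model_rq`, `model_add_leaf()`, `model_del_leaf()`, `cursor_on_list()` and the list helpers are made-up names, and the add path is reduced to "insert after the cursor, then advance it", which is much less than what list_add_leaf_cfs_rq() actually does. It only shows why a cursor left pointing at an unlinked entry is dangerous, and how a conditional reset to the list head avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, loosely modeled on the kernel's
 * struct list_head. Everything here is a simplified user-space model. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link @n in right after @pos. */
static void list_insert_after(struct list_node *n, struct list_node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

/* Unlink @n and NULL its pointers so any later use of it is detectable. */
static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Hypothetical stand-in for struct rq: a list head plus a cursor that
 * should always point at the head or at a node still on the list. */
struct model_rq {
	struct list_node leaf_list;
	struct list_node *tmp_alone_branch;
};

/* Simplified add: insert after the cursor, then advance the cursor. */
static void model_add_leaf(struct model_rq *rq, struct list_node *n)
{
	list_insert_after(n, rq->tmp_alone_branch);
	rq->tmp_alone_branch = n;
}

/* Assumed shape of the conditional reset: only move the cursor back to
 * the head when it points at the entry being removed. */
static void model_del_leaf(struct model_rq *rq, struct list_node *n)
{
	if (rq->tmp_alone_branch == n)
		rq->tmp_alone_branch = &rq->leaf_list;
	list_unlink(n);
}

/* True if the cursor is the head or a node reachable from the head. */
static bool cursor_on_list(const struct model_rq *rq)
{
	const struct list_node *p = &rq->leaf_list;

	do {
		if (p == rq->tmp_alone_branch)
			return true;
		p = p->next;
	} while (p != &rq->leaf_list);
	return false;
}
```

In this model, removing the entry the cursor points at moves the cursor back to the head, so the next add links against a live node; removing any other entry leaves the cursor alone. Calling list_unlink() directly instead, without touching the cursor, leaves tmp_alone_branch pointing at a node that is no longer on the list, and the next model_add_leaf() would dereference its NULLed pointers. The unconditional variant (reset to the head on every removal) is simpler, but in this model it also discards the cursor position when an unrelated entry is removed.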