On Thu, Dec 27, 2018 at 5:15 PM Tejun Heo <t...@kernel.org> wrote:
>
> I'm pretty sure enqueue_entity() *has* to be called with rq lock.
> unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
> distribute_cfs_runtime() and unthrottle_offline_cfs_rqs().  The first
> two grab the rq_lock just around the calls and the last one has a
> lockdep assert on the rq_lock.  What am I missing?

No, I think you're right. I just didn't follow things deeply enough: I
didn't see any rq locking in the loop in unthrottle_offline_cfs_rqs(),
and didn't realize that the rq is locked by the caller.

> > But that still makes me go "how come this is only noticed 18 months
> > after the fact"?
>
> Unless I'm totally confused, which is definitely possible, I don't
> think there's a race condition and the only bug is the
> tmp_alone_branch pointer getting dangled, which maybe doesn't happen
> all that much?

Ahh. That would explain the list corruption. The next
list_add_leaf_cfs_rq() could try to add to a removed entry.

How would you reset it? Do something like

       rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

for every removal, or make it conditional on it matching the removed entry?
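
A minimal userspace sketch of the conditional variant may make the
question concrete. It is not kernel code: struct toy_rq, toy_rq_init(),
toy_del_leaf() and the list helpers are hypothetical stand-ins for the
rq, its leaf_cfs_rq_list and the removal path, and resetting to the list
head is only the option floated above, not necessarily the right fix.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert new_node directly after pos. */
static void list_add_after(struct list_head *new_node, struct list_head *pos)
{
	new_node->next = pos->next;
	new_node->prev = pos;
	pos->next->prev = new_node;
	pos->next = new_node;
}

/* Unlink entry and clear its pointers so stale use is obvious. */
static void list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Hypothetical, heavily simplified run queue. */
struct toy_rq {
	struct list_head leaf_cfs_rq_list;
	struct list_head *tmp_alone_branch;
};

static void toy_rq_init(struct toy_rq *rq)
{
	init_list_head(&rq->leaf_cfs_rq_list);
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/*
 * Conditional reset: only touch tmp_alone_branch when it points at
 * the entry being removed, so it can never dangle afterwards.
 */
static void toy_del_leaf(struct toy_rq *rq, struct list_head *entry)
{
	if (rq->tmp_alone_branch == entry)
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
	list_del_entry(entry);
}
```

Without the check in toy_del_leaf(), a later insert that uses
tmp_alone_branch as its position would link the new node to an entry
that is no longer on the list, which is the corruption described above.
The unconditional variant would simply drop the if.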

            Linus
