On Wed, 2016-02-17 at 11:00 +0900, Byungchul Park wrote:
> On Tue, Feb 16, 2016 at 04:41:39PM -0800, Greg KH wrote:
> > On Wed, Feb 17, 2016 at 09:11:03AM +0900, Byungchul Park wrote:
> > > On Tue, Feb 16, 2016 at 09:42:12AM -0800, Greg KH wrote:
> > > > On Tue, Feb 16, 2016 at 09:44:35AM +0100, Peter Zijlstra wrote:
> > > > > On Tue, Feb 16, 2016 at 04:08:37PM +0900, Byungchul Park wrote:
> > > > > > On Mon, Jan 25, 2016 at 04:25:03PM +0900, Byungchul Park wrote:
> > > > > > > On Tue, Jan 05, 2016 at 10:14:44AM +0100, Peter Zijlstra wrote:
> > > > > > > > So the reason I didn't mark them for stable is that they were
> > > > > > > > non trivial, however they've been in for a while now and
> > > > > > > > nothing broke, so I suppose backporting them isn't a problem.
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > What do you think about the way to solve this oops problem? Could
> > > > > > > you just give your opinion of the way? Or ack or nack about this
> > > > > > > backporting?
> > > > > >
> > > > > > Or would it be better to create a new simple patch with which we
> > > > > > can solve the oops problem, because your patch is too complicated
> > > > > > to backport to stable tree? What do you think about that?
> > > > >
> > > > > I would prefer just backporting existing stuff, we know that works.
> > > > >
> > > > > A separate patch for stable doesn't make sense to me; you get extra
> > > > > chances for fail and a divergent code-base.
> > > >
> > > > I agree, I REALLY don't want to take patches that are not
> > > > identical-as-much-as-possible to what is in Linus's tree, because almost
> > > > every time we do, the patch is broken in some way.
> > >
> > > I also agree and got it. Then could you check if this backporting is done
> > > properly?
> >
> > What backporting of what to where by whom?
> >
> > Come on, someone needs to actually send in some patches, in the correct
> > format, before anyone can do anything with them...
>
> I am sorry for not ccing you when I sent the patches at first. (I didn't
> know I should do it.) There are the patches in this thread. Refer to,
>
> https://lkml.org/lkml/2016/1/5/60

Anybody wanting to fix up a < 3.14 kernel can use the below.

sched: fix __sched_setscheduler() vs load balancing race

__sched_setscheduler() may release rq->lock in pull_rt_task() as a task is
being changed rt -> fair class.  load balancing may sneak in, move the task
behind __sched_setscheduler()'s back, which explodes in switched_to_fair()
when the passed but no longer valid rq is used.  Tell can_migrate_task() to
say no if ->pi_lock is held.

@stable: Kernels that predate SCHED_DEADLINE can use this simple (and tested)
check in lieu of backport of the full 18 patch mainline treatment.

Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
---
 kernel/sched/fair.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4008,6 +4008,7 @@ int can_migrate_task(struct task_struct
 	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
 	 * 3) running (obviously), or
 	 * 4) are cache-hot on their current CPU.
+	 * 5) p->pi_lock is held.
 	 */
 	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
 		return 0;
@@ -4049,6 +4050,14 @@ int can_migrate_task(struct task_struct
 	}
 
 	/*
+	 * rt -> fair class change may be in progress.  If we sneak in should
+	 * double_lock_balance() release rq->lock, and move the task, we will
+	 * cause switched_to_fair() to meet a passed but no longer valid rq.
+	 */
+	if (raw_spin_is_locked(&p->pi_lock))
+		return 0;
+
+	/*
 	 * Aggressive migration if:
 	 * 1) task is cache cold, or
 	 * 2) too many balance attempts have failed.
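
For anyone who wants to see the shape of the check outside the scheduler, here is a rough userspace sketch of the idea in the changelog above: the balancer looks at the task's lock without taking it and declines to move the task while it is held. This is not kernel code. The names task_model, model_is_locked(), model_can_migrate() and model_try_migrate() are hypothetical, and a C11 atomic_bool merely stands in for p->pi_lock; it models only the decision, not rq->lock or real concurrency.

```c
/* Userspace model only; not kernel code. All names here are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

struct task_model {
	atomic_bool pi_lock;	/* stands in for p->pi_lock */
	int cpu;		/* CPU the task is currently queued on */
};

/* Rough analogue of raw_spin_is_locked(): observe the lock, don't take it. */
static bool model_is_locked(struct task_model *p)
{
	return atomic_load(&p->pi_lock);
}

/* Analogue of the added check: say no while pi_lock is held. */
static int model_can_migrate(struct task_model *p)
{
	if (model_is_locked(p))
		return 0;
	return 1;
}

/* Load balancer analogue: move the task only if the check allows it. */
static int model_try_migrate(struct task_model *p, int dst_cpu)
{
	if (!model_can_migrate(p))
		return 0;
	p->cpu = dst_cpu;
	return 1;
}
```

With pi_lock set (a class change in flight), model_try_migrate() returns 0 and leaves cpu untouched; once the flag is cleared, the same call moves the task and returns 1.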