On Wed, 2016-02-17 at 11:00 +0900, Byungchul Park wrote:
> On Tue, Feb 16, 2016 at 04:41:39PM -0800, Greg KH wrote:
> > On Wed, Feb 17, 2016 at 09:11:03AM +0900, Byungchul Park wrote:
> > > On Tue, Feb 16, 2016 at 09:42:12AM -0800, Greg KH wrote:
> > > > On Tue, Feb 16, 2016 at 09:44:35AM +0100, Peter Zijlstra wrote:
> > > > > On Tue, Feb 16, 2016 at 04:08:37PM +0900, Byungchul Park wrote:
> > > > > > On Mon, Jan 25, 2016 at 04:25:03PM +0900, Byungchul Park wrote:
> > > > > > > On Tue, Jan 05, 2016 at 10:14:44AM +0100, Peter Zijlstra wrote:
> > > > > > > > > So the reason I didn't mark them for stable is that they were
> > > > > > > > > non trivial, however they've been in for a while now and nothing
> > > > > > > > > broke, so I suppose backporting them isn't a problem.
> > > > > > > 
> > > > > > > Hello,
> > > > > > > 
> > > > > > > > What do you think about the way to solve this oops problem? Could
> > > > > > > > you just give your opinion of the way? Or ack or nack about this
> > > > > > > > backporting?
> > > > > > 
> > > > > > > Or would it be better to create a new simple patch with which we
> > > > > > > can solve the oops problem, because your patch is too complicated
> > > > > > > to backport to stable tree? What do you think about that?
> > > > > 
> > > > > I would prefer just backporting existing stuff, we know that works.
> > > > > 
> > > > > A separate patch for stable doesn't make sense to me; you get extra
> > > > > chances for fail and a divergent code-base.
> > > > 
> > > > I agree, I REALLY don't want to take patches that are not
> > > > identical-as-much-as-possible to what is in Linus's tree, because almost
> > > > every time we do, the patch is broken in some way.
> > > 
> > > I also agree and got it. Then could you check if this backporting is done
> > > properly?
> > 
> > What backporting of what to where by whom?
> > 
> > Come on, someone needs to actually send in some patches, in the correct
> > format, before anyone can do anything with them...
> 
> I am sorry for not ccing you when I sent the patches at first. (I didn't
> know I should do it.) There are the patches in this thread. Refer to,
> 
> https://lkml.org/lkml/2016/1/5/60

Anybody wanting to fix up a < 3.14 kernel can use the below.

sched: fix __sched_setscheduler() vs load balancing race

__sched_setscheduler() may release rq->lock in pull_rt_task() as a task is
being changed rt -> fair class.  Load balancing may then sneak in and move the
task behind __sched_setscheduler()'s back, which explodes in switched_to_fair()
when the passed but no longer valid rq is used.  Tell can_migrate_task() to
say no if ->pi_lock is held.

@stable: Kernels that predate SCHED_DEADLINE can use this simple (and tested)
check in lieu of a backport of the full 18-patch mainline treatment.

Signed-off-by: Mike Galbraith <umgwanakikb...@gmail.com>
---
 kernel/sched/fair.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4008,6 +4008,7 @@ int can_migrate_task(struct task_struct
         * 2) cannot be migrated to this CPU due to cpus_allowed, or
         * 3) running (obviously), or
         * 4) are cache-hot on their current CPU.
+        * 5) p->pi_lock is held.
         */
        if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
                return 0;
@@ -4049,6 +4050,14 @@ int can_migrate_task(struct task_struct
        }
 
        /*
+        * rt -> fair class change may be in progress.  If we sneak in should
+        * double_lock_balance() release rq->lock, and move the task, we will
+        * cause switched_to_fair() to meet a passed but no longer valid rq.
+        */
+       if (raw_spin_is_locked(&p->pi_lock))
+               return 0;
+
+       /*
         * Aggressive migration if:
         * 1) task is cache cold, or
         * 2) too many balance attempts have failed.
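
For anyone who wants to see the idea outside the kernel, here is a rough
user-space sketch of the check the patch adds.  It is only a model under
stated assumptions: struct toy_task and every toy_*() helper are hypothetical
names made up for illustration, an atomic flag stands in for p->pi_lock, and
everything runs single-threaded.  It shows the balancer refusing to move a
task while the lock is held; it does not reproduce the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for task_struct; not kernel code. */
struct toy_task {
	atomic_bool pi_lock;	/* models p->pi_lock as a simple flag */
	int cpu;		/* CPU the task is currently queued on */
};

static void toy_task_init(struct toy_task *p, int cpu)
{
	atomic_init(&p->pi_lock, false);
	p->cpu = cpu;
}

/* Try to take the toy lock; returns true if we got it. */
static bool toy_trylock(struct toy_task *p)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&p->pi_lock, &expected, true);
}

/* Drop the toy lock; the caller must currently hold it. */
static void toy_unlock(struct toy_task *p)
{
	assert(atomic_load(&p->pi_lock));
	atomic_store(&p->pi_lock, false);
}

/*
 * Models the added test in can_migrate_task(): say no while the lock is
 * held, since a class change may be in progress.  Otherwise allow a move
 * to any other CPU.
 */
static int toy_can_migrate(struct toy_task *p, int dst_cpu)
{
	if (atomic_load(&p->pi_lock))
		return 0;

	return p->cpu != dst_cpu;
}

/* Models the balancer: move the task only if allowed; returns 1 if moved. */
static int toy_balance(struct toy_task *p, int dst_cpu)
{
	if (!toy_can_migrate(p, dst_cpu))
		return 0;

	p->cpu = dst_cpu;
	return 1;
}
```

With the toy lock taken, toy_balance() leaves the task where it is; once the
lock is dropped, the same call moves it.  That is the whole effect the patch
is after: the balancer backs off for as long as the setscheduler side holds
the lock.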
