On Tue, 2012-12-11 at 13:43 +0100, Thomas Gleixner wrote:
> On Mon, 10 Dec 2012, Steven Rostedt wrote:
> > On Mon, 2012-12-10 at 17:15 -0800, Frank Rowand wrote:
> > 
> > > I should have also mentioned some previous experience using IPIs to
> > > avoid runq lock contention on wake up.  Someone encountered IPI
> > > storms when using the TTWU_QUEUE feature, thus it defaults to off
> > > for CONFIG_PREEMPT_RT_FULL:
> > > 
> > >   #ifndef CONFIG_PREEMPT_RT_FULL
> > >   /*
> > >    * Queue remote wakeups on the target CPU and process them
> > >    * using the scheduler IPI. Reduces rq->lock contention/bounces.
> > >    */
> > >   SCHED_FEAT(TTWU_QUEUE, true)
> > >   #else
> > >   SCHED_FEAT(TTWU_QUEUE, false)
> > >   #endif
> > > 
> > 
> > Interesting, but I'm wondering if this also does it for every wakeup? If
> > you have 1000 tasks waking up on another CPU, this could potentially
> > send out 1000 IPIs. The number of IPIs here looks to be # of tasks
> > waking up, and perhaps more than that, as there could be multiple
> > instances that try to wake up the same task.
> 
> Not using the TTWU_QUEUE feature limits the IPIs to a single one,
> which is only sent if the newly woken task preempts the current task
> on the remote cpu and the NEED_RESCHED flag was not yet set.
>  
> With TTWU_QUEUE you can induce massive latencies just by starting
> hackbench. You get a herd wakeup on CPU0 which then enqueues hundreds
> of tasks to the remote pull list and sends IPIs. The remote CPUs pull
> the tasks and activate them on their runqueue in hard interrupt
> context. That can easily accumulate to hundreds of microseconds when
> you do a mass push of newly woken tasks.
> 
> Of course it avoids fiddling with the remote rq lock, but it becomes
> massively non-deterministic.

Agreed. I never suggested using TTWU_QUEUE. I was just stating the
difference between that and my patches.
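
To put rough numbers on that difference, here is a toy model in C. It is
only a sketch, not kernel code: both function names are made up for
illustration, and the first one assumes the worst case asked about above
(one IPI per remote wakeup), which the real TTWU_QUEUE path may not hit.

```c
/*
 * Toy model, not kernel code. Both helpers are hypothetical and exist
 * only to contrast how the IPI count scales in the two approaches.
 */

/*
 * Assumed worst case for per-wakeup queueing: every remote wakeup
 * raises its own IPI, so the count grows with the number of tasks.
 */
unsigned int ipis_worst_case_per_wakeup(unsigned int nr_wakeups)
{
	return nr_wakeups;
}

/*
 * Pull-request model: the CPU that wants to pull sends at most one IPI
 * to each other CPU, so the count is bounded by nr_cpus - 1.
 */
unsigned int ipis_bounded_by_cpus(unsigned int nr_cpus)
{
	return nr_cpus ? nr_cpus - 1 : 0;
}
```

With 1000 wakeups the first helper returns 1000, while on a 4 CPU box the
second returns 3, no matter how many tasks are waiting.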

> 
> > Now with this patch set, the # of IPIs is limited to the # of CPUs. If you
> > have 4 CPUs, you'll get a storm of 3 IPIs. That's a big difference.
> 
> Yeah, the big difference is that you offload the double lock to the
> IPI. So in the worst case you interrupt the most latency sensitive
> task running on the remote CPU. Not sure if I really like that
> "feature".
>  

First, the pulled CPU isn't necessarily running the most latency
sensitive task. It just happens to be running more than one RT task, and
the waiting RT task can migrate. The running task may be of the same
priority as the waiting task. And they both may be the lowest priority
RT tasks in the system, and a CPU just went idle.
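
The condition described above can be sketched as a small predicate. This
is a minimal illustration under my own assumptions: the struct and its
field names are hypothetical and do not match the kernel's rt_rq
bookkeeping. The point is that priority does not appear in the test at
all.

```c
#include <stdbool.h>

/* Hypothetical per-CPU summary; the field names are illustrative only. */
struct cpu_rt_state {
	/* RT tasks runnable on this CPU, including the one running now */
	unsigned int nr_rt_runnable;
	/* waiting RT tasks that are allowed to run on another CPU */
	unsigned int nr_rt_migratable;
};

/*
 * A CPU is a pull candidate when it has more than one RT task and at
 * least one of the waiting ones can migrate. The priority of the
 * running task is deliberately not part of the check.
 */
bool is_pull_candidate(const struct cpu_rt_state *s)
{
	return s->nr_rt_runnable > 1 && s->nr_rt_migratable > 0;
}
```

A CPU running a single RT task, or one whose waiting RT tasks are all
pinned, is never a candidate in this model.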

Currently, what we have is huge contention on the pulled CPU's rq
lock. We've measured latencies of over 500us due to it. This hurts even
the CPU that has the overloaded task, as the contention is on its lock.

-- Steve

