On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote:
> On 20 January 2014 21:21, Frederic Weisbecker <fweis...@gmail.com> wrote:
> > I fear you can't. If you schedule a timer 4 seconds away and your clock
> > device can only count up to 2 seconds, you can't really avoid the
> > interrupt in the middle that is needed to cope with the overflow.
> >
> > So you need to act on the source of the timer:
> >
> > * identify what causes this timer
> > * try to turn that feature off
> > * if you can't then move the timer to the housekeeping CPU
> 
> So, the main problem in my case was caused by this:
> 
>            <...>-2147  [001] d..2   302.573881: hrtimer_start: hrtimer=c172aa50 function=tick_sched_timer expires=602075000000 softexpires=602075000000
>
> I mentioned this earlier when I sent you the attachments. I think this is
> somehow tied to the NO_HZ_FULL stuff, as the timer is queued 300 seconds
> after the current time.
> 
> How do I get rid of it?

So it's scheduled 300 seconds away. It might be a pending timer_list.
Enabling the timer tracepoints may give you some clues.
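To put numbers on the trace line: a small userspace sketch (not kernel code) that computes how far away the timer is and, for a hypothetical clock event device that can only be programmed 2 seconds ahead as in the example quoted earlier, how many intermediate interrupts would fire only to reprogram it. The function names are made up for illustration, and it assumes the trace timestamp and the hrtimer expiry share a clock base, which is only approximately true: the timestamp comes from the trace clock, the expiry from the hrtimer's own clock base.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Nanoseconds from "now" until the timer expires (0 if already due). */
static uint64_t timer_delta_ns(uint64_t now_ns, uint64_t expires_ns)
{
	return expires_ns > now_ns ? expires_ns - now_ns : 0;
}

/*
 * The (hypothetical) device is programmed at most max_delta_ns ahead, so
 * reaching delta_ns takes ceil(delta_ns / max_delta_ns) programmings.
 * Every one but the last ends in an interrupt whose only job is to
 * reprogram the device.
 */
static uint64_t intermediate_irqs(uint64_t delta_ns, uint64_t max_delta_ns)
{
	if (delta_ns == 0)
		return 0;
	return (delta_ns + max_delta_ns - 1) / max_delta_ns - 1;
}
```

With now = 302.573881 s (302573881000 ns) and expires=602075000000, the delta is 299501119000 ns, about 299.5 s. A 4 second timer on a 2 second device gives exactly one interrupt in the middle, while this timer would give 149.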

> 
> > I'll have a look into the latter point, to affine global timers to the
> > housekeeping CPU. Per-CPU timers need more inspection, though. Either we
> > rework them so they can be handled by remote/housekeeping CPUs, or we
> > let the associated feature be turned off. All in all, it's case-by-case
> > work.
> 
> Which CPUs are housekeeping CPUs? How do we declare them?

It's not yet implemented, but it's an idea (partly from Thomas) of something
we can do to define a general policy on the affinity of various
periodic/async work, in order to enforce isolation.

The basic idea is to make the CPU handling the timekeeping duty the
housekeeping CPU. Given that this CPU must keep a periodic tick, let's move
all the unbound timers and workqueues there, and also try to move some
CPU-affine work as well. For example, we could handle the scheduler tick of
the full dynticks CPUs on that housekeeping CPU, at a low frequency. This
way we could remove the 1-second maximum deferment of the scheduler tick on
each CPU. Running all the scheduler ticks on a single CPU may be overkill,
though, so there may be other ways to cope with that.
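A minimal userspace model of the placement rule described above, under loudly stated assumptions: `struct isolation_policy`, `is_nohz_full()` and `pick_unbound_target()` are hypothetical names, a 64-bit word stands in for a cpumask, and `housekeeping_cpu` stands in for the CPU holding the timekeeping duty (`tick_do_timer_cpu` in the kernel). It only shows the decision, not how timers or workqueues would actually be migrated.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: bit n of nohz_full_mask marks CPU n as a full
 * dynticks CPU; housekeeping_cpu is the CPU keeping the periodic tick.
 */
struct isolation_policy {
	uint64_t nohz_full_mask;
	int housekeeping_cpu;
};

/* cpu must be in [0, 63] for the shift to be valid. */
static bool is_nohz_full(const struct isolation_policy *p, int cpu)
{
	return (p->nohz_full_mask >> cpu) & 1;
}

/*
 * Pick the CPU that should run an unbound timer or work item queued from
 * this_cpu: keep it local on ordinary CPUs, push it to the housekeeping
 * CPU when this_cpu is isolated.
 */
static int pick_unbound_target(const struct isolation_policy *p, int this_cpu)
{
	return is_nohz_full(p, this_cpu) ? p->housekeeping_cpu : this_cpu;
}
```

Keeping non-isolated CPUs local avoids needlessly funnelling everything through one CPU; only work that would disturb a full dynticks CPU is redirected.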

And I would like to keep that housekeeping notion flexible enough to be
extended to more than one CPU, as I heard that some people plan to reserve
one CPU per node on big NUMA machines for such a purpose. So that could be
a cpumask, with some infrastructure around it.
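If housekeeping becomes a mask with one CPU per NUMA node, offloaded work needs a rule for which housekeeping CPU to use. The sketch below is one possible rule, not something proposed in this thread: prefer a housekeeping CPU on the same node, otherwise fall back to any housekeeping CPU. `struct housekeeping`, `housekeeping_cpu_for()` and the `cpu_node[]` table are hypothetical, and a 64-bit word again stands in for a real cpumask.

```c
#include <stdint.h>

#define MAX_CPUS 64

/* Hypothetical model of a housekeeping cpumask plus CPU-to-node map. */
struct housekeeping {
	uint64_t mask;           /* bit n set: CPU n does housekeeping */
	int cpu_node[MAX_CPUS];  /* NUMA node of each CPU */
	int nr_cpus;             /* valid entries in cpu_node, at most 64 */
};

/*
 * Return a housekeeping CPU on the same node as cpu, so offloaded work
 * stays node-local; otherwise the lowest-numbered housekeeping CPU, or
 * -1 if the mask is empty.
 */
static int housekeeping_cpu_for(const struct housekeeping *hk, int cpu)
{
	int fallback = -1;

	for (int i = 0; i < hk->nr_cpus; i++) {
		if (!((hk->mask >> i) & 1))
			continue;
		if (hk->cpu_node[i] == hk->cpu_node[cpu])
			return i;
		if (fallback < 0)
			fallback = i;
	}
	return fallback;
}
```

For example, with CPUs 0-3 on node 0, CPUs 4-7 on node 1 and housekeeping CPUs 0 and 4, work from CPU 6 goes to CPU 4; with only CPU 0 in the mask it falls back to CPU 0.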

Of course, if some people help contribute in this area, things may
eventually move forward on CPU isolation support. I can't do it all alone,
at least not quickly, given all the things already pending in my queue
(fixing buggy nohz iowait accounting, supporting RCU full sysidle
detection, applying the AMD range breakpoints patches, further cleaning up
posix cpu timers, etc.).

Thanks.