On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
> 
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching
> > > cross-core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> > >      4.75%  [kernel]       [k] __schedule
> > >      4.70%  [kernel]       [k] mutex_unlock
> > >      3.14%  [kernel]       [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> 
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core but it still does nr_zones * counters atomic
ops.
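
To illustrate the point, here is a simplified userspace model of that
fold loop. It is a sketch only, not the kernel code: NR_ZONES, NR_ITEMS,
struct pcp_diffs and fold_diffs() are made-up names, and C11
atomic_exchange() merely stands in for this_cpu_xchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_ZONES 4	/* hypothetical zone count */
#define NR_ITEMS 32	/* hypothetical number of per-zone counters */

/* Hypothetical stand-in for the per-CPU differentials of one CPU. */
struct pcp_diffs {
	atomic_int diff[NR_ZONES][NR_ITEMS];
};

/*
 * Fold every differential into the global counters.
 * Returns the number of exchange operations performed, which is
 * always NR_ZONES * NR_ITEMS, no matter how many diffs were nonzero.
 */
static size_t fold_diffs(struct pcp_diffs *pcp, long global[NR_ITEMS])
{
	size_t xchgs = 0;

	assert(pcp && global);
	for (size_t z = 0; z < NR_ZONES; z++) {
		for (size_t i = 0; i < NR_ITEMS; i++) {
			/* Read and clear the diff in one step. */
			int v = atomic_exchange(&pcp->diff[z][i], 0);

			xchgs++;
			if (v)
				global[i] += v;
		}
	}
	return xchgs;
}
```

Calling fold_diffs() on a CPU whose differentials are all zero still
performs every exchange, which is the "expensive noop" case.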

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
> 
> It reduces the OS events that the application experiences by folding it
> into the tick events. If it's not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes, this is what I have read in the changelog. But the "how much" part
is really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spin locks. That the
!RT kernel does not sleep there doesn't really matter much, because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure, most of the time it will boil down to
del_timer, but it can hit the slowpath as well if the timer got migrated
to a different CPU and we have to race with the WQ pool management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd, which can check idle CPUs, cancel the timer for them,
update the differentials and put them into cpu_stat_off?

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.

-- 
Michal Hocko
SUSE Labs
