On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
>
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()? It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >   12.89%  [kernel]  [k] refresh_cpu_vm_stats.isra.12
> > >    4.75%  [kernel]  [k] __schedule
> > >    4.70%  [kernel]  [k] mutex_unlock
> > >    3.14%  [kernel]  [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
>
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core but it still does nr_zones * counters atomic
ops.

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
>
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes, this is what I have read in the changelog. But the "how much" part
is really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spin locks. That the !RT
kernel does not sleep there doesn't really matter much, because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure, most of the time it will boil down to
del_timer, but it can hit the slowpath as well if the timer got migrated
to a different CPU and we have to race with the WQ pool management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd, which can check idle CPUs, cancel the timer for them,
update the differentials and put them to cpu_stat_off?

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.
-- 
Michal Hocko
SUSE Labs
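
To make the vmstat_shepherd suggestion above concrete, here is a minimal
userspace model of it. This is only a sketch, not kernel code: struct
cpu_model, fold_diffs(), shepherd_pass() and the array sizes are all
hypothetical names and values made up for illustration, and the plain
store in fold_diffs() merely stands in for this_cpu_xchg(). It shows two
things from the discussion: folding always touches nr_zones * counters
slots even when there is nothing to fold, and that work could be done by
a shepherd pass on behalf of idle CPUs rather than from the idle path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only to keep the model small. */
#define NR_CPUS     4
#define NR_ZONES    3
#define NR_COUNTERS 8

struct cpu_model {
    bool idle;        /* CPU is currently in the idle loop */
    bool stat_off;    /* models membership in cpu_stat_off */
    bool timer_armed; /* models the pending per-cpu vmstat work */
    int diff[NR_ZONES][NR_COUNTERS]; /* per-cpu differentials */
};

static long global_stat[NR_ZONES][NR_COUNTERS];

/*
 * Fold one CPU's differentials into the global counters.
 * Returns the number of slots visited, which is always
 * NR_ZONES * NR_COUNTERS, even if every differential is zero.
 */
static size_t fold_diffs(struct cpu_model *cpu)
{
    size_t visited = 0;

    assert(cpu != NULL);
    for (size_t z = 0; z < NR_ZONES; z++) {
        for (size_t c = 0; c < NR_COUNTERS; c++) {
            int v = cpu->diff[z][c];

            cpu->diff[z][c] = 0; /* stand-in for this_cpu_xchg(.., 0) */
            global_stat[z][c] += v;
            visited++;
        }
    }
    return visited;
}

/*
 * One shepherd pass: for every idle CPU that is not yet quiesced,
 * cancel its pending update, fold its differentials and mark it off.
 * Returns the number of CPUs quiesced in this pass.
 */
static size_t shepherd_pass(struct cpu_model *cpus, size_t nr)
{
    size_t quiesced = 0;

    assert(cpus != NULL);
    for (size_t i = 0; i < nr; i++) {
        if (!cpus[i].idle || cpus[i].stat_off)
            continue;
        cpus[i].timer_armed = false;
        fold_diffs(&cpus[i]);
        cpus[i].stat_off = true;
        quiesced++;
    }
    return quiesced;
}
```

In this model the idle path itself does nothing; a CPU that goes idle is
picked up by the next shepherd_pass(), and busy CPUs keep their pending
update and their differentials untouched.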