On Thu, Sep 01, 2016 at 12:07:34PM +0200, Stanislaw Gruszka wrote:
> On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> > You're now making rather hot paths slower to benefit a rather slow path,
> > that too is backwards.
>
> Ok, you have right, I made update_curr() slower (a bit I think, since
> this new seqcount primitive should be in the same cache line as other
> things).

seqcount adds 2 smp_wmb(), which on ARM are not free (it is possible to
do it with just 1, FWIW).

> But do we don't care about inconsistency of accessing of 64 bit variable
> on 32 bit processors (see patch 3) ? I know this is unlikely scenario
> to get inconsistency, but I assume it's still possible, or not?

It's actually quite possible. We've observed it a fair few times. 64-bit
variables are two 32-bit stores/loads on such machines, and getting
interleaved data is quite possible.

> If not, I can get rid of read_sum_exec_runtime() and just read
> sum_exec_runtime without task_rq_lock() protection on
> thread_group_cputime() . That would make the benchmark happy.

I think this benchmark is misguided. Just accept that O(nr_threads) is
expensive, same with the process-wide itimer; just don't use them when
you care about performance.
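
For reference, a rough user-space sketch of the pattern under discussion: a
64-bit value stored as two 32-bit halves, guarded by a sequence counter. This
is not the kernel's seqcount_t code. The names seq_u64, seq_u64_write() and
seq_u64_read() are made up for illustration, C11 fences stand in for
smp_wmb()/smp_rmb(), and a single writer is assumed (writers must be
serialized by other means). It shows where the two write-side barriers sit
and why a reader without the retry loop could see one old half and one new
half.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit value split into two 32-bit halves, as a
 * 32-bit CPU would store it, plus a sequence counter. */
struct seq_u64 {
	atomic_uint seq;        /* odd while a write is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer assumed. Two release fences play the role of the two
 * smp_wmb() calls mentioned above. */
static void seq_u64_write(struct seq_u64 *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);      /* barrier 1 */

	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	atomic_thread_fence(memory_order_release);      /* barrier 2 */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_relaxed);
}

/* Reader retries if a write was in progress (odd count) or if the count
 * changed while the two halves were being loaded. */
static uint64_t seq_u64_read(struct seq_u64 *s)
{
	unsigned int s0, s1;
	uint32_t lo, hi;

	do {
		s0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((s0 & 1) || s0 != s1);

	return ((uint64_t)hi << 32) | lo;
}
```

The write side pays for both fences on every update, which is the hot-path
cost objected to above; the read side only pays when it races with a writer.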