On Thu, Jun 09, 2016 at 03:07:50PM +0200, Peter Zijlstra wrote:
> Which given the lack of serialization, and the code generated from
> update_cfs_rq_load_avg() is entirely possible.
>
> 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
> 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> 		removed_load = 1;
> 	}
>
> turns into:
>
> ffffffff81087064:	49 8b 85 98 00 00 00	mov    0x98(%r13),%rax
> ffffffff8108706b:	48 85 c0		test   %rax,%rax
> ffffffff8108706e:	74 40			je     ffffffff810870b0 <update_blocked_averages+0xc0>
> ffffffff81087070:	4c 89 f8		mov    %r15,%rax
> ffffffff81087073:	49 87 85 98 00 00 00	xchg   %rax,0x98(%r13)
> ffffffff8108707a:	49 29 45 70		sub    %rax,0x70(%r13)
> ffffffff8108707e:	4c 89 f9		mov    %r15,%rcx
> ffffffff81087081:	bb 01 00 00 00		mov    $0x1,%ebx
> ffffffff81087086:	49 83 7d 70 00		cmpq   $0x0,0x70(%r13)
> ffffffff8108708b:	49 0f 49 4d 70		cmovns 0x70(%r13),%rcx
>
> Which you'll note ends up with sa->load_avg -= r in memory at
> ffffffff8108707a.

Surprising. I actually tweaked this a little, but could not get the
desired generated code. Can any compiler expert shed some light on it?

> Ludicrous code generation if you ask me; I'd have expected something
> like (note, r15 holds 0):
>
> 	mov	%r15, %rax
> 	xchg	%rax, cfs_rq->removed_load_avg
> 	mov	sa->load_avg, %rcx
> 	sub	%rax, %rcx
> 	cmovs	%r15, %rcx
> 	mov	%rcx, sa->load_avg
>
> Adding the serialization (to _both_ call sites) should fix this.

Absolutely, both :)

I am going to remove the group entity post util initialization soon in
the flat hierarchical util implementation; it is not used anywhere.
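
For illustration only, here is a minimal userspace sketch of the
"compute in a register, store once" shape that the expected assembly
above has. It does not replace the serialization; it only shows how the
unclamped intermediate value can be kept out of memory. LOAD_ONCE(),
STORE_ONCE(), struct avg_sketch and sub_clamped() are hypothetical names
made up for this sketch (the macros are stand-ins for the kernel's
READ_ONCE()/WRITE_ONCE()), and __typeof__ assumes GCC or Clang.

```c
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(): each
 * expands to exactly one volatile access, so the compiler may not turn
 * the update into an in-memory read-modify-write followed by a fixup.
 */
#define LOAD_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define STORE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the load_avg field of struct sched_avg. */
struct avg_sketch {
	long load_avg;
};

/*
 * Subtract r and clamp at zero. The subtraction happens on a local
 * copy, so memory only ever receives the final, clamped value.
 */
static void sub_clamped(struct avg_sketch *sa, long r)
{
	long res = LOAD_ONCE(sa->load_avg) - r;

	if (res < 0)
		res = 0;
	STORE_ONCE(sa->load_avg, res);
}
```

With this shape a lockless reader never observes a transiently negative
load_avg, but concurrent writers can still lose updates, so both call
sites still need the serialization.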