On Thu, Jun 09, 2016 at 03:07:50PM +0200, Peter Zijlstra wrote:
> Which given the lack of serialization, and the code generated from
> update_cfs_rq_load_avg() is entirely possible.
> 
>       if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>               s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>               sa->load_avg = max_t(long, sa->load_avg - r, 0);
>               sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
>               removed_load = 1;
>       }
> 
> turns into:
> 
> ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
> ffffffff8108706b:       48 85 c0                test   %rax,%rax
> ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
> ffffffff81087070:       4c 89 f8                mov    %r15,%rax
> ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
> ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
> ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
> ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
> ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
> ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
> 
> Which you'll note ends up with sa->load_avg -= r in memory at
> ffffffff8108707a.
 
I'm surprised. I did try tweaking this a little, but never got the generated
code I wanted. Can any compiler expert shed some light on this?

> Ludicrous code generation if you ask me; I'd have expected something
> like (note, r15 holds 0):
> 
>       mov     %r15, %rax
>       xchg    %rax, cfs_rq->removed_load_avg
>       mov     sa->load_avg, %rcx
>       sub     %rax, %rcx
>       cmovs   %r15, %rcx
>       mov     %rcx, sa->load_avg
> 
> Adding the serialization (to _both_ call sites) should fix this.
 
Absolutely, both :)
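
Separately from the locking, the negative intermediate value could perhaps be
kept out of memory by doing the arithmetic in a local and storing the result
once. Below is a rough user-space sketch of that idea, not a patch: struct
avg_sketch and sub_clamped() are made-up names, and the volatile casts only
stand in for what READ_ONCE()/WRITE_ONCE() would do in the kernel.

```c
/* Hypothetical stand-in for the load_avg field of struct sched_avg. */
struct avg_sketch {
	long load_avg;
};

/*
 * Subtract r from *p and clamp the result at zero, using exactly one
 * load and one store. The volatile accesses (an assumption standing in
 * for READ_ONCE()/WRITE_ONCE()) keep the compiler from writing the
 * unclamped difference back to memory first, which is what the
 * "sub %rax,0x70(%r13)" in the disassembly above does.
 */
static void sub_clamped(long *p, long r)
{
	long val = *(volatile long *)p;	/* single load */
	long res = val - r;

	if (res < 0)
		res = 0;

	*(volatile long *)p = res;	/* single store of the final value */
}
```

A caller would use sub_clamped(&sa->load_avg, r) in place of the max_t()
assignment. This only hides the transient negative value from lockless
readers; two concurrent writers still race, so it would not replace the
serialization at both call sites.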

I am going to remove the group entity's post util initialization soon, as part
of the flat hierarchical util implementation; it is not used anywhere.
