> On Feb 27, 2019, at 9:57 AM, Nadav Amit <na...@vmware.com> wrote:
>
>> On Feb 27, 2019, at 8:14 AM, Linus Torvalds <torva...@linux-foundation.org>
>> wrote:
>>
>> On Wed, Feb 27, 2019 at 2:16 AM Peter Zijlstra <pet...@infradead.org> wrote:
>>> Nadav Amit reported that commit:
>>>
>>>   b59167ac7baf ("x86/percpu: Fix this_cpu_read()")
>>>
>>> added a bunch of constraints to all sorts of code; and while some of
>>> that was correct and desired, some of that seems superfluous.
>>
>> Trivial (but entirely untested) patch attached.
>>
>> That said, I didn't actually check how it affects code generation.
>> Nadav, would you check the code sequences you originally noticed?
>
> The original issue was raised while I was looking into a dropped patch of
> Matthew Wilcox that caused code size increase [1]. As a result I noticed
> that Peter’s patch caused big changes to the generated assembly across the
> kernel - I did not have a specific scenario that I cared about.
>
> The patch you sent (“+m/-volatile”) does increase the code size by 1728
> bytes. Although code size is not the only metric for “code optimization”,
> the original patch of Peter (“volatile”) only increased the code size by 201
> bytes. Peter’s original change also affected only 72 functions vs 228 that
> impacted by the new patch.
>
> I’ll have a look at some specific function assembly, but overall, the “+m”
> approach might prevent even more code optimizations than the “volatile” one.
>
> I’ll send an example or two later.

Here is one example:

Dump of assembler code for function event_filter_pid_sched_wakeup_probe_pre:
   0xffffffff8117c510 <+0>:     push   %rbp
   0xffffffff8117c511 <+1>:     mov    %rsp,%rbp
   0xffffffff8117c514 <+4>:     push   %rbx
   0xffffffff8117c515 <+5>:     mov    0x28(%rdi),%rax
   0xffffffff8117c519 <+9>:     mov    %gs:0x78(%rax),%dl
   0xffffffff8117c51d <+13>:    test   %dl,%dl
   0xffffffff8117c51f <+15>:    je     0xffffffff8117c535 <event_filter_pid_sched_wakeup_probe_pre+37>
   0xffffffff8117c521 <+17>:    mov    %rdi,%rax
   0xffffffff8117c524 <+20>:    mov    0x78(%rdi),%rdi
   0xffffffff8117c528 <+24>:    mov    0x28(%rax),%rbx         # REDUNDANT
   0xffffffff8117c52c <+28>:    callq  0xffffffff81167830 <trace_ignore_this_task>
   0xffffffff8117c531 <+33>:    mov    %al,%gs:0x78(%rbx)
   0xffffffff8117c535 <+37>:    pop    %rbx
   0xffffffff8117c536 <+38>:    pop    %rbp
   0xffffffff8117c537 <+39>:    retq

The instruction at 0xffffffff8117c528 is redundant, and does not exist
without the recent patch. It seems to be a result of -fno-strict-aliasing:
because of the new “memory write” (“+m”) operand, the compiler has to assume
that the asm may have modified the pointer at 0x28(%rdi), which was already
loaded at <+5>, and so it re-reads the data.
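
To make the effect easier to play with outside the kernel, here is a minimal
userspace sketch of the same pattern. It is only an illustration under a few
assumptions: the names (demo_array, demo_read_m, demo_read_plus_m and the two
demo_toggle_* functions) are made up and are not the kernel's percpu macros,
there is no %gs segment prefix, and on anything other than x86-64 with
GCC/Clang the asm is replaced by a plain load, so no difference would be
visible there. Both variants return the same values; what may differ is only
whether the compiler reloads a->slot after the asm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure whose pointer field is reloaded. */
struct demo_array {
	long *slot;	/* the compiler would like to keep this in a register */
};

#if defined(__x86_64__) && defined(__GNUC__)
#define DEMO_HAVE_ASM 1
#else
#define DEMO_HAVE_ASM 0	/* assumption: fall back to a plain C load */
#endif

/* Input-only "m" operand: the asm is declared to only read *p. */
static inline long demo_read_m(const long *p)
{
	long v;
#if DEMO_HAVE_ASM
	__asm__("movq %1, %0" : "=r" (v) : "m" (*p));
#else
	v = *p;
#endif
	return v;
}

/* "+m" operand: the compiler must assume the asm also wrote *p. */
static inline long demo_read_plus_m(long *p)
{
	long v;
#if DEMO_HAVE_ASM
	__asm__("movq %1, %0" : "=r" (v), "+m" (*p));
#else
	v = *p;
#endif
	return v;
}

/* a->slot loaded before the asm can still be trusted afterwards. */
long demo_toggle_input(struct demo_array *a)
{
	long old;

	assert(a != NULL && a->slot != NULL);
	old = demo_read_m(a->slot);
	if (old)
		*a->slot = 0;
	return old;
}

/*
 * With -fno-strict-aliasing the "write" to *p may alias a->slot itself,
 * so the compiler may have to load a->slot a second time before the store.
 */
long demo_toggle_inout(struct demo_array *a)
{
	long old;

	assert(a != NULL && a->slot != NULL);
	old = demo_read_plus_m(a->slot);
	if (old)
		*a->slot = 0;
	return old;
}
```

Building this with something like "gcc -O2 -fno-strict-aliasing -S" and
comparing the two demo_toggle_* functions should be enough to see whether
your compiler emits the extra load; I would not expect every compiler
version to produce the same output.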