Nathan,

On Tue, 25 Jun 2019, Nathan Chancellor wrote:
> On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:
> > 
> > But can the script please check for a minimal clang version required to
> > build that thing.
> > 
> > The default clang-3.8 which is installed on Debian stretch explodes. The
> > 6.0 variant from backports works as advertised.
> > 
> 
> Hmmm interesting, I test a lot of different distros using Docker
> containers to make sure the script works universally and that includes
> Debian stretch, which is the stress tester because all of the packages
> are older. I install the following packages then run the following
> command and it works fine for me (just tested):
> 
> $ apt update && apt install -y --no-install-recommends ca-certificates \
> ccache clang cmake curl file gcc g++ git make ninja-build python3 \
> texinfo zlib1g-dev
> $ ./build-llvm.py
> 
> If you could give me a build log, I'd be happy to look into it and see
> what I can do.

I can produce one tomorrow.
 
> > Kernel builds with the new shiny compiler. Jump labels seem to be enabled.
> > 
> > It complains about a few type conversions:
> > 
> >  arch/x86/kvm/mmu.c:4596:39: warning: implicit conversion from 'int' to 
> > 'u8' (aka 'unsigned char') changes value from -205 to 51 
> > [-Wconstant-conversion]
> >                 u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
> >                    ~~                               ^~
> > 
> 
> Yes, there was a patch sent to try and fix this but it was rejected by
> the maintainers:
> 
> https://github.com/ClangBuiltLinux/linux/issues/95
> 
> https://lore.kernel.org/lkml/20180619192504.180479-1-...@chromium.org/

Just looked through it. I don't think it's an outright reject. Paolo was
not totally against it and then the whole discussion degraded into bikeshed
painting and bitching about compiler error messaged. Try again or should I?

> > but it also makes objtool unhappy:
> > 
> >  arch/x86/events/intel/core.o: warning: objtool: 
> > intel_pmu_nhm_workaround()+0xb3: unreachable
instruction
> >  kernel/fork.o: warning: objtool: free_thread_stack()+0x126: unreachable 
> > instruction
> >  mm/workingset.o: warning: objtool: count_shadow_nodes()+0x11f: unreachable 
> > instruction
> >  arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: 
> > get_fixed_ranges()+0x9b: unreachable
instruction
> >  arch/x86/kernel/platform-quirks.o: warning: objtool: 
> > x86_early_init_platform_quirks()+0x84:
unreachable instruction
> >  drivers/iommu/irq_remapping.o: warning: objtool: 
> > irq_remap_enable_fault_handling()+0x1d:
unreachable instruction

> Unfortunately, we have quite a few of those outstanding, it's probably
> time to start really taking a look at them:
> 
> https://github.com/ClangBuiltLinux/linux/labels/objtool

I just checked two of them in the disassembly. In both cases it's jump
label related. Here is one:

      asm volatile("1: rdmsr\n"
 410:   b9 59 02 00 00          mov    $0x259,%ecx
 415:   0f 32                   rdmsr
 417:   49 89 c6                mov    %rax,%r14
 41a:   48 89 d3                mov    %rdx,%rbx
      return EAX_EDX_VAL(val, low, high);
 41d:   48 c1 e3 20             shl    $0x20,%rbx
 421:   48 09 c3                or     %rax,%rbx
 424:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 429:   eb 0f                   jmp    43a <get_fixed_ranges+0xaa>
      do_trace_read_msr(msr, val, 0);
 42b:   bf 59 02 00 00          mov    $0x259,%edi   <------- "unreachable"
 430:   48 89 de                mov    %rbx,%rsi
 433:   31 d2                   xor    %edx,%edx
 435:   e8 00 00 00 00          callq  43a <get_fixed_ranges+0xaa>
 43a:   44 89 35 00 00 00 00    mov    %r14d,0x0(%rip)        # 441 
<get_fixed_ranges+0xb1>

Interestingly enough there are some more hunks of the same pattern in that
function which look all the same. Those are not upsetting objtool. Josh
might give an hint where to stare at.

Just for the fun of it I looked at the GCC output of the same file. It
takes a different apporach:

      asm volatile("1: rdmsr\n"
 c70:   b9 59 02 00 00          mov    $0x259,%ecx
 c75:   0f 32                   rdmsr
      return EAX_EDX_VAL(val, low, high);
 c77:   48 c1 e2 20             shl    $0x20,%rdx
 c7b:   48 89 d3                mov    %rdx,%rbx
 c7e:   48 09 c3                or     %rax,%rbx
 c81:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 c86:   48 89 1d 00 00 00 00    mov    %rbx,0x0(%rip)        # c8d 
<get_fixed_ranges.constprop.5+0x7d>

and the tracing code is completely out of line:

      do_trace_read_msr(msr, val, 0);
 ce2:   31 d2                   xor    %edx,%edx
 ce4:   48 89 de                mov    %rbx,%rsi
 ce7:   bf 59 02 00 00          mov    $0x259,%edi
 cec:   e8 00 00 00 00          callq  cf1 <get_fixed_ranges.constprop.5+0xe1>
 cf1:   eb 93                   jmp    c86 <get_fixed_ranges.constprop.5+0x76>

which makes a lot of sense as the normal path (tracepoint disabled) just
runs through linearly while in the clang version it has to jump around the
tracepoint code.

The jump itself is not a problem, but what matters is the $I cache
footprint. The GCC version hotpath fits in 3 cache lines while the Clang
version unconditionally eats 4.2 of them. That's a huge difference.

> Thanks for trying it out and letting us know. Please keep us in the loop
> if you happen to find anything amiss.

Will do.

Thanks,

        tglx

Reply via email to