> On 12/12/2021 3:49 AM, Jan Hubicka via Gcc-patches wrote:
> > Hi,
> > As discussed in the PR, we miss some optimizations because
> > gimple-ssa-isolate-paths turns NULL memory accesses into volatile accesses
> > and adds __builtin_trap after them. This is seen as a side effect by IPA
> > analysis, and additionally the (fully unreachable) __builtin_trap is
> > believed to load all global memory.
> >
> > I think we should come up with a less intrusive gimple representation for
> > this, but it is also easy enough to special-case it in the IPA analysers,
> > as done in this patch. This is a win even if we improve the representation,
> > since gimple-ssa-isolate-paths runs late and this way we improve
> > optimization early.
> So what's important about the IR representation is that we keep some kind of
> memory access around (so that it will fault), that the memory access
> references a minimal amount of other stuff (we don't want the address
> computation to keep anything live, for example), and that we have a
> subsequent trap so that the CFG routines know it traps.
>
> Otherwise we're free to do whatever we want.
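For concreteness, a minimal example of the pattern the pass isolates (my own
reduction, not taken from the PR):

int
f (int *p)
{
  if (p == 0)
    return *p;   /* isolate-paths turns this dereference into a volatile
                    load from NULL followed by __builtin_trap ().  */
  return *p + 1;
}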
I was thinking about __builtin_trap_load (ptr) and __builtin_trap_store (ptr),
which we could annotate correctly for PTA and which would avoid the need for
two statements, but I am not sure whether this is a good idea.
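Roughly, the change in the emitted statements would look like this (the
builtins are only proposed above; they do not exist in GCC today):

void
isolated_path (void)
{
  /* What we emit today: two statements, and IPA treats the (fully
     unreachable) __builtin_trap as a call that loads all global memory.  */
  int tmp = *(volatile int *) 0;
  (void) tmp;
  __builtin_trap ();

  /* The proposed single statement would perform the faulting access and
     trap in one go, and could carry exact PTA annotation:

       __builtin_trap_load ((int *) 0);  */
}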
Coincidentally, I just ran across another artifact of this. One of the
hottest functions in clang is getTerminator. Clang builds it as:
│ 000000000122f4b0 <llvm::BasicBlock::getTerminator() const>:
│ llvm::BasicBlock::getTerminator() const:
2.16 │ mov 0x28(%rdi),%rax
28.85 │ add $0x28,%rdi
0.20 │ cmp %rax,%rdi
3.55 │ ↓ je 29
│ lea -0x18(%rax),%rcx
0.79 │ test %rax,%rax
│ cmove %rax,%rcx
4.15 │ movzbl 0x10(%rcx),%edx
48.46 │ add $0xffffffe5,%edx
3.95 │ xor %eax,%eax
0.20 │ cmp $0xb,%edx
3.35 │ cmovb %rcx,%rax
4.33 │ ← ret
│29: xor %eax,%eax
│ ← ret
While we do:
│
│ 0000000001471840 <llvm::BasicBlock::getTerminator() const>:
│ llvm::BasicBlock::getTerminator() const:
3.24 │ mov 0x28(%rdi),%rax
31.31 │ add $0x28,%rdi
0.59 │ cmp %rdi,%rax
5.31 │ ↓ je 30
│ test %rax,%rax
 2.36 │ → je 9366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>
│ movzbl -0x8(%rax),%edx
45.73 │ sub $0x18,%rax
│ sub $0x1b,%edx
4.70 │ cmp $0xb,%edx
3.53 │ mov $0x0,%edx
│ cmovae %rdx,%rax
3.23 │ ← ret
│ xchg %ax,%ax
│30: xor %eax,%eax
│ ← ret
....
│
│ 00000000009366f4 <llvm::BasicBlock::getTerminator() const [clone .cold]>:
│ llvm::BasicBlock::getTerminator() const [clone .cold]:
│ movzbl 0x10,%eax
│ ud2
The clang-generated code obviously can end up loading through a NULL pointer
(the cmove keeps %rcx at NULL, and the movzbl then reads from address 0x10),
but replacing this cmov by a conditional jump to the cold section seems like
overkill. The load from address 0x10 into %eax also seems useless...
I think this is a common pattern in C++ code, originating from pointer casts
in the presence of multiple inheritance (see the sketch below). I would vote
for optimizing out the conditional move in this case, and I think that is
correct.
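A minimal reduction of that cast pattern (my own sketch, not from LLVM's
sources): converting a pointer to derived into a pointer to a non-primary
base has to add a non-zero offset, but a NULL pointer must stay NULL, so the
compiler guards the adjustment; that guard is the test/cmov pair in the
assembly above.

struct A { int a; };
struct B { int b; };
struct D : A, B { };

B *
to_b (D *d)
{
  /* Emitted as: d ? (B *) ((char *) d + offset of B within D) : NULL.  */
  return d;
}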
Honza