On Tue, Jan 8, 2019 at 12:43 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> > FLD from memory in SF and DFmode is considered a conversion, and
> > converts sNaN to NaN (and emits #IA exception). But sNaN handling is
> > already busted in the compiler as RA is free to spill the register in
> > non-XFmode. IMO, the peephole2 pattern is no worse than the current
> > situation.
>
> Ok.
>
> > At least for x86, there are no SUBREGs after reload, otherwise other
> > parts of the compiler would break.
>
> The new patch would really handle even a SUBREG there...
>
> > > I don't see how, that would mean I'd have to write two peephole2s instead of
> > > one. It tries to deal with two different cases, one is where the temporary
> > > reg is dead, in that case we can optimize away both the load or store, the
> > > second case is where the temporary reg isn't dead, in that case we can
> > > optimize away the store, but not the load. With the optimizing away of both
> > > load and store I was just trying to do a cheap DCE there.
> >
> > I didn't realize this is an optimization, a comment would be welcome here.
>
> Ugh, except that it doesn't work. peep2_reg_dead_p (1, operands[0])
> is not what I meant, that is always false, as the register must be live in
> between the first and second instruction. I meant
> peep2_reg_dead_p (2, operands[0]), the register dead at the end of the
> second instruction, except we don't really support
> define_split/define_peephole2 splitting into zero instructions, DONE; in
> that case returns NULL like FAIL; does. So, let's just wait for DCE to
> finish it up.
>
> Here is what I'll bootstrap/regtest then. Added also
> reg_overlap_mentioned_p, in case there is e.g.
>         movl (%eax,%edx), %eax
>         movl %eax, (%eax,%edx)

I doubt this would *ever* happen, but ... OK.

> or similar and as I said earlier, explicit match_operand so that I can
> check MEM_VOLATILE_P on both MEMs.
>
> 2019-01-08  Jakub Jelinek  <ja...@redhat.com>
>
>         PR rtl-optimization/79593
>         * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

OK for mainline.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj 2019-01-07 23:54:54.494800693 +0100
> +++ gcc/config/i386/i386.md 2019-01-08 12:34:18.916832780 +0100
> @@ -18740,6 +18740,18 @@ (define_peephole2
>                         const0_rtx);
>  })
>
> +;; Attempt to optimize away memory stores of values the memory already
> +;; has.  See PR79593.
> +(define_peephole2
> +  [(set (match_operand 0 "register_operand")
> +        (match_operand 1 "memory_operand"))
> +   (set (match_operand 2 "memory_operand") (match_dup 0))]
> +  "!MEM_VOLATILE_P (operands[1])
> +   && !MEM_VOLATILE_P (operands[2])
> +   && rtx_equal_p (operands[1], operands[2])
> +   && !reg_overlap_mentioned_p (operands[0], operands[2])"
> +  [(set (match_dup 0) (match_dup 1))])
> +
>  ;; Attempt to always use XOR for zeroing registers (including FP modes).
>  (define_peephole2
>    [(set (match_operand 0 "general_reg_operand")
>
>
>         Jakub
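
For anyone reading the thread later, the condition in the new peephole2 can be modelled roughly as below. This is only a toy sketch in Python, not GCC code: the Mem class, its fields, and the function name store_is_redundant are made up for illustration, and registers are plain strings. It mirrors the four tests in the pattern (match_dup 0, MEM_VOLATILE_P on both MEMs, rtx_equal_p, reg_overlap_mentioned_p) under the assumption that an address is just a set of registers plus a constant offset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Mem:
    """Hypothetical stand-in for a MEM rtx (not a GCC type)."""
    addr_regs: Tuple[str, ...]   # registers forming the address, e.g. ("eax", "edx")
    offset: int = 0              # constant displacement
    volatile: bool = False       # models MEM_VOLATILE_P


def store_is_redundant(load_dst: str, load_src: Mem,
                       store_dst: Mem, store_src: str) -> bool:
    """Toy model of the condition for the pair  reg = mem; mem = reg.

    Returns True when the store writes back the value the memory
    already holds, so only the load needs to be kept.
    """
    # (match_dup 0): the store must write the register that was just loaded.
    if store_src != load_dst:
        return False
    # !MEM_VOLATILE_P on both MEMs: volatile accesses must not be dropped.
    if load_src.volatile or store_dst.volatile:
        return False
    # rtx_equal_p: both MEMs must be the same address expression.
    if (load_src.addr_regs, load_src.offset) != (store_dst.addr_regs, store_dst.offset):
        return False
    # !reg_overlap_mentioned_p: the loaded register must not be part of the
    # store address, otherwise the load has changed where the store points.
    if load_dst in store_dst.addr_regs:
        return False
    return True


# movl (%ebx), %eax ; movl %eax, (%ebx)  -> store can go
plain = Mem(("ebx",))
print(store_is_redundant("eax", plain, plain, "eax"))

# movl (%eax,%edx), %eax ; movl %eax, (%eax,%edx)  -> store must stay
overlap = Mem(("eax", "edx"))
print(store_is_redundant("eax", overlap, overlap, "eax"))
```

The second call is the case Jakub mentions: the two MEMs look identical, but the load overwrites %eax, so the store would address a different location and has to stay.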