------- Comment #4 from law at redhat dot com  2009-09-29 21:55 -------
Subject: Re:  GCC choosing poor code sequence for certain stores (x86)

On 09/29/09 15:18, rth at gcc dot gnu dot org wrote:
> ------- Comment #3 from rth at gcc dot gnu dot org  2009-09-29 21:18 -------
> There are already peepholes for this, though the condition appears to be
> slightly wrong for -Os.  See i386.md:21121 :
>
> (define_peephole2
>   [(match_scratch:SI 1 "r")
>    (set (match_operand:SI 0 "memory_operand" "")
>         (const_int 0))]
>   "optimize_insn_for_speed_p ()
>    && ! TARGET_USE_MOV0
>    && TARGET_SPLIT_LONG_MOVES
>    && get_attr_length (insn)>= ix86_cur_cost ()->large_insn
>    && peep2_regno_dead_p (0, FLAGS_REG)"
>
>
Ah, yes, the flags register needs to be available.

As for the condition, after reading the optimization guides for the
various x86 chips, my understanding is that

    mov $0, <mem>

is generally going to be faster than

    xor temp, temp
    mov temp, <mem>

So I was thinking we'd want something like this for the condition:

    (optimize_insn_for_size_p ()
     || (!TARGET_USE_MOV0
         && TARGET_SPLIT_LONG_MOVES
         && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))
    && peep2_regno_dead_p (0, FLAGS_REG)

I think that should always give us the xor sequence when optimizing for
size, and also when optimizing for speed on the odd x86 implementation
where the xor sequence is faster. In both cases the flags register still
has to be dead, since xor clobbers it.

I can easily bundle that up as a patch if it looks right to you...

Jeff

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505
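
For anyone comparing the two conditions, here is a rough C model of the boolean logic only. It is a sketch, not GCC code: the struct and both function names are hypothetical, each field merely stands in for the predicate named in its comment, and it assumes optimize_insn_for_speed_p () is simply the negation of optimize_insn_for_size_p ().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the predicates discussed in this thread.
   In GCC these are macros and functions, not struct fields. */
struct peephole_inputs {
  bool optimizing_for_size; /* models optimize_insn_for_size_p () */
  bool use_mov0;            /* models TARGET_USE_MOV0 */
  bool split_long_moves;    /* models TARGET_SPLIT_LONG_MOVES */
  bool insn_is_large;       /* models get_attr_length (insn)
                               >= ix86_cur_cost ()->large_insn */
  bool flags_dead;          /* models peep2_regno_dead_p (0, FLAGS_REG) */
};

/* Model of the condition quoted from i386.md in comment #3.
   Assumption: "for speed" is the negation of "for size". */
bool current_condition (const struct peephole_inputs *in)
{
  return !in->optimizing_for_size
         && !in->use_mov0
         && in->split_long_moves
         && in->insn_is_large
         && in->flags_dead;
}

/* Model of the condition proposed in this comment: fire when
   optimizing for size, or in the same speed case as before, but
   only if the flags register is dead in either case. */
bool proposed_condition (const struct peephole_inputs *in)
{
  return (in->optimizing_for_size
          || (!in->use_mov0
              && in->split_long_moves
              && in->insn_is_large))
         && in->flags_dead;
}
```

With optimizing_for_size set and flags_dead set, current_condition returns false while proposed_condition returns true, which is the -Os case under discussion. When optimizing_for_size is clear, the two functions agree on every input.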