[Bug target/41505] GCC choosing poor code sequence for certain stores (x86)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41505

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #8 from Andrew Pinski ---
This is a dup of bug 11877 which is now fixed on the trunk.

*** This bug has been marked as a duplicate of bug 11877 ***
--- Comment #7 from law at redhat dot com 2009-09-30 14:47 ---
On 09/30/09 03:22, jakub at gcc dot gnu dot org wrote:
> --- Comment #6 from jakub at gcc dot gnu dot org 2009-09-30 09:22 ---
> For x86-64 we perhaps want further checks for the size optimization - if the
> scratch register is %r8d through %r15d, 3 byte xorl %r8d, %r8d and e.g. 3 byte
> movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) (which is 6 bytes).
> And likely the 2 insns will be slower.
> But if the address already needs rex prefix, it is still a win.

Do we have any good way to test if the address needs a rex prefix? I see the
rex_prefix attribute in i386.md, but that's for testing an entire insn, and
based on my quick reading of i386.md it's not complete, as many insns set the
attribute explicitly.

Jeff
--- Comment #6 from jakub at gcc dot gnu dot org 2009-09-30 09:22 ---
For x86-64 we perhaps want further checks for the size optimization: if the
scratch register is %r8d through %r15d, the 3-byte xorl %r8d, %r8d plus e.g. the
3-byte movl %r8d, (%rdx) won't be shorter than movl $0, (%rdx) (which is 6 bytes).
And likely the 2 insns will be slower.
But if the address already needs a rex prefix, it is still a win.
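The byte counts above can be checked with a small model. This is only a sketch in C, not GCC code: `struct zero_store` and the helper names are hypothetical, and the caller is assumed to supply the number of address bytes (ModRM plus any SIB and displacement, so 1 for `(%rdx)`). It relies on the fact that a single REX prefix covers both the register operand and the address, which is why the xor sequence still wins when the address already needs REX.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a 32-bit store of zero; not a GCC structure. */
struct zero_store {
    int  addr_bytes;      /* ModRM + optional SIB + displacement, e.g. 1 for (%rdx) */
    bool addr_needs_rex;  /* base or index register is %r8..%r15 */
    bool scratch_is_ext;  /* scratch register is %r8d..%r15d */
};

/* movl $0, mem  ->  [REX] C7 /0, address bytes, imm32 */
static int len_mov_imm(const struct zero_store *s)
{
    assert(s->addr_bytes >= 1);
    return (s->addr_needs_rex ? 1 : 0) + 1 + s->addr_bytes + 4;
}

/* xorl %tmp, %tmp ; movl %tmp, mem */
static int len_xor_mov(const struct zero_store *s)
{
    assert(s->addr_bytes >= 1);
    /* xorl: [REX] 31 /r */
    int xor_len = (s->scratch_is_ext ? 1 : 0) + 2;
    /* movl: [REX] 89 /r, address bytes; one REX serves register and address */
    int mov_len = ((s->scratch_is_ext || s->addr_needs_rex) ? 1 : 0)
                  + 1 + s->addr_bytes;
    return xor_len + mov_len;
}

/* True when the two-insn sequence is strictly shorter than the immediate store. */
static bool xor_is_smaller(const struct zero_store *s)
{
    return len_xor_mov(s) < len_mov_imm(s);
}
```

With `(%rdx)` as the address, a legacy scratch register such as %eax gives 4 bytes against 6, while %r8d gives 6 against 6, so the size check would have to reject the latter unless the address itself needs REX.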
--- Comment #5 from rth at gcc dot gnu dot org 2009-09-29 23:43 ---
Yeah, that looks right.
--- Comment #4 from law at redhat dot com 2009-09-29 21:55 ---
On 09/29/09 15:18, rth at gcc dot gnu dot org wrote:
> --- Comment #3 from rth at gcc dot gnu dot org 2009-09-29 21:18 ---
> There are already peepholes for this, though the condition appears to be
> slightly wrong for -Os. See i386.md:21121 :
>
> (define_peephole2
>   [(match_scratch:SI 1 "r")
>    (set (match_operand:SI 0 "memory_operand" "")
>         (const_int 0))]
>   "optimize_insn_for_speed_p ()
>    && ! TARGET_USE_MOV0
>    && TARGET_SPLIT_LONG_MOVES
>    && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
>    && peep2_regno_dead_p (0, FLAGS_REG)"

Ah, yes, the flags register needs to be available.

As for the condition, after reading optimization guides for the various x86
chips, it appears that

    mov $0, <mem>

is generally going to be faster than

    xor temp, temp
    mov temp, <mem>

So I was thinking we'd want something like this for the condition:

    (optimize_insn_for_size_p ()
     || (!TARGET_USE_MOV0
         && TARGET_SPLIT_LONG_MOVES
         && get_attr_length (insn) >= ix86_cur_cost ()->large_insn))
    && peep2_regno_dead_p (0, FLAGS_REG)

Which I think should always give us the xor sequence when optimizing for size
or when optimizing for the odd x86 implementation where the xor sequence is
faster.

I can easily bundle that up as a patch if it looks right to you...

Jeff
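To see how the proposed condition differs from the existing one, the two can be modelled as plain boolean predicates. This is a sketch in C, not GCC source: `struct peep_ctx` and both function names are hypothetical stand-ins, the field values are supplied by the caller rather than queried from the compiler, and optimizing for speed is assumed to be the exact negation of optimizing for size.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC predicates named in the thread. */
struct peep_ctx {
    bool optimize_for_size;  /* optimize_insn_for_size_p (); speed assumed to be its negation */
    bool use_mov0;           /* TARGET_USE_MOV0 */
    bool split_long_moves;   /* TARGET_SPLIT_LONG_MOVES */
    int  insn_length;        /* get_attr_length (insn) */
    int  large_insn;         /* ix86_cur_cost ()->large_insn */
    bool flags_dead;         /* peep2_regno_dead_p (0, FLAGS_REG) */
};

/* Condition quoted from i386.md: only ever fires when optimizing for speed. */
static bool old_condition(const struct peep_ctx *c)
{
    return !c->optimize_for_size
           && !c->use_mov0
           && c->split_long_moves
           && c->insn_length >= c->large_insn
           && c->flags_dead;
}

/* Condition proposed in comment #4: size optimization alone is enough,
   provided the flags register is dead (xor clobbers the flags). */
static bool proposed_condition(const struct peep_ctx *c)
{
    return (c->optimize_for_size
            || (!c->use_mov0
                && c->split_long_moves
                && c->insn_length >= c->large_insn))
           && c->flags_dead;
}
```

Under these assumptions the two predicates agree whenever `optimize_for_size` is false, and differ only for -Os, where the proposed form reduces to the flags-register check.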
--- Comment #3 from rth at gcc dot gnu dot org 2009-09-29 21:18 ---
There are already peepholes for this, though the condition appears to be
slightly wrong for -Os. See i386.md:21121 :

(define_peephole2
  [(match_scratch:SI 1 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (const_int 0))]
  "optimize_insn_for_speed_p ()
   && ! TARGET_USE_MOV0
   && TARGET_SPLIT_LONG_MOVES
   && get_attr_length (insn) >= ix86_cur_cost ()->large_insn
   && peep2_regno_dead_p (0, FLAGS_REG)"
--- Comment #2 from law at redhat dot com 2009-09-29 17:12 ---
I don't understand your comment, Richard. Isn't it just something like this?

(define_peephole2
  [(match_scratch:SI 2 "r")
   (set (match_operand:SI 0 "memory_operand" "")
        (match_operand:SI 1 "const_0_operand" ""))]
  ""
  [(set (match_dup 2) (match_dup 1))
   (set (match_dup 0) (match_dup 2))]
  "")
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-09-29 16:07 ---
difficult