[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #15 from hp at gcc dot gnu dot org 2010-09-04 03:08 ---
(In reply to comment #4)
> Good job picking up on that. There must be a better way of telling the
> compiler to generate lwr and lwl MIPS instructions without breaking strict
> aliasing rules...?

When requiring a specific insn you want an asm:

    unsigned int result;
    unsigned char *p;
    /* Need the "m" (dummy) to mark memory as read.
       Need earlyclobber because gcc using the same register
       would cause...problems.  Little endian assumed.  */
    asm ("lwr %0,0(%1)\n\tlwl %0,3(%1)"
         : "=&r" (result) : "r" (p), "m" (*p));

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #13 from rguenth at gcc dot gnu dot org 2010-09-02 09:07 ---
(In reply to comment #11)
> (In reply to comment #10)
> > typedef uint32 my_unaligned_aliasing_uint32
> >   __attribute__((aligned(1),may_alias));
> >
> > inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> > {
> >   return *(const my_unaligned_aliasing_uint32 *)ptr;
> > }
>
> It does not:
> READ_UINT32:
>         j       $31
>         lw      $2,0($4)
>
> The aligned attribute is ignored there I think.

It is if the target is STRICT_ALIGNMENT (which of course is a bug, but well ... and I happen to have a fix as well).

> memcpy produces:
>         lbu     $2,3($4)
>         lbu     $6,0($4)
>         lbu     $5,1($4)
>         lbu     $3,2($4)
>         addiu   $sp,$sp,-16
>         sb      $6,0($sp)
>         sb      $5,1($sp)
>         sb      $3,2($sp)
>         sb      $2,3($sp)
>         lw      $2,0($sp)
>         j       $31
>         addiu   $sp,$sp,16
>
> Which is bad and could be improved by using lwl/lwr. I will file a bug about
> that.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #14 from yotambarnoy at gmail dot com 2010-09-02 20:47 --- Getting back to the original question, I did some reading online and I can't figure out why this breaks the strict aliasing rules. Isn't void * some kind of special case? Shouldn't I be able to convert it to whatever I need within the function without breaking aliasing? I think the problem is that gcc assumes that I want alignment (for the uint32 * inside the struct) and doesn't realize I've used PACKED, so it decides that it's undefined behavior. What do you guys think? This aliasing topic is so confusing. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #9 from pinskia at gmail dot com 2010-09-01 06:17 ---
Subject: Re: Bad optimization in -O3 sometimes

I am not talking about a library solution at all. I am talking about a solution inside the compiler. GCC will optimize memcpy; how much for MIPS is a good question. Try it out and see. Oh, if you are using SCEI's gcc you really should be reporting issues to them.

On Aug 31, 2010, at 10:03 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:
> --- Comment #8 from yotambarnoy at gmail dot com 2010-09-01 05:03 ---
> Unfortunately, lib-based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot.
> Not impossible, but not work I'd like to get into right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation. My request was more general, as in gcc needs some kind of
> custom keyword to tell it to allow unaligned pointers and to generate
> appropriate unaligned code, so we don't have to trick the compiler into doing
> it in a way that ruins optimization. Something like
>
>     __unaligned__ uint32 *ptr32 = bytePtr;

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #10 from rguenth at gcc dot gnu dot org 2010-09-01 09:45 ---
(In reply to comment #8)
> Unfortunately, lib-based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot.
> Not impossible, but not work I'd like to get into right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation. My request was more general, as in gcc needs some kind of
> custom keyword to tell it to allow unaligned pointers and to generate
> appropriate unaligned code, so we don't have to trick the compiler into doing
> it in a way that ruins optimization. Something like
>
>     __unaligned__ uint32 *ptr32 = bytePtr;

    typedef uint32 my_unaligned_aliasing_uint32
      __attribute__((aligned(1),may_alias));

    inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
    {
      return *(const my_unaligned_aliasing_uint32 *)ptr;
    }

should do it and does not require -fno-strict-aliasing.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #11 from pinskia at gcc dot gnu dot org 2010-09-01 18:25 ---
(In reply to comment #10)
> typedef uint32 my_unaligned_aliasing_uint32
>   __attribute__((aligned(1),may_alias));
>
> inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> {
>   return *(const my_unaligned_aliasing_uint32 *)ptr;
> }

It does not:
READ_UINT32:
        j       $31
        lw      $2,0($4)

The aligned attribute is ignored there I think.

memcpy produces:
        lbu     $2,3($4)
        lbu     $6,0($4)
        lbu     $5,1($4)
        lbu     $3,2($4)
        addiu   $sp,$sp,-16
        sb      $6,0($sp)
        sb      $5,1($sp)
        sb      $3,2($sp)
        sb      $2,3($sp)
        lw      $2,0($sp)
        j       $31
        addiu   $sp,$sp,16

Which is bad and could be improved by using lwl/lwr. I will file a bug about that.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #12 from yotambarnoy at gmail dot com 2010-09-01 18:35 ---
Right. Unfortunately

    typedef uint32 my_unaligned_aliasing_uint32
      __attribute__((aligned(1),may_alias));

    inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
    {
      return *(const my_unaligned_aliasing_uint32 *)ptr;
    }

doesn't work: the generated code still uses an aligned load. I kept the struct method and added the __may_alias__ attribute to fix the problem on my end. I'm glad to see gcc has these attributes after all.

Regarding memcpy, I can't get gcc to optimize it for me at all, probably because the PSP toolchain adds -fno-builtin to newlib. If I use -Wl,--wrap,memcpy can I then create a __builtin_memcpy and have gcc optimize using it?

Thanks for all your feedback guys. You've been a huge help.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
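For readers following along, the workaround described in comment #12 (keeping the packed struct and adding may_alias) might look roughly like the sketch below. This is a guess at the shape of the fix, not the actual ScummVM code: std::uint32_t stands in for the thread's uint32 typedef, and both attributes are GCC extensions.

```cpp
#include <cstdint>

// Sketch only: an assumed reconstruction of the "struct method" from
// comment #3 with the may_alias attribute added, as comment #12 describes.
// __packed__ lowers the struct's alignment to 1, so the compiler may not
// assume that ptr is 4-byte aligned. __may_alias__ tells GCC that accesses
// through this type are exempt from type-based alias analysis.
inline std::uint32_t READ_UINT32(const void *ptr) {
    struct Unaligned32 {
        std::uint32_t val;
    } __attribute__((__packed__, __may_alias__));
    return static_cast<const Unaligned32 *>(ptr)->val;
}
```

Which instructions GCC emits for the load (lwl/lwr or byte loads) still depends on the target and compiler version.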
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #1 from yotambarnoy at gmail dot com 2010-08-31 11:52 ---
Created an attachment (id=21602)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21602&action=view)
Logic.ii, where gcc makes the mistake

LogicUp() is the critical function

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #2 from yotambarnoy at gmail dot com 2010-08-31 11:53 ---
Created an attachment (id=21603)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21603&action=view)
header.h, used by logic.cpp

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #3 from rguenth at gcc dot gnu dot org 2010-08-31 14:17 ---

    inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
    {
      struct Unaligned32 { uint32 val; } __attribute__ ((__packed__));
      return ((const Unaligned32 *)ptr)->val;
    }

and similar look like they might violate C aliasing rules. Try using -fno-strict-aliasing.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #4 from yotambarnoy at gmail dot com 2010-08-31 15:24 ---
Good job picking up on that. There must be a better way of telling the compiler to generate lwr and lwl MIPS instructions without breaking strict aliasing rules...?

Thanks a bunch!

--

yotambarnoy at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #5 from pinskia at gmail dot com 2010-08-31 19:09 ---
Subject: Re: Bad optimization in -O3 sometimes

On Aug 31, 2010, at 8:24 AM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:
> --- Comment #4 from yotambarnoy at gmail dot com 2010-08-31 15:24 ---
> Good job picking up on that. There must be a better way of telling the
> compiler to generate lwr and lwl MIPS instructions without breaking strict
> aliasing rules...?

Have you tried using memcpy?

> Thanks a bunch!
>
> --
>
> yotambarnoy at gmail dot com changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |RESOLVED
>          Resolution|                            |FIXED
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #6 from yotambarnoy at gmail dot com 2010-09-01 04:32 --- I recently implemented a custom memcpy for ScummVM. I didn't notice the standard memcpy using lwl and lwr. In any case, how would memcpy do it any better? Unless you're referring to the new memcpy inlining in newer versions of gcc? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #7 from pinskia at gmail dot com 2010-09-01 04:41 ---
Subject: Re: Bad optimization in -O3 sometimes

On Aug 31, 2010, at 9:32 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:
> --- Comment #6 from yotambarnoy at gmail dot com 2010-09-01 04:32 ---
> I recently implemented a custom memcpy for ScummVM. I didn't notice the
> standard memcpy using lwl and lwr. In any case, how would memcpy do it any
> better? Unless you're referring to the new memcpy inlining in newer versions
> of gcc?

I am referring to the standard builtin version of memcpy. It is not just in newer versions; it has been there since 3.0. What is new is the more optimized version for x86 with either a large constant or a non-constant size. Can you try memcpy? If that does not work, please file a bug and cc me, and I will see what I can do. I have been working with MIPS lately.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
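The memcpy approach suggested here can be sketched as follows. The function name read_u32_memcpy is made up for illustration, and std::uint32_t stands in for the thread's uint32. Copying into a local object sidesteps both the alignment and the aliasing question at the source level; what instructions come out depends on the target and GCC version (comment #11 in this thread shows byte loads and stores for this MIPS toolchain).

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: read a 32-bit value from a possibly unaligned address by
// copying its bytes into a properly aligned local object. memcpy accesses
// the source as bytes, so no aligned 32-bit load of *ptr is implied and no
// strict-aliasing rule is violated. With a constant size of 4 the compiler
// is free to expand the call inline instead of calling the library.
inline std::uint32_t read_u32_memcpy(const void *ptr) {
    std::uint32_t value;
    std::memcpy(&value, ptr, sizeof value);
    return value;
}
```

The result is in host byte order, the same as a direct load would give.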
[Bug c++/45462] Bad optimization in -O3 sometimes
--- Comment #8 from yotambarnoy at gmail dot com 2010-09-01 05:03 ---
Unfortunately, lib-based solutions are difficult for me to implement. The reason is that the current PSP SDK uses newlib. I can probably change my personal toolchain with some work, but then it's a custom modification that needs to be replicated to every other ScummVM dev as well as our buildbot. Not impossible, but not work I'd like to get into right now.

In any case, it sounds like what you're saying is that memcpy has asm instructions in the right place to use lwl and lwr. I can also do that in my implementation. My request was more general, as in gcc needs some kind of custom keyword to tell it to allow unaligned pointers and to generate appropriate unaligned code, so we don't have to trick the compiler into doing it in a way that ruins optimization. Something like

    __unaligned__ uint32 *ptr32 = bytePtr;

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462
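Absent such a keyword, one portable fallback (not proposed in this thread, shown only for comparison) is to assemble the value from individual bytes. The sketch assumes the data is stored in little-endian byte order, as on the PSP, and the name read_u32_le is hypothetical.

```cpp
#include <cstdint>

// Sketch only: build a 32-bit value from four byte loads. Access through
// unsigned char is always permitted by the aliasing rules and has no
// alignment requirement. The byte at the lowest address becomes the least
// significant byte, i.e. the data is interpreted as little endian
// regardless of the host's own byte order.
inline std::uint32_t read_u32_le(const void *ptr) {
    const unsigned char *p = static_cast<const unsigned char *>(ptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Whether the compiler merges the four loads into something like lwl/lwr is up to its optimizer; the source makes no such promise.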