[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-02 Thread yotambarnoy at gmail dot com


--- Comment #14 from yotambarnoy at gmail dot com  2010-09-02 20:47 ---
Getting back to the original question, I did some reading online and I can't
figure out why this breaks the strict aliasing rules. 

Isn't void * some kind of special case? Shouldn't I be able to convert it to
whatever I need within the function without breaking aliasing? 

I think the problem is that gcc assumes that I want alignment (for the uint32 *
inside the struct) and doesn't realize I've used PACKED, so it decides that
it's undefined behavior. What do you guys think? This aliasing topic is so
confusing.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread yotambarnoy at gmail dot com


--- Comment #12 from yotambarnoy at gmail dot com  2010-09-01 18:35 ---
Right. Unfortunately 
 typedef my_unaligned_aliasing_uint32 uint32
 __attribute__((aligned(1),may_alias));
 
 inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
 {
   return *(const my_unaligned_aliasing_uint32 *)ptr;
 }

doesn't work and doesn't align. I kept the struct method and added the
__may_alias__ attribute to fix the problem on my end. I'm glad to see gcc has
these attributes after all.

Regarding memcpy, I can't get gcc to optimize it for me at all, probably
because the PSP toolchain adds -fno-builtin to newlib. If I use
-Wl,--wrap,memcpy can I then create a __builtin_memcpy and have gcc optimize
using it?

Thanks for all your feedback guys. You've been a huge help.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] New: Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com
This bug was very hard to trace. I'm on the ScummVM dev team and I develop
specifically for the PSP(MIPS). I came across a crash when trying to start one
of our engines. The bug only occurred under very specific circumstances -- I
bisected it and adding class variables or adding some code made it go away, but
I'm not sure what the pattern is.

Here's the problematic code, packaged together easily:

void Logic::logicUp(uint32 new_script) {
   debug(5, new pc = %d, new_script  0x);

   // going up a level - and we'll keep going this cycle
   _curObjectHub.setLogicLevel(_curObjectHub.getLogicLevel() + 1);

   assert(_curObjectHub.getLogicLevel()  3);  // Can be 0, 1, 2
   logicReplace(new_script);
}

void Logic::logicReplace(uint32 new_script) {
   uint32 level = _curObjectHub.getLogicLevel();

   _curObjectHub.setScriptId(level, new_script);
   _curObjectHub.setScriptPc(level, new_script  0x);
}

void setScriptId(int level, uint32 x) {
WRITE_LE_UINT32(_addr + 20 + 4 * level, x);
}

uint32 getScriptId(int level) {
return READ_LE_UINT32(_addr + 20 + 4 * level);
}
void setScriptPc(int level, uint32 x) {
WRITE_LE_UINT32(_addr + 32 + 4 * level, x);
}


G++ optimized these 2 functions into 1 and came up with this code:

8934bc0 _ZN6Sword25Logic7logicUpEj:
 8934bc0:   27bdfff0addiu   sp,sp,-16
 8934bc4:   afb20008sw  s2,8(sp)
 8934bc8:   afb10004sw  s1,4(sp)
 8934bcc:   30b2andis2,a1,0x  # s2 = new_scip  0x
 8934bd0:   00a08821moves1,a1 # s1 = new_scipt
 8934bd4:   3c0508aalui a1,0x8aa # a1 = 0x8aa
 8934bd8:   afb0sw  s0,0(sp)
 8934bdc:   24a50d68addiu   a1,a1,3432 # a1 = 0x8aa3432
 8934be0:   00808021moves0,a0 # s0 = this
 8934be4:   02403021movea2,s2 # a2 = new_script  0x
 8934be8:   afbf000csw  ra,12(sp)
 8934bec:   0e286377jal 8a18ddc _Z5debugiPKcz
 8934bf0:   24040005li  a0,5 # a0 = 5 (jump)
 8934bf4:   8e0400d8lw  a0,216(s0)  # a0 = *(this + 216)
 8934bf8:   88850007lwl a1,7(a0) # a1 =
 8934bfc:   98850004lwr a1,4(a0) # a1 = logicLevel
 8934c00:   24a20001addiu   v0,a1,1 # v0 = logicLevel + 1
 8934c04:   a8820007swl v0,7(a0) # 7(a0) = 0.5 new logicLevel
 8934c08:   2ca30003sltiu   v1,a1,3 # v1 = a1  3?
 8934c0c:   10600011beqzv1,8934c54
_ZN6Sword25Logic7logicUpEj+0x94 # assert
 8934c10:   b8820004swr v0,4(a0) # 4(a0) = 0.5 new logicLevel
 8934c14:   24a20005addiu   v0,a1,5 # v0 = logicLevel + 5 ???
 8934c18:   00021080sll v0,v0,0x2 # v0 = 4*logicLevel + 20
(scriptId offset)
 8934c1c:   00821021adduv0,a0,v0 # v0 = scriptId
 8934c20:   24a30008addiu   v1,a1,8 # v1 = logicLevel + 8
 8934c24:   a8510003swl s1,3(v0) # scriptId[3] = new_script
 8934c28:   00031880sll v1,v1,0x2 # v1 = logicLevel * 4 + 32
 8934c2c:   b851swr s1,0(v0) # scriptId[0] = new_script
 8934c30:   00831821adduv1,a0,v1 # v1 = *(this + 216)

Note the mistake in line 8934c00. v0 is used for incrementing the
logicLevel, and v0 is indeed saved into memory (ie.
_curObjectHub.setLogicLevel(_curObjectHub.getLogicLevel() + 1); )
But the optimization gcc made prevents it from realizing it needs to
use the newly incremented logicLevel value for the other calculations,
so instead it keeps using a1 in the rest of the function. This causes
it to write the new scriptId value in the wrong place, which causes
the VM to think its next scriptId is 0.

I'd like to attach the .ii files but I don't know how (can't find a button for
it). You can see the full code at
https://scummvm.svn.sourceforge.net/svnroot/scummvm/scummvm. The 'problem' code
is under the engines/sword2 directory in header.h, logic.cpp and function.cpp.
This bug does not happen on any other platform as far as I know, but there's a
huge random element involved regarding a particular memory layout.

Thanks
Yotam Barnoy


-- 
   Summary: Bad optimization in -O3 sometimes
   Product: gcc
   Version: 4.3.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: yotambarnoy at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #1 from yotambarnoy at gmail dot com  2010-08-31 11:52 ---
Created an attachment (id=21602)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21602action=view)
Logic.ii, where gcc makes the mistake

LogicUp() is the critical function


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #2 from yotambarnoy at gmail dot com  2010-08-31 11:53 ---
Created an attachment (id=21603)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21603action=view)
header.h, used by logic.cpp


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #4 from yotambarnoy at gmail dot com  2010-08-31 15:24 ---
Good job picking up on that. 

There must be a better way of telling the compiler to generate lwr and lwl MIPS
instructions without breaking strict aliasing rules...?

Thanks a bunch!


-- 

yotambarnoy at gmail dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #6 from yotambarnoy at gmail dot com  2010-09-01 04:32 ---
I recently implemented a custom memcpy for ScummVM. I didn't notice the
standard memcpy using lwl and lwr. In any case, how would memcpy do it any
better? Unless you're referring to the new memcpy inlining in newer versions of
gcc?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #8 from yotambarnoy at gmail dot com  2010-09-01 05:03 ---
Unfortunately, a lib based solutions are difficult for me to implement. The
reason is that the current PSP SDK uses newlib. I can probably change my
personal toolchain with some work, but then it's a custom modification that
needs to be replicated to every other ScummVM dev as well as our buildbot. Not
impossible, but not work I'd like to get in to right now. 

In any case, it sounds like what you're saying is that memcpy has asm
instructions in the right place to use lwl and lwr. I can also do that in my
implementation.

My request was more general, as in gcc needs some kind of custom keyword to
tell it to allow unaligned pointers and to generate appropriate unaligned code,
so we don't have to trick the compiler into doing it in a way that ruins
optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr; 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462