[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-03 Thread hp at gcc dot gnu dot org


--- Comment #15 from hp at gcc dot gnu dot org  2010-09-04 03:08 ---
(In reply to comment #4)
> Good job picking up on that.
>
> There must be a better way of telling the compiler to generate lwr and lwl MIPS
> instructions without breaking strict aliasing rules...?

When requiring a specific insn you want an asm:

unsigned int result;
unsigned char *p;

/* Need the "m" (dummy) to mark memory as read; casting to a 4-byte array
   covers every byte lwr/lwl touch.  Need earlyclobber ("=&r") because gcc
   using the same register for %0 and %1 would cause...problems.
   Little endian assumed. */
asm ("lwr %0,0(%1)\n\tlwl %0,3(%1)"
     : "=&r" (result)
     : "r" (p), "m" (*(const unsigned char (*)[4]) p));
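A minimal sketch, not from the thread, of how the asm above might be wrapped for reuse. The name `read_u32_unaligned` is hypothetical, and the preprocessor test assumes gcc's predefined `__mips__`, `__MIPSEL__` and `__mips_isa_rev` macros (lwl/lwr were removed in MIPS release 6). On any other target it falls back to `memcpy`, so both paths return the value in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 32-bit value in host byte order from an
   address that may not be 4-byte aligned. */
static inline uint32_t read_u32_unaligned(const void *ptr)
{
    uint32_t result;
#if defined(__mips__) && defined(__MIPSEL__) && \
    (!defined(__mips_isa_rev) || __mips_isa_rev < 6)
    /* Little-endian, pre-R6 MIPS: lwr/lwl each fill part of %0.  The
       earlyclobber keeps %0 and %1 in different registers, and the
       4-byte "m" operand tells gcc which memory the asm reads. */
    const unsigned char *p = (const unsigned char *)ptr;
    __asm__ ("lwr %0,0(%1)\n\tlwl %0,3(%1)"
             : "=&r" (result)
             : "r" (p), "m" (*(const unsigned char (*)[4]) p));
#else
    /* Portable fallback: copy the bytes into an aligned local. */
    memcpy(&result, ptr, sizeof result);
#endif
    return result;
}
```

Keeping the asm behind a target check means the same call site still compiles on non-MIPS builds.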


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45462



[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-02 Thread rguenth at gcc dot gnu dot org


--- Comment #13 from rguenth at gcc dot gnu dot org  2010-09-02 09:07 ---
(In reply to comment #11)
> (In reply to comment #10)
> > typedef uint32 my_unaligned_aliasing_uint32
> > __attribute__((aligned(1),may_alias));
> >
> > inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> > {
> >   return *(const my_unaligned_aliasing_uint32 *)ptr;
> > }
>
> It does not:
> READ_UINT32:
> j   $31
> lw  $2,0($4)
>
> The aligned attribute is ignored there I think.

It is ignored if the target is STRICT_ALIGNMENT (which of course is a bug,
but well ... and I happen to have a fix as well).

> memcpy produces:
> lbu $2,3($4)
> lbu $6,0($4)
> lbu $5,1($4)
> lbu $3,2($4)
> addiu   $sp,$sp,-16
> sb  $6,0($sp)
> sb  $5,1($sp)
> sb  $3,2($sp)
> sb  $2,3($sp)
> lw  $2,0($sp)
> j   $31
> addiu   $sp,$sp,16
>
> Which is bad and could be improved by using lwl/lwr.  I will file a bug about
> that.





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-02 Thread yotambarnoy at gmail dot com


--- Comment #14 from yotambarnoy at gmail dot com  2010-09-02 20:47 ---
Getting back to the original question, I did some reading online and I can't
figure out why this breaks the strict aliasing rules. 

Isn't void * some kind of special case? Shouldn't I be able to convert it to
whatever I need within the function without breaking aliasing? 

I think the problem is that gcc assumes that I want alignment (for the uint32 *
inside the struct) and doesn't realize I've used PACKED, so it decides that
it's undefined behavior. What do you guys think? This aliasing topic is so
confusing.
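For reference: under the C and C++ aliasing rules it is the type of the lvalue used for the access that matters, so converting through `void *` does not make a later `uint32` access legal. Reading through `unsigned char`, which may alias any object and has no alignment requirement, always is. A minimal sketch assuming little-endian data, as comment #15 also does; `read_le32` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a little-endian 32-bit value byte by byte.
   unsigned char may alias any object and needs no alignment, so this is
   valid for any address and independent of host byte order. */
static inline uint32_t read_le32(const void *ptr)
{
    const unsigned char *b = (const unsigned char *)ptr;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```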





Re: [Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread Andrew Pinski
I am not talking about a library solution at all. I am talking about a
solution inside the compiler. gcc will optimize memcpy; how well it does so
for MIPS is a good question. Try it out and see. Oh, and if you are using
SCEI's gcc, you really should be reporting issues to them.


On Aug 31, 2010, at 10:03 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #8 from yotambarnoy at gmail dot com  2010-09-01 05:03 ---
> Unfortunately, library-based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot. Not
> impossible, but not work I'd like to get into right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation.
>
> My request was more general, as in gcc needs some kind of custom keyword to
> tell it to allow unaligned pointers and to generate appropriate unaligned code,
> so we don't have to trick the compiler into doing it in a way that ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr;





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread pinskia at gmail dot com


--- Comment #9 from pinskia at gmail dot com  2010-09-01 06:17 ---
Subject: Re:  Bad optimization in -O3 sometimes

I am not talking about a library solution at all. I am talking about a
solution inside the compiler. gcc will optimize memcpy; how well it does so
for MIPS is a good question. Try it out and see. Oh, and if you are using
SCEI's gcc, you really should be reporting issues to them.

On Aug 31, 2010, at 10:03 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #8 from yotambarnoy at gmail dot com  2010-09-01 05:03 ---
> Unfortunately, library-based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot. Not
> impossible, but not work I'd like to get into right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation.
>
> My request was more general, as in gcc needs some kind of custom keyword to
> tell it to allow unaligned pointers and to generate appropriate unaligned code,
> so we don't have to trick the compiler into doing it in a way that ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr;





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread rguenth at gcc dot gnu dot org


--- Comment #10 from rguenth at gcc dot gnu dot org  2010-09-01 09:45 ---
(In reply to comment #8)
> Unfortunately, library-based solutions are difficult for me to implement. The
> reason is that the current PSP SDK uses newlib. I can probably change my
> personal toolchain with some work, but then it's a custom modification that
> needs to be replicated to every other ScummVM dev as well as our buildbot. Not
> impossible, but not work I'd like to get into right now.
>
> In any case, it sounds like what you're saying is that memcpy has asm
> instructions in the right place to use lwl and lwr. I can also do that in my
> implementation.
>
> My request was more general, as in gcc needs some kind of custom keyword to
> tell it to allow unaligned pointers and to generate appropriate unaligned code,
> so we don't have to trick the compiler into doing it in a way that ruins
> optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr;

typedef uint32 my_unaligned_aliasing_uint32
__attribute__((aligned(1),may_alias));

inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
{
  return *(const my_unaligned_aliasing_uint32 *)ptr;
}

should do it and does not require -fno-strict-aliasing.
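A standalone sketch of the same typedef technique, extended to stores. `uint32_t` stands in for ScummVM's `uint32`, and `unaligned_u32`, `load_u32` and `store_u32` are hypothetical names. As the next comments report, gcc at the time ignored `aligned(1)` on STRICT_ALIGNMENT targets such as MIPS, so check the generated code on your compiler.

```c
#include <stdint.h>

/* Assumption: uint32_t stands in for ScummVM's uint32.  On a typedef,
   aligned(1) lowers the alignment; may_alias lets the type alias any
   other object without -fno-strict-aliasing. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1), may_alias));

/* Hypothetical helpers: load/store through the reduced-alignment type. */
static inline uint32_t load_u32(const void *ptr)
{
    return *(const unaligned_u32 *)ptr;
}

static inline void store_u32(void *ptr, uint32_t value)
{
    *(unaligned_u32 *)ptr = value;
}
```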





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread pinskia at gcc dot gnu dot org


--- Comment #11 from pinskia at gcc dot gnu dot org  2010-09-01 18:25 ---
(In reply to comment #10)
> typedef uint32 my_unaligned_aliasing_uint32
> __attribute__((aligned(1),may_alias));
>
> inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> {
>   return *(const my_unaligned_aliasing_uint32 *)ptr;
> }

It does not:
READ_UINT32:
j   $31
lw  $2,0($4)

The aligned attribute is ignored there I think.  memcpy produces:
lbu $2,3($4)
lbu $6,0($4)
lbu $5,1($4)
lbu $3,2($4)
addiu   $sp,$sp,-16
sb  $6,0($sp)
sb  $5,1($sp)
sb  $3,2($sp)
sb  $2,3($sp)
lw  $2,0($sp)
j   $31
addiu   $sp,$sp,16

Which is bad and could be improved by using lwl/lwr.  I will file a bug about
that.





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-09-01 Thread yotambarnoy at gmail dot com


--- Comment #12 from yotambarnoy at gmail dot com  2010-09-01 18:35 ---
Right. Unfortunately

  typedef uint32 my_unaligned_aliasing_uint32
  __attribute__((aligned(1),may_alias));

  inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
  {
    return *(const my_unaligned_aliasing_uint32 *)ptr;
  }

doesn't work and doesn't align. I kept the struct method and added the
__may_alias__ attribute to fix the problem on my end. I'm glad to see gcc has
these attributes after all.
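The fix described here, the packed struct from comment #3 plus `__may_alias__`, might look roughly like this. It is a sketch rather than the actual ScummVM code: `uint32_t` replaces `uint32`, the attributes are placed on the struct type, and `read_u32_packed` is a hypothetical name.

```c
#include <stdint.h>

/* Assumption: uint32_t stands in for ScummVM's uint32.  __packed__ drops
   the struct's alignment to 1; __may_alias__ tells gcc that accesses
   through this type may alias objects of any other type. */
struct __attribute__((__packed__, __may_alias__)) Unaligned32 {
    uint32_t val;
};

static inline __attribute__((__always_inline__))
uint32_t read_u32_packed(const void *ptr)
{
    return ((const struct Unaligned32 *)ptr)->val;
}
```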

Regarding memcpy, I can't get gcc to optimize it for me at all, probably
because the PSP toolchain adds -fno-builtin to newlib. If I use
-Wl,--wrap,memcpy can I then create a __builtin_memcpy and have gcc optimize
using it?
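One option that may avoid the need for `--wrap` (untested on the PSP toolchain): gcc documents `-fno-builtin` as disabling only built-in functions that lack the `__builtin_` prefix, so calling `__builtin_memcpy` explicitly still lets gcc expand a small fixed-size copy inline. If it decides not to, it emits a normal `memcpy` call. `read_u32_builtin` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: __builtin_memcpy is still recognized under
   -fno-builtin, so gcc may expand this 4-byte copy inline; otherwise it
   falls back to an ordinary call to memcpy. */
static inline uint32_t read_u32_builtin(const void *ptr)
{
    uint32_t value;
    __builtin_memcpy(&value, ptr, sizeof value);
    return value;
}
```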

Thanks for all your feedback guys. You've been a huge help.





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #1 from yotambarnoy at gmail dot com  2010-08-31 11:52 ---
Created an attachment (id=21602)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21602&action=view)
Logic.ii, where gcc makes the mistake

LogicUp() is the critical function





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #2 from yotambarnoy at gmail dot com  2010-08-31 11:53 ---
Created an attachment (id=21603)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21603&action=view)
header.h, used by logic.cpp





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread rguenth at gcc dot gnu dot org


--- Comment #3 from rguenth at gcc dot gnu dot org  2010-08-31 14:17 ---
> inline __attribute__((__always_inline__)) uint32 READ_UINT32(const void *ptr)
> {
>   struct Unaligned32 { uint32 val; } __attribute__ ((__packed__));
>   return ((const Unaligned32 *)ptr)->val;
> }

and similar look like they might violate C aliasing rules.  Try using
-fno-strict-aliasing.





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #4 from yotambarnoy at gmail dot com  2010-08-31 15:24 ---
Good job picking up on that. 

There must be a better way of telling the compiler to generate lwr and lwl MIPS
instructions without breaking strict aliasing rules...?

Thanks a bunch!


-- 

yotambarnoy at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED





Re: [Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread Andrew Pinski



On Aug 31, 2010, at 8:24 AM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #4 from yotambarnoy at gmail dot com  2010-08-31 15:24 ---
> Good job picking up on that.
>
> There must be a better way of telling the compiler to generate lwr and lwl MIPS
> instructions without breaking strict aliasing rules...?


Have you tried using memcpy?



> Thanks a bunch!
>
> --
>
> yotambarnoy at gmail dot com changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |RESOLVED
>          Resolution|                            |FIXED





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread pinskia at gmail dot com


--- Comment #5 from pinskia at gmail dot com  2010-08-31 19:09 ---
Subject: Re:  Bad optimization in -O3 sometimes



On Aug 31, 2010, at 8:24 AM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #4 from yotambarnoy at gmail dot com  2010-08-31 15:24 ---
> Good job picking up on that.
>
> There must be a better way of telling the compiler to generate lwr and lwl MIPS
> instructions without breaking strict aliasing rules...?

Have you tried using memcpy?


> Thanks a bunch!
>
> --
>
> yotambarnoy at gmail dot com changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |RESOLVED
>          Resolution|                            |FIXED





[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #6 from yotambarnoy at gmail dot com  2010-09-01 04:32 ---
I recently implemented a custom memcpy for ScummVM. I didn't notice the
standard memcpy using lwl and lwr. In any case, how would memcpy do it any
better? Unless you're referring to the new memcpy inlining in newer versions of
gcc?





Re: [Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread Andrew Pinski



On Aug 31, 2010, at 9:32 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #6 from yotambarnoy at gmail dot com  2010-09-01 04:32 ---
> I recently implemented a custom memcpy for ScummVM. I didn't notice the
> standard memcpy using lwl and lwr. In any case, how would memcpy do it any
> better? Unless you're referring to the new memcpy inlining in newer versions of
> gcc?


I am referring to the standard builtin version of memcpy. It is not just in
newer versions; it has been there since 3.0. What is new is the more optimized
version for x86 with either a large constant or a non-constant size. Can you
try memcpy? If that does not work, please file a bug and cc me; I will see
what I can do. I have been working with MIPS lately.
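A minimal sketch of the memcpy approach suggested here: copying four bytes into a local is well defined for any alignment and any source type, and with builtins enabled gcc can expand the fixed-size copy inline (comment #11 shows the byte-by-byte code it produced for MIPS at the time). `read_u32_memcpy` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned, aliasing-safe 32-bit load in host byte
   order.  With builtins enabled, gcc can expand this fixed-size memcpy
   inline instead of calling the library function. */
static inline uint32_t read_u32_memcpy(const void *ptr)
{
    uint32_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}
```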








[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread pinskia at gmail dot com


--- Comment #7 from pinskia at gmail dot com  2010-09-01 04:41 ---
Subject: Re:  Bad optimization in -O3 sometimes



On Aug 31, 2010, at 9:32 PM, yotambarnoy at gmail dot com gcc-bugzi...@gcc.gnu.org wrote:

> --- Comment #6 from yotambarnoy at gmail dot com  2010-09-01 04:32 ---
> I recently implemented a custom memcpy for ScummVM. I didn't notice the
> standard memcpy using lwl and lwr. In any case, how would memcpy do it any
> better? Unless you're referring to the new memcpy inlining in newer versions of
> gcc?

I am referring to the standard builtin version of memcpy. It is not just in
newer versions; it has been there since 3.0. What is new is the more optimized
version for x86 with either a large constant or a non-constant size. Can you
try memcpy? If that does not work, please file a bug and cc me; I will see
what I can do. I have been working with MIPS lately.






[Bug c++/45462] Bad optimization in -O3 sometimes

2010-08-31 Thread yotambarnoy at gmail dot com


--- Comment #8 from yotambarnoy at gmail dot com  2010-09-01 05:03 ---
Unfortunately, library-based solutions are difficult for me to implement. The
reason is that the current PSP SDK uses newlib. I can probably change my
personal toolchain with some work, but then it's a custom modification that
needs to be replicated to every other ScummVM dev as well as our buildbot. Not
impossible, but not work I'd like to get into right now.

In any case, it sounds like what you're saying is that memcpy has asm
instructions in the right place to use lwl and lwr. I can also do that in my
implementation.

My request was more general, as in gcc needs some kind of custom keyword to
tell it to allow unaligned pointers and to generate appropriate unaligned code,
so we don't have to trick the compiler into doing it in a way that ruins
optimization. Something like __unaligned__ uint32 *ptr32 = bytePtr; 

