[Bug middle-end/34992] compiler produces wrong code when optimizing

2008-01-28 Thread roebel at ircam dot fr


--- Comment #8 from roebel at ircam dot fr  2008-01-28 10:00 ---
For completeness :
I now use this function that was proposed in 
PR 323. It seems to solve my issue. Thanks for the pointer!

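  #include <fpu_control.h>  /* glibc, x86 only */

  inline
  void set_math_double_precision() {
    fpu_control_t fpu_control;
    _FPU_GETCW(fpu_control);
    /* clear the precision-control bits, then select double precision */
    fpu_control = (fpu_control & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(fpu_control);
  }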
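
For readers who want to build the same idea on machines without the x87
control-word macros, here is a minimal sketch. It assumes glibc on x86 for
the real work and simply reports failure elsewhere; HAVE_X87_CONTROL and
x87_uses_double_precision() are names made up for this illustration, not
part of PR 323 or of glibc.

```cpp
#include <cassert>  // also pulls in glibc's feature macros before the check below

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CONTROL 1   // hypothetical macro, local to this sketch
#else
#define HAVE_X87_CONTROL 0
#endif

// Switch the x87 FPU to double precision.
// Returns false on platforms where the control-word macros are unavailable.
inline bool set_math_double_precision() {
#if HAVE_X87_CONTROL
  fpu_control_t cw;
  _FPU_GETCW(cw);
  cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
  _FPU_SETCW(cw);
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: reads the control word back and checks that the
// precision-control field selects double precision.
inline bool x87_uses_double_precision() {
#if HAVE_X87_CONTROL
  fpu_control_t cw;
  _FPU_GETCW(cw);
  return (cw & _FPU_EXTENDED) == _FPU_DOUBLE;
#else
  return false;
#endif
}
```

The setting only affects x87 arithmetic, so code compiled to use SSE
registers for doubles should not need it.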


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug tree-optimization/26788] optimization of expression templates not as performant as g++ 4.0.2

2008-01-27 Thread roebel at ircam dot fr


--- Comment #13 from roebel at ircam dot fr  2008-01-27 12:35 ---
Hi,

I ran the tests with g++ 4.2.2 and it seems to me the issue is closed.
Compiling without the salias-max-implicit-fields flag no longer produces
any substantial increase in run time, and with or without
this parameter the hand-optimized and the compiler-optimized template
versions of the code have very similar run times.

I would be really happy with this if gcc 4.2.2 produced
correct code in all my projects. I tried it a while ago
and found a problem with std::set where the optimized version of the program
simply ended up with duplicate entries in the set
(while gcc 4.1.2 has no problems with the very same code)!!!

Besides that show stopper we had other problems with code using
sse/sse2 intrinsics  producing wrong results when optimization was enabled.

All this may have changed in gcc4.3. I will give it another trial.

Thanks


-- 

roebel at ircam dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/34992] New: compiler produces wrong code when optimizing

2008-01-27 Thread roebel at ircam dot fr
Hello,

for some time now I have had problems with one of my projects with gcc
4.2.1 and above. For obscure reasons my test suite did not run correctly when I
compiled with optimization enabled. I have now tracked the problem down
to a very strange bug that I can reproduce with a test case that I will
attach. The problem is that inserting duplicate values
into a std::set ends up with a set holding these values more than once.
This happens only with optimization enabled.

I was able to prevent the problem in the test case just by
moving the function that triggers the error further away from
the position where it is called. In the test case the offset is
created by another function that is never used and that can be removed
from the object file with a preprocessor macro.

If the unused function is part of the object file, the std::set
manipulation runs fine; if the function is disabled and removed from the object
file, the manipulation of the std::set fails. The function is included in the
object file by defining the preprocessor macro MAKEOFFSET.
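
The invariant that is violated can be written down as a small check. This
is only a sketch and not the attached test case: insert_twice() and
has_adjacent_duplicates() are hypothetical helper names, and the values
used here come from memory, so this version is not expected to trigger
the failure by itself.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Insert every value twice and return the resulting set.
// A correct std::set<double> keeps each distinct value exactly once.
inline std::set<double> insert_twice(const std::vector<double>& values) {
  std::set<double> s;
  for (double v : values) {
    s.insert(v);
    s.insert(v);
  }
  return s;
}

// Return true if two neighbouring elements of the set compare equal,
// which is the symptom described above.
inline bool has_adjacent_duplicates(const std::set<double>& s) {
  if (s.empty()) return false;
  auto prev = s.begin();
  for (auto it = std::next(prev); it != s.end(); ++it, ++prev) {
    if (*prev == *it) return true;
  }
  return false;
}
```

A test suite could call has_adjacent_duplicates() after each batch of
insertions and fail as soon as it returns true.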

Kind regards,


-- 
           Summary: compiler produces wrong code when optimizing
           Product: gcc
           Version: 4.2.3
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: roebel at ircam dot fr
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug c++/34992] compiler produces wrong code when optimizing

2008-01-27 Thread roebel at ircam dot fr


--- Comment #1 from roebel at ircam dot fr  2008-01-27 22:55 ---
Created an attachment (id=15034)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15034&action=view)
source file for reproducing the problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug c++/34992] compiler produces wrong code when optimizing

2008-01-27 Thread roebel at ircam dot fr


--- Comment #2 from roebel at ircam dot fr  2008-01-27 22:57 ---
Created an attachment (id=15035)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15035&action=view)
script to compile the source and reproduce the problem 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug middle-end/34992] compiler produces wrong code when optimizing

2008-01-27 Thread roebel at ircam dot fr


--- Comment #4 from roebel at ircam dot fr  2008-01-27 23:05 ---

Yes indeed, that fixes the problem.
Now, does that mean holding double values in a set
is not possible?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug middle-end/34992] compiler produces wrong code when optimizing

2008-01-27 Thread roebel at ircam dot fr


--- Comment #6 from roebel at ircam dot fr  2008-01-28 00:14 ---
Andrew,

while -ffloat-store fixes the problem, this solution is obviously not
acceptable. Moreover, the problem here is not that I compare floats
using ==; the problem is that std::set<double>::insert(double) compares
set elements using std::less<double>, which in this case just does not work.

Now, using std::less<double> for a set does not seem careless
to me, because this is the default. So the default is nonsense?

I fixed the problem by using std::set<double,dless>
with dless being

struct dless {
  typedef double first_argument_type;
  typedef double second_argument_type;
  typedef bool result_type;
  volatile double d1;
  volatile double d2;
  bool operator()(double in1, double in2) {
    d1 = in1;  // a volatile store forces the value out to a 64-bit double
    d2 = in2;
    return d1 < d2;
  }
};

which proves that the problem is in the std::less operator
and not in my code. In fact this even means that
std::less<double> and std::greater<double> and all
the related functors are useless, because you can never be sure what they
are comparing. So many of the functions of the standard library
may fail!! Am I wrong here?
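
A variation on the same workaround, given only as a sketch: the volatile
temporaries are locals instead of members, so operator() can be const and
the comparator keeps no state. The names stored_less, stable_double_set
and size_after_double_insert() are made up for this example.

```cpp
#include <cstddef>
#include <set>

// Comparator that stores both operands to volatile doubles before comparing,
// so both sides are rounded to 64-bit doubles even if the compiler would
// otherwise keep one of them in a wider register.
struct stored_less {
  bool operator()(double a, double b) const {
    volatile double x = a;
    volatile double y = b;
    return x < y;
  }
};

// Drop-in replacement for std::set<double> using the comparator above.
using stable_double_set = std::set<double, stored_less>;

// Insert the same value twice and report the resulting size.
inline std::size_t size_after_double_insert(double value) {
  stable_double_set s;
  s.insert(value);
  s.insert(value);
  return s.size();
}
```

The extra stores cost a little on every comparison, which is the price of
not depending on how the optimizer allocates floating-point registers.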


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992



[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2

2006-03-22 Thread roebel at ircam dot fr


--- Comment #5 from roebel at ircam dot fr  2006-03-22 11:13 ---
Created an attachment (id=11090)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11090&action=view)
Results file for testcase

As you requested I provide a testcase. It consists of 2 shell scripts
that run the different compilers and then run the testcase.
The testcase has two cases and two compilation modes:

switch 1
compiled with -DHAND it gives hand optimized pointer only version
compiled with -DMATMTL it gives the equivalent expression templates version
switch 2
compiled with -DBENCH=1 it calculates an addition of three vectors
compiled with -DBENCH=2 it calculates an addition of three vectors with some
scalar multiplications

The name of the executable indicates the experiment by its two final characters:
H1 stands for the hand-optimized first benchmark, M2 stands for the matmtl second
benchmark ...

The two scripts comp.sh and master.sh run the whole experiment:
comp.sh runs the experiment for a single compiler and a user supplied set of 
vector sizes. Note, that each experiment always uses 1
vector element operations. By means of the vector size the amount of overhead
can be controlled.
master.sh runs comp.sh with a single compiler and the vector size arguments 5
and 1000

Results are produced with

./master.sh 2>&1 | tee mout
egrep "#|user" mout

The first result is that gcc 4.1.0 with --param salias-max-implicit-fields=50
is a real success. As you can see, the compile time does not change, at least for
this testcase, but the performance is identical to the pointer-only
case.

The second result is that gcc 4.1.0 with the default parameter set
performs worse than gcc 4.0.2, especially for small vectors
(large overhead). The larger the vectors become, the more
gcc 4.1.0 approaches 4.0.2.

###
# g++ 4.0.2 the reference
###
#compile times
user    0m0.702s
user    0m0.697s
user    0m1.066s
user    0m1.077s
#run times : vector size 5
# benchmarkredH1
user    0m0.295s
# benchmarkredM1
user    0m0.307s
# benchmarkredH2
user    0m0.381s
# benchmarkredM2
user    0m0.412s
#run times : vector size 1000
# benchmarkredH1
user    0m0.230s
# benchmarkredM1
user    0m0.243s
# benchmarkredH2
user    0m0.287s
# benchmarkredM2
user    0m0.370s
# g++ 4.1.0 default
###
#compile times
user    0m0.747s
user    0m0.752s
user    0m1.211s
user    0m1.227s
#run times : vector size 5
# benchmarkredH1
user    0m0.264s
# benchmarkredM1
user    0m0.519s
# benchmarkredH2
user    0m0.347s
# benchmarkredM2
user    0m1.211s
#run times : vector size 1000
# benchmarkredH1
user    0m0.222s
# benchmarkredM1
user    0m0.286s
# benchmarkredH2
user    0m0.298s
# benchmarkredM2
user    0m0.375s
# g++ 4.1.0 salias=50
###
#compile times
user    0m0.753s
user    0m0.741s
user    0m1.225s
user    0m1.239s
#run times : vector size 5
# benchmarkredH1
user    0m0.262s
# benchmarkredM1
user    0m0.307s
# benchmarkredH2
user    0m0.344s
# benchmarkredM2
user    0m0.313s
#run times : vector size 1000
# benchmarkredH1
user    0m0.223s
# benchmarkredM1
user    0m0.234s
# benchmarkredH2
user    0m0.299s
# benchmarkredM2
user    0m0.260s
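
To make the comparison easier to read, the vector-size-5 run times can be
turned into template-to-hand ratios. The following is just a sketch of
that arithmetic; the Timing struct, the kSize5 table and slowdown() are
names invented here, and the numbers are copied from the listing above.

```cpp
#include <cstddef>

// User run times in seconds for vector size 5, copied from the listing.
// hand1/tmpl1 are benchmarks H1/M1, hand2/tmpl2 are benchmarks H2/M2.
struct Timing {
  const char* compiler;
  double hand1, tmpl1, hand2, tmpl2;
};

constexpr Timing kSize5[] = {
  {"g++ 4.0.2",           0.295, 0.307, 0.381, 0.412},
  {"g++ 4.1.0 default",   0.264, 0.519, 0.347, 1.211},
  {"g++ 4.1.0 salias=50", 0.262, 0.307, 0.344, 0.313},
};

// Ratio of expression-template run time to hand-coded run time;
// values above 1 mean the template version is slower.
constexpr double slowdown(double tmpl_seconds, double hand_seconds) {
  return tmpl_seconds / hand_seconds;
}
```

By these numbers the default 4.1.0 build runs the template versions about
2 times (M1) and about 3.5 times (M2) slower than the hand-coded ones at
vector size 5, while with salias=50 the ratios are close to 1.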


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2

2006-03-22 Thread roebel at ircam dot fr


--- Comment #6 from roebel at ircam dot fr  2006-03-22 11:14 ---
Created an attachment (id=11091)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11091&action=view)
master shell script

for comments 
see 11090: Results file for testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2

2006-03-22 Thread roebel at ircam dot fr


--- Comment #7 from roebel at ircam dot fr  2006-03-22 11:15 ---
Created an attachment (id=11092)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11092&action=view)
single experiment shell script

for comments 
see 11090: Results file for testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2

2006-03-22 Thread roebel at ircam dot fr


--- Comment #8 from roebel at ircam dot fr  2006-03-22 11:16 ---
Created an attachment (id=11093)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11093&action=view)
testcase source file

for comments 
see 11090: Results file for testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2

2006-03-22 Thread roebel at ircam dot fr


--- Comment #10 from roebel at ircam dot fr  2006-03-22 11:55 ---

Not that I understand what you just said, but I wanted to mention that,
in contrast to my initial email, the data I just sent
indicates a small performance penalty of about 25% for g++ 4.0.2
for large vectors on a Pentium 4 (those are the results I sent),
while there is no such
penalty for large vectors on a Pentium M. On a Pentium M, g++ 4.0.2
works as well as g++ 4.1.0 does on a Pentium 4 with --param salias...=50.
Unfortunately, I don't have gcc 4.1.0 on my Pentium M machine.

thanks,

Axel


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788



[Bug c++/26788] New: optimization of expression templates not as performant as g++ 4.0.2

2006-03-21 Thread roebel at ircam dot fr
Hi,

I just installed gcc 4.1.0 to compile my expression-template
matrix arithmetic library (a la Blitz).
I recently did benchmarks with g++ 3.4.4
and 4.0.2, and I was quite impressed that g++ 4.0.2 managed to
optimize the expressions such that I obtained performance nearly
twice as fast as with g++ 3.4.4 and, even better,
the same performance as my hand-coded pointer-only implementation.
I was rather happy with this result. It seems that the handling of
pointer arrays that are stored in a struct that represents the expression
has been significantly improved.
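
For readers unfamiliar with the technique, here is a minimal sketch of
what is meant by an expression represented as a struct, next to the
hand-coded pointer-only loop it is compared against. This is not code
from my library; Vec, Sum, assign() and add3_hand() are names made up for
the illustration.

```cpp
#include <cstddef>
#include <vector>

// Simple vector type used as a leaf of the expression tree.
struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Expression node: stores references to its operands and adds them
// element by element only when an element is requested.
template <class L, class R>
struct Sum {
  const L& lhs;
  const R& rhs;
  double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) { return {a, b}; }

// Evaluate an expression into out with a single loop and no temporaries.
template <class E>
void assign(Vec& out, const E& expr) {
  for (std::size_t i = 0; i < out.size(); ++i) out.data[i] = expr[i];
}

// Hand-coded pointer-only reference for the three-vector addition.
inline void add3_hand(double* out, const double* a, const double* b,
                      const double* c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i] + c[i];
}
```

With this, assign(r, a + b + c) and add3_hand(...) compute the same
result, and the goal is for the optimizer to see through the nested Sum
structs so that both compile to the same loop.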

Now, the downside. I tried 4.1.0 and noticed that the performance dropped
to a level even worse than gcc 3.4.4. I wondered about the reason and
scanned the optimization parameters. I found salias-max-implicit-fields
with a default value of 5. I guessed that might be the reason
and increased the value to 50. With this value I got back the impressive
performance of g++ 4.0.2.

I wonder why the default value has been set so low that it apparently
cripples the optimizer to a level of optimization considerably
below what was achieved with g++ 4.0.2 (where this option does not exist).
Does this option negatively affect performance elsewhere? If not,
it seems to me that a default value that resembles the
settings in gcc 4.0.2 would be more sensible.

Kind regards,

and thanks anyway for this great compiler suite.

Axel


-- 
           Summary: optimization of expression templates not as performant
                    as g++ 4.0.2
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: roebel at ircam dot fr
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788