[Bug middle-end/34992] compiler produces wrong code when optimizing
--- Comment #8 from roebel at ircam dot fr 2008-01-28 10:00 ---
For completeness: I now use this function, which was proposed in PR 323. It seems to solve my issue. Thanks for the pointer!

#include <fpu_control.h>  /* glibc, x86: fpu_control_t, _FPU_GETCW, _FPU_SETCW */

inline void set_math_double_precision() {
  fpu_control_t fpu_control;
  _FPU_GETCW(fpu_control);
  /* clear the extended-precision bits, then select double precision */
  fpu_control = (fpu_control & ~_FPU_EXTENDED) | _FPU_DOUBLE;
  _FPU_SETCW(fpu_control);
}

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
[Bug tree-optimization/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #13 from roebel at ircam dot fr 2008-01-27 12:35 ---
Hi,

I ran the tests with g++ 4.2.2 and it seems to me the issue is closed. Compilation without the salias-max-implicit-fields flag no longer produces any substantial increase in run time, and with and without this parameter the hand-optimized and the compiler template versions of the code have very similar run times.

I would be really happy with this if gcc 4.2.2 produced correct code in all my projects. I tried it a while ago and found a problem with std::set where the optimized version of the program simply ended up with duplicate entries in the set (while gcc 4.1.2 has no problems with the very same code)!!! Besides that show stopper, we had other problems with code using sse/sse2 intrinsics producing wrong results when optimization was enabled.

All this may have changed in gcc 4.3. I will give it another try.

Thanks

--

roebel at ircam dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
[Bug c++/34992] New: compiler produces wrong code when optimizing
Hello,

for some time now I have had problems with one of my projects with compilers gcc 4.2.1 and above. For obscure reasons my testsuite did not run correctly when I compiled with optimization enabled. I have now tracked the problem down to a very strange bug that I can reproduce with a test case that I will attach.

The problem I encounter is that inserting duplicated values into a std::set ends up with a set holding these values more than once. This happens only with optimization enabled.

I was able to prevent the problem in the test case just by moving the function that triggers the error further away from the position where the function is called. In the test case the offset is created by another function that is not used and that can be removed from the object file by means of compiler macros. If the unused function is part of the object file, the std::set manipulation runs fine; if the function is disabled and removed from the object file, the manipulation of the std::set fails.

The function is inserted into the object file by defining the preprocessor macro MAKEOFFSET.

Kind regards,

--
           Summary: compiler produces wrong code when optimizing
           Product: gcc
           Version: 4.2.3
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: roebel at ircam dot fr
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
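The symptom described in the report (a std::set that holds the same value more than once) can be checked for directly. The following is only a minimal sketch, not the attached test case; the helper name strictly_increasing is hypothetical. It verifies that every element of the set is strictly smaller than its successor, which must hold for any correctly built std::set<double>.

```cpp
#include <algorithm>
#include <functional>
#include <set>

// Hypothetical helper, not part of the attached test case.
// Returns true if the elements are strictly increasing, i.e. no element
// compares greater than or equal to its successor. A set that contains
// a duplicate entry fails this check.
inline bool strictly_increasing(const std::set<double>& s) {
    // adjacent_find returns the first position where a >= b holds for
    // neighbouring elements a, b, or end() if there is no such pair.
    return std::adjacent_find(s.begin(), s.end(),
                              std::greater_equal<double>()) == s.end();
}
```

Calling such a check after the insertions in a test suite would flag the failure at the point where the set becomes inconsistent, rather than later when the duplicates cause wrong results.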
[Bug c++/34992] compiler produces wrong code when optimizing
--- Comment #1 from roebel at ircam dot fr 2008-01-27 22:55 ---
Created an attachment (id=15034)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15034&action=view)
source file for reproducing the problem.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
[Bug c++/34992] compiler produces wrong code when optimizing
--- Comment #2 from roebel at ircam dot fr 2008-01-27 22:57 ---
Created an attachment (id=15035)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15035&action=view)
script to compile the source and reproduce the problem

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
[Bug middle-end/34992] compiler produces wrong code when optimizing
--- Comment #4 from roebel at ircam dot fr 2008-01-27 23:05 ---
Yes indeed, that fixes the problem. Now, does that mean holding double values in a set is not possible?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
[Bug middle-end/34992] compiler produces wrong code when optimizing
--- Comment #6 from roebel at ircam dot fr 2008-01-28 00:14 ---
Andrew,

while -ffloat-store fixes the problem, this solution is obviously not acceptable. Moreover, the problem here is not that I compare floats using ==; the problem is that std::set<double>::insert(double) compares set elements using std::less<double>, which in this case just does not work. Now, using std::less<double> for a set does not seem careless to me, because this is the default. So the default is nonsense?

I fixed the problem by using std::set<double,dless> with dless being

struct dless {
  typedef double first_argument_type;
  typedef double second_argument_type;
  typedef bool result_type;
  volatile double d1;
  volatile double d2;
  bool operator()(double in1, double in2) {
    d1 = in1;
    d2 = in2;
    return d1 < d2;
  }
};

which proves that the problem is in the std::less operator and not in my code. In fact this even means that std::less<double> and std::greater<double> and all their company are useless, because you can never be sure what they are comparing. So many of the functions of the std library may fail!!

Am I wrong here?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34992
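For illustration, the workaround from the comment above can be sketched as a self-contained snippet. This is a hedged variant, not the reporter's exact code: the names dless_const and count_unique are hypothetical, and the volatile copies are locals instead of members so that operator() can be const, which newer standard library implementations expect from a set comparator. The idea stays the same: copying each operand into a volatile double forces it through memory, so both sides of the comparison are rounded to 64-bit double before they are compared.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical variant of the dless comparator from the comment above.
// The volatile locals force both operands to be stored to memory as
// 64-bit doubles before the comparison takes place.
struct dless_const {
    bool operator()(double in1, double in2) const {
        volatile double d1 = in1;
        volatile double d2 = in2;
        return d1 < d2;
    }
};

// Hypothetical helper: inserts every value twice and returns the number
// of elements the set ends up holding.
inline std::size_t count_unique(const double* values, std::size_t n) {
    assert(values != nullptr || n == 0);
    std::set<double, dless_const> s;
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(values[i]);
        s.insert(values[i]);  // the duplicate must be rejected
    }
    return s.size();
}
```

With a consistent comparator, count_unique returns the number of distinct values in the input, regardless of how often each value is inserted.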
[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #5 from roebel at ircam dot fr 2006-03-22 11:13 ---
Created an attachment (id=11090)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11090&action=view)
Results file for testcase

As you requested, I provide a testcase. It consists of 2 shell scripts that run the different compilers and then run the testcase.

The testcase has two cases and two compilation modes:

switch 1
  compiled with -DHAND it gives the hand-optimized, pointer-only version
  compiled with -DMATMTL it gives the equivalent expression templates version
switch 2
  compiled with -DBENCH=1 it calculates an addition of three vectors
  compiled with -DBENCH=2 it calculates an addition of three vectors with some scalar multiplications

The name of the executable indicates the experiment by its two final characters: H1 stands for hand-optimized, first benchmark; M2 stands for matmtl, second benchmark; ...

The two scripts comp.sh and master.sh run the whole experiment:

comp.sh runs the experiment for a single compiler and a user-supplied set of vector sizes. Note that each experiment always uses 1 vector element operations. By means of the vector size the amount of overhead can be controlled.

master.sh runs comp.sh with a single compiler and the vector size arguments 5 and 1000

Results are produced with

./master.sh 2>&1 | tee mout
egrep "#|user" mout

First result is that gcc 4.1.0 with --param salias-max-implicit-fields=50 is a real success. As you see, the compile time does not change, at least for this testcase, but the performance is identical to the pointer-only case.

Second result is that for gcc 4.1.0 with the default parameter set we get performance worse than gcc 4.0.2, especially for small vectors (large overhead). The larger the vectors become, the more gcc 4.1.0 approaches 4.0.2.

###
# g++ 4.0.2 the reference
###
#compile times
user    0m0.702s
user    0m0.697s
user    0m1.066s
user    0m1.077s
#run times : vector size 5
# benchmarkredH1
user    0m0.295s
# benchmarkredM1
user    0m0.307s
# benchmarkredH2
user    0m0.381s
# benchmarkredM2
user    0m0.412s
#run times : vector size 1000
# benchmarkredH1
user    0m0.230s
# benchmarkredM1
user    0m0.243s
# benchmarkredH2
user    0m0.287s
# benchmarkredM2
user    0m0.370s
# g++ 4.1.0 default
###
#compile times
user    0m0.747s
user    0m0.752s
user    0m1.211s
user    0m1.227s
#run times : vector size 5
# benchmarkredH1
user    0m0.264s
# benchmarkredM1
user    0m0.519s
# benchmarkredH2
user    0m0.347s
# benchmarkredM2
user    0m1.211s
#run times : vector size 1000
# benchmarkredH1
user    0m0.222s
# benchmarkredM1
user    0m0.286s
# benchmarkredH2
user    0m0.298s
# benchmarkredM2
user    0m0.375s
# g++ 4.1.0 salias=50
###
#compile times
user    0m0.753s
user    0m0.741s
user    0m1.225s
user    0m1.239s
#run times : vector size 5
# benchmarkredH1
user    0m0.262s
# benchmarkredM1
user    0m0.307s
# benchmarkredH2
user    0m0.344s
# benchmarkredM2
user    0m0.313s
#run times : vector size 1000
# benchmarkredH1
user    0m0.223s
# benchmarkredM1
user    0m0.234s
# benchmarkredH2
user    0m0.299s
# benchmarkredM2
user    0m0.260s

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
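To make the two compilation modes of the first benchmark concrete, here is a minimal sketch of what a hand-optimized, pointer-only addition of three vectors and an equivalent expression-template version might look like. It is not the attached testcase and not the matmtl library; all names (add3_hand, VecRef, Sum, assign, add3_templates) are hypothetical. The expression a + b + c is represented by nested structs that hold the operand pointers, so the optimizer has to track those pointer fields through the structs in order to produce the same loop as the hand-written version.

```cpp
#include <cstddef>
#include <vector>

// Hand-optimized, pointer-only version (in the spirit of -DHAND).
inline void add3_hand(double* out, const double* a, const double* b,
                      const double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i] + c[i];
}

// Leaf of the expression tree: wraps a raw pointer to vector data.
struct VecRef {
    const double* p;
    double operator[](std::size_t i) const { return p[i]; }
};

// Interior node: element-wise sum of two sub-expressions, held by value.
template <class L, class R>
struct Sum {
    L lhs;
    R rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

inline Sum<VecRef, VecRef> operator+(VecRef l, VecRef r) { return {l, r}; }

template <class L, class R>
Sum<Sum<L, R>, VecRef> operator+(Sum<L, R> l, VecRef r) { return {l, r}; }

// Evaluates the whole expression in a single loop without temporaries.
template <class E>
void assign(double* out, const E& expr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = expr[i];
}

// Expression-template version (in the spirit of -DMATMTL).
// Assumes a, b and c all have the same size.
inline std::vector<double> add3_templates(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          const std::vector<double>& c) {
    std::vector<double> out(a.size());
    assign(out.data(),
           VecRef{a.data()} + VecRef{b.data()} + VecRef{c.data()},
           a.size());
    return out;
}
```

Both versions compute (a[i] + b[i]) + c[i] in the same order, so they give identical results; the benchmark question is only whether the compiler reduces the templated form to the same machine code as the hand-written loop.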
[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #6 from roebel at ircam dot fr 2006-03-22 11:14 ---
Created an attachment (id=11091)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11091&action=view)
master shell script

For comments see 11090: Results file for testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #7 from roebel at ircam dot fr 2006-03-22 11:15 ---
Created an attachment (id=11092)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11092&action=view)
single experiment shell script

For comments see 11090: Results file for testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #8 from roebel at ircam dot fr 2006-03-22 11:16 ---
Created an attachment (id=11093)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11093&action=view)
testcase source file

For comments see 11090: Results file for testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
[Bug c++/26788] optimization of expression templates not as performant as g++ 4.0.2
--- Comment #10 from roebel at ircam dot fr 2006-03-22 11:55 ---
Not that I understand what you just said, but I wanted to mention that, in contrast to my initial email, the data I just sent indicates a small performance penalty of about 25% for g++ 4.0.2 for large vectors on a pentium 4 (those are the results I sent), while there is no such penalty for large vectors on a pentium m.

On a pentium m, g++ 4.0.2 works as well as g++ 4.1.0 on a pentium 4 with the --param salias...=50. Unfortunately, I don't have gcc 4.1.0 on my pentium m machine.

thanks,
Axel

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788
[Bug c++/26788] New: optimization of expression templates not as performant as g++ 4.0.2
Hi,

I just installed gcc 4.1.0 to compile my template expression matrix arithmetic library (a la Blitz). I recently did benchmarks with g++ 3.4.4 and 4.0.2 and I was pretty much impressed that g++ 4.0.2 managed to optimize the expressions such that I obtained performance nearly twice as fast as with g++ 3.4.4, and, even better, the performance was the same as my hand-coded pointer-only implementation. I was rather happy with this result. It seems that the handling of pointer arrays that are stored in a struct that represents the expression has been significantly improved.

Now, the downside. I tried 4.1.0 and I noticed that the performance dropped to a level even worse than gcc 3.4.4. I wondered about the reason and scanned the optimization parameters. I found salias-max-implicit-fields with a default value of 5. I guessed that might be the reason and increased the value to 50. With this value I got back the impressive performance of g++ 4.0.2.

I wonder why the default value has been set so low that it apparently cripples the optimizer to a level of optimization considerably below what has been achieved with g++ 4.0.2 (where this option does not exist). Does this option negatively affect performance elsewhere? If not, it seems to me that a default value that resembles the settings in gcc 4.0.2 would be more sensible.

Kind regards, and thanks anyway for this great compiler suite.

Axel

--
           Summary: optimization of expression templates not as performant as g++ 4.0.2
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: roebel at ircam dot fr
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26788