Re: Performance problem with gcc 4.9.2-3 on 64 bit

2015-02-28 Thread Bengt Larsson
Marco Atzeri wrote:
>On 2/27/2015 5:49 PM, Bengt Larsson wrote:
>> Below are two benchmarks that explore maximum floating point
>> performance. loopm6 is double precision floating point and loopm6fp is
>> parallel single-precision. They are manually unrolled multiply-add
>> loops.
>>
>> I used to reach 2.8 and 11 GFlops on these. Now I only get
>> 2 and 6.
>>
>> If you explore the inner loop with gcc -O2 -S you can see that it seems
>> to use few registers.
>>
>> If you run them, there is a parameter expected. I use 3 - 5.
>>
>> gcc 4.9.2-3 on 64-bit. I use gcc -O2.
>>
>
>Maybe
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967

I think that bug is from longer ago. With the gcc version before 4.9.* on
Cygwin I got full speed.

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: Performance problem with gcc 4.9.2-3 on 64 bit

2015-02-27 Thread Marco Atzeri

On 2/27/2015 5:49 PM, Bengt Larsson wrote:

> Below are two benchmarks that explore maximum floating point
> performance. loopm6 is double precision floating point and loopm6fp is
> parallel single-precision. They are manually unrolled multiply-add
> loops.
>
> I used to reach 2.8 and 11 GFlops on these. Now I only get
> 2 and 6.
>
> If you explore the inner loop with gcc -O2 -S you can see that it seems
> to use few registers.
>
> If you run them, there is a parameter expected. I use 3 - 5.
>
> gcc 4.9.2-3 on 64-bit. I use gcc -O2.



Maybe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967





Performance problem with gcc 4.9.2-3 on 64 bit

2015-02-27 Thread Bengt Larsson
Below are two benchmarks that explore maximum floating point
performance. loopm6 is double precision floating point and loopm6fp is
parallel single-precision. They are manually unrolled multiply-add
loops.

I used to reach 2.8 and 11 GFlops on these. Now I only get
2 and 6.

If you explore the inner loop with gcc -O2 -S you can see that it seems
to use few registers.

If you run them, there is a parameter expected. I use 3 - 5.

gcc 4.9.2-3 on 64-bit. I use gcc -O2.


Attachments (stored as binary data): loopm6.c, loopm6fp.c, timers.h