>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.

Something must be broken in the unaligned load splitting in generic
mode.

While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for
-mtune=bdver1, splitting unaligned loads in generic mode is KILLING us:

For 459.GemsFDTD (ref) on Bulldozer,
 -Ofast -mavx -mno-avx256-split-unaligned-load:   480s
 -Ofast -mavx                                 :  2527s

So, splitting unaligned loads makes the program run 5 to 6 times slower!

For 434.zeusmp train run
 -Ofast -mavx -mno-avx256-split-unaligned-load:   32.5s
 -Ofast -mavx                                 :    106s
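
For reference, the slowdown factors implied by the timings above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the `timings` dictionary and the `slowdown` helper are names made up for illustration, and the numbers are copied from the two runs listed above.

```python
# Wall-clock times in seconds, copied from the runs reported above.
# "no_split" = -Ofast -mavx -mno-avx256-split-unaligned-load
# "split"    = -Ofast -mavx (unaligned loads split, per the timings above)
timings = {
    "459.GemsFDTD (ref)": {"no_split": 480.0, "split": 2527.0},
    "434.zeusmp (train)": {"no_split": 32.5, "split": 106.0},
}


def slowdown(no_split, split):
    """Hypothetical helper: how many times slower the split build runs."""
    return split / no_split


ratios = {
    name: slowdown(t["no_split"], t["split"]) for name, t in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x slower with split unaligned loads")
```

That gives roughly 5.26x for 459.GemsFDTD and 3.26x for 434.zeusmp.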

Other tests are ongoing!


Changpeng.

