[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

Xi Ruoyao changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #23 from Xi Ruoyao ---
So fixed.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #22 from GCC Commits ---
The master branch has been updated by LuluCheng:

https://gcc.gnu.org/g:8f0ff6b998748f3581e0f06e3108193866b1209d

commit r14-9824-g8f0ff6b998748f3581e0f06e3108193866b1209d
Author: Lulu Cheng
Date:   Tue Apr 2 14:29:08 2024 +0800

    LoongArch: Set default alignment for functions jumps and loops [PR112919].

    Xi Ruoyao set the alignment rules under LA464 in commit r14-1839,
    but the macro ASM_OUTPUT_ALIGN_WITH_NOP was removed in R14-4674,
    which affected the alignment rules.

    So I set different aligns on LA464 and LA664 again to test the
    performance of spec2006, and modify the alignment based on the test
    results.

    gcc/ChangeLog:

            PR target/112919
            * config/loongarch/loongarch-def.cc (la664_align): Newly defined
            function that sets alignment rules under the LA664 microarchitecture.
            * config/loongarch/loongarch-opts.cc (loongarch_target_option_override):
            If not optimizing for size, set the default alignment to what the
            target wants.
            * config/loongarch/loongarch-tune.h (struct loongarch_align): Add
            new member variables jump and loop.
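The ChangeLog above describes per-microarchitecture alignment defaults that are applied only when not optimizing for size. The following is a minimal Python model of that idea, not the GCC implementation (which is C++ and not quoted in this thread). The names `Align`, `TUNE_ALIGN` and `default_alignment` are hypothetical, the numbers are the SPEC2006 winners reported in comments #14 and #17 rather than values read from the commit, and the rule that an explicit user setting takes precedence is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Align:
    """Hypothetical stand-in for the four alignment fields named in the ChangeLog."""
    function: int
    label: int
    jump: int
    loop: int


# Assumption: values taken from the SPEC2006 results reported in this thread,
# not from the committed source.
TUNE_ALIGN = {
    "la464": Align(function=32, label=4, jump=16, loop=16),
    "la664": Align(function=8, label=4, jump=32, loop=8),
}


def default_alignment(tune: str, optimize_size: bool,
                      user: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Return the effective alignments for one compilation.

    Explicit user settings are kept as-is (assumed behaviour); the remaining
    fields are filled from the tuning table unless optimizing for size.
    """
    effective = dict(user or {})
    if optimize_size:
        return effective  # leave alignment alone when size matters most
    target = TUNE_ALIGN[tune]
    for field in ("function", "label", "jump", "loop"):
        effective.setdefault(field, getattr(target, field))
    return effective


print(default_alignment("la664", optimize_size=False))
print(default_alignment("la464", optimize_size=False, user={"loop": 64}))
```

Keeping the table separate from the override logic mirrors the split between the tuning definitions and the option-override step that the ChangeLog entries suggest.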
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #21 from chenglulu ---
(In reply to Xi Ruoyao from comment #20)
> (In reply to chenglulu from comment #19)
> > (In reply to Xi Ruoyao from comment #18)
> > > (In reply to chenglulu from comment #17)
> > > > The results of spec2006 on LA464 are:
> > > > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> > >
> > > Would you send a patch for them or prefer I to do it?
> >
> > I'll send a patch tomorrow.
>
> Ping.
>
> I'd like to do another system rebuild after this patch lands for verifying
> GCC 14.

Oh sorry, I'm waiting for yujie's patch, just merged today.
I'll send this align patch tomorrow.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #20 from Xi Ruoyao ---
(In reply to chenglulu from comment #19)
> (In reply to Xi Ruoyao from comment #18)
> > (In reply to chenglulu from comment #17)
> > > The results of spec2006 on LA464 are:
> > > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
> >
> > Would you send a patch for them or prefer I to do it?
>
> I'll send a patch tomorrow.

Ping.

I'd like to do another system rebuild after this patch lands for verifying
GCC 14.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #19 from chenglulu ---
(In reply to Xi Ruoyao from comment #18)
> (In reply to chenglulu from comment #17)
> > The results of spec2006 on LA464 are:
> > -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
>
> Would you send a patch for them or prefer I to do it?

I'll send a patch tomorrow.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #18 from Xi Ruoyao ---
(In reply to chenglulu from comment #17)
> The results of spec2006 on LA464 are:
> -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16

Would you send a patch for them or prefer I to do it?
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #17 from chenglulu ---
(In reply to Xi Ruoyao from comment #15)
> > Hi,Ruoyao:
> >
> > The results of spec2006 on 3A6000 were obtained, I removed the more
> > volatile test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> > -falign-labels=4' this set of parameters got the highest score. This is the
> > same combination of parameters as the coremark tested by Xu Chenghua.
> >
> > The test of the 3A5000 will also be completed around the 15th of this month,
> > so I want to change the code after the test results of the 3a5000 are out.
> > What do you think?
>
> Ok to me.
>
> I'm getting some different results on LA664:
>
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
>
> vs the "best" one:
>
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
>
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

The results of spec2006 on LA464 are:
-falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=16
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #16 from chenglulu ---
(In reply to Xi Ruoyao from comment #15)
> > Hi,Ruoyao:
> >
> > The results of spec2006 on 3A6000 were obtained, I removed the more
> > volatile test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> > -falign-labels=4' this set of parameters got the highest score. This is the
> > same combination of parameters as the coremark tested by Xu Chenghua.
> >
> > The test of the 3A5000 will also be completed around the 15th of this month,
> > so I want to change the code after the test results of the 3a5000 are out.
> > What do you think?
>
> Ok to me.
>
> I'm getting some different results on LA664:
>
> 22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8
> -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt
>
> vs the "best" one:
>
> 22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32
> -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt
>
> maybe such a 0.1% difference is some random fluctuation, or hardware or
> kernel configuration difference anyway.

It's also possible that I'll find a few more machines to test the coremark
score.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #15 from Xi Ruoyao ---
> Hi,Ruoyao:
>
> The results of spec2006 on 3A6000 were obtained, I removed the more volatile
> test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
> -falign-labels=4' this set of parameters got the highest score. This is the
> same combination of parameters as the coremark tested by Xu Chenghua.
>
> The test of the 3A5000 will also be completed around the 15th of this month,
> so I want to change the code after the test results of the 3a5000 are out.
> What do you think?

Ok to me.

I'm getting some different results on LA664:

22031.284424 Compiler flags : -O2 -falign-labels=4 -falign-functions=8 -falign-loops=8 -falign-jumps=32 -DPERFORMANCE_RUN=1 -lrt

vs the "best" one:

22075.055188 Compiler flags : -O2 -falign-labels=4 -falign-functions=32 -falign-loops=16 -falign-jumps=8 -DPERFORMANCE_RUN=1 -lrt

maybe such a 0.1% difference is some random fluctuation, or hardware or
kernel configuration difference anyway.
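The gap between the two LA664 Coremark scores quoted in comment #15 can be checked with a few lines of arithmetic. By this calculation it works out to roughly 0.2%, which is still small enough to fit the "random fluctuation" reading. The helper names and the 0.5% tolerance below are assumptions for illustration; the thread does not state a noise threshold.

```python
def percent_gap(a: float, b: float) -> float:
    """Absolute difference between two scores, as a percentage of the smaller one."""
    return abs(a - b) / min(a, b) * 100.0


def within_noise(a: float, b: float, tolerance_pct: float = 0.5) -> bool:
    """True if the two scores differ by no more than tolerance_pct percent.

    The default tolerance is an assumed value, not one given in the thread.
    """
    return percent_gap(a, b) <= tolerance_pct


# Scores copied from comment #15 (Coremark iterations/sec on LA664).
spec_best_flags = 22031.284424      # labels=4 functions=8 loops=8 jumps=32
coremark_best_flags = 22075.055188  # labels=4 functions=32 loops=16 jumps=8

print(f"gap: {percent_gap(spec_best_flags, coremark_best_flags):.2f}%")
print("within assumed noise:", within_noise(spec_best_flags, coremark_best_flags))
```

A real noise estimate would need repeated runs of the same configuration, which the thread does not provide.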
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #14 from chenglulu ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> >
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
>
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all? Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
>
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )

Hi,Ruoyao:

The results of spec2006 on 3A6000 were obtained, I removed the more volatile
test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
-falign-labels=4' this set of parameters got the highest score. This is the
same combination of parameters as the coremark tested by Xu Chenghua.

The test of the 3A5000 will also be completed around the 15th of this month,
so I want to change the code after the test results of the 3a5000 are out.
What do you think?
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #13 from chenglulu ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> >
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
>
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all? Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
>
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )

The results of spec2006 on 3A6000 were obtained, I removed the more volatile
test items, '-falign-loops=8 -falign-functions=8 -falign-jumps=32
-falign-labels=4' this set of parameters got the highest score. This is the
same combination of parameters as the coremark tested by Xu Chenghua.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #12 from chenglulu ---
(In reply to Xi Ruoyao from comment #11)
> (In reply to chenglulu from comment #10)
> > (In reply to Xi Ruoyao from comment #9)
> > > (In reply to chenglulu from comment #8)
> > > > (In reply to Xi Ruoyao from comment #7)
> > > > > Any update? :)
> > > >
> > > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > > spec score, I am currently testing it on a single-channel machine, so
> > > > the test time will be longer.
> > > > I will reply here as soon as the results are available.
> > >
> > > Can we determine on LA664 if the current default alignment is better than
> > > not aligning at all? Coremarks results suggest the current default is
> > > even worse than not aligning, but arguably Coremarks is far different
> > > from real workloads. However if the current default is not better than
> > > not aligning (or the difference is only marginal and is likely covered up
> > > by some random fluctuation) we can disable the aligning for LA664.
> > >
> > > (Maybe we and the HW engineers have done some repetitive work or even some
> > > work cancelling each other out :(. )
> >
> > On March 8th I should be able to get the test results on the 3A6000 machine,
> > I need to judge the fluctuation of the spec and then let's see if the
> > default alignment is set?
>
> I just mean if we cannot get a decisive result before GCC 14 we may just
> turn off alignment. But if we can get a decisive result as expected in Mar
> we can just use the best we'll find.

Well, the results should be available before GCC14 is released.
It also seems that the setting of 3A5000 needs to be changed, because the
value of '-falign-labels' was affected by the macro ASM_OUTPUT_ALIGN_WITH_NOP
in the previous test.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #11 from Xi Ruoyao ---
(In reply to chenglulu from comment #10)
> (In reply to Xi Ruoyao from comment #9)
> > (In reply to chenglulu from comment #8)
> > > (In reply to Xi Ruoyao from comment #7)
> > > > Any update? :)
> > >
> > > Well, I haven't run it yet. Since this does not have a big impact on the
> > > spec score, I am currently testing it on a single-channel machine, so the
> > > test time will be longer.
> > > I will reply here as soon as the results are available.
> >
> > Can we determine on LA664 if the current default alignment is better than
> > not aligning at all? Coremarks results suggest the current default is even
> > worse than not aligning, but arguably Coremarks is far different from real
> > workloads. However if the current default is not better than not aligning
> > (or the difference is only marginal and is likely covered up by some random
> > fluctuation) we can disable the aligning for LA664.
> >
> > (Maybe we and the HW engineers have done some repetitive work or even some
> > work cancelling each other out :(. )
>
> On March 8th I should be able to get the test results on the 3A6000 machine,
> I need to judge the fluctuation of the spec and then let's see if the
> default alignment is set?

I just mean if we cannot get a decisive result before GCC 14 we may just
turn off alignment. But if we can get a decisive result as expected in Mar
we can just use the best we'll find.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #10 from chenglulu ---
(In reply to Xi Ruoyao from comment #9)
> (In reply to chenglulu from comment #8)
> > (In reply to Xi Ruoyao from comment #7)
> > > Any update? :)
> >
> > Well, I haven't run it yet. Since this does not have a big impact on the
> > spec score, I am currently testing it on a single-channel machine, so the
> > test time will be longer.
> > I will reply here as soon as the results are available.
>
> Can we determine on LA664 if the current default alignment is better than
> not aligning at all? Coremarks results suggest the current default is even
> worse than not aligning, but arguably Coremarks is far different from real
> workloads. However if the current default is not better than not aligning
> (or the difference is only marginal and is likely covered up by some random
> fluctuation) we can disable the aligning for LA664.
>
> (Maybe we and the HW engineers have done some repetitive work or even some
> work cancelling each other out :(. )

On March 8th I should be able to get the test results on the 3A6000 machine,
I need to judge the fluctuation of the spec and then let's see if the
default alignment is set?

In addition, I also tested it on the 3A5000 again, and the results will be
available around March 15th.

The conclusion of coremark from our team leader Xu Chenghua is that
'-falign-labels' have a regular effect on the performance of coremark, and
when the value of '-falign-labels' is greater than 4 bytes, the performance
decreases significantly.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #9 from Xi Ruoyao ---
(In reply to chenglulu from comment #8)
> (In reply to Xi Ruoyao from comment #7)
> > Any update? :)
>
> Well, I haven't run it yet. Since this does not have a big impact on the
> spec score, I am currently testing it on a single-channel machine, so the
> test time will be longer.
> I will reply here as soon as the results are available.

Can we determine on LA664 if the current default alignment is better than
not aligning at all? Coremarks results suggest the current default is even
worse than not aligning, but arguably Coremarks is far different from real
workloads. However if the current default is not better than not aligning
(or the difference is only marginal and is likely covered up by some random
fluctuation) we can disable the aligning for LA664.

(Maybe we and the HW engineers have done some repetitive work or even some
work cancelling each other out :(. )
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #8 from chenglulu ---
(In reply to Xi Ruoyao from comment #7)
> Any update? :)

Well, I haven't run it yet. Since this does not have a big impact on the
spec score, I am currently testing it on a single-channel machine, so the
test time will be longer.
I will reply here as soon as the results are available.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919 --- Comment #7 from Xi Ruoyao --- Any update? :)
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #6 from chenglulu ---
Hi,Ruoyao:

I am testing the spec2006 scores when the parameters 'align-loops',
'align-jumps', 'align-functions', and 'align-labels' are '1', '8', '16', and
'32' respectively. However, the test was suspended due to the company's power
maintenance last week, and it will take some time to retest.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #5 from chenglulu ---
(In reply to Xi Ruoyao from comment #4)
> Lulu: can you help to run some other benchmarks like SPEC (I don't have an
> access to it) and update these values for LA464 and LA664?

No problem, this is what I should do. However, there are many parameter
combinations, so the time may be longer.
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #4 from Xi Ruoyao ---
On LA664:

19970.709626 -falign-labels=4 -falign-functions=16 -falign-loops=64 -falign-jumps=16
19970.709626 -falign-labels=4 -falign-functions=32 -falign-loops=32 -falign-jumps=16
19976.028765 -falign-labels=4 -falign-functions=64 -falign-loops=32 -falign-jumps=16
19978.689398 -falign-labels=4 -falign-functions=8 -falign-loops=16 -falign-jumps=16
19997.333689 -falign-labels=4 -falign-functions=4 -falign-loops=32 -falign-jumps=32
20009.337691 -falign-labels=4 -falign-functions=32 -falign-loops=8 -falign-jumps=32
20009.337691 -falign-labels=4 -falign-functions=4 -falign-loops=32 -falign-jumps=16
20010.672359 -falign-labels=4 -falign-functions=64 -falign-loops=8 -falign-jumps=32
20050.795348 -falign-labels=4 -falign-functions=32 -falign-loops=8 -falign-jumps=64
20065.547455 -falign-labels=4 -falign-functions=64 -falign-loops=8 -falign-jumps=64

So it seems -falign-labels > 4 is just harming. And interestingly a high
-falign-functions / -falign-jumps helps Coremark...
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #3 from Xi Ruoyao ---
Top 10 configurations on LA464 for Coremark:

12757.542897 -falign-labels=4 -falign-functions=8 -falign-loops=32 -falign-jumps=32
12763.241863 -falign-labels=4 -falign-functions=64 -falign-loops=32 -falign-jumps=16
12764.871075 -falign-labels=4 -falign-functions=8 -falign-loops=8 -falign-jumps=64
12766.500702 -falign-labels=4 -falign-functions=16 -falign-loops=32 -falign-jumps=16
12777.919755 -falign-labels=4 -falign-functions=64 -falign-loops=8 -falign-jumps=64
12779.552716 -falign-labels=4 -falign-functions=8 -falign-loops=8 -falign-jumps=32
12799.180852 -falign-labels=4 -falign-functions=64 -falign-loops=8 -falign-jumps=16
12808.197246 -falign-labels=4 -falign-functions=32 -falign-loops=8 -falign-jumps=64
12823.800975 -falign-labels=4 -falign-functions=32 -falign-loops=32 -falign-jumps=16
12828.736369 -falign-labels=4 -falign-functions=64 -falign-loops=8 -falign-jumps=32
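The script used to produce this listing is not shown in the thread. As a rough sketch of how such a sweep could be enumerated and ranked, the Python below generates every combination of the four -falign-* options and keeps the best-scoring ones, listed worst-to-best like the table above. The candidate values (4 to 64) and the names `ALIGN_VALUES`, `flag_sets` and `top_n` are assumptions; building and running Coremark for each flag set is left out.

```python
from itertools import product
from typing import Iterator, List, Tuple

# Assumed candidate alignments; the thread only shows values between 4 and 64.
ALIGN_VALUES = (4, 8, 16, 32, 64)
OPTIONS = ("labels", "functions", "loops", "jumps")


def flag_sets(values=ALIGN_VALUES) -> Iterator[str]:
    """Yield one GCC flag string per combination of the four -falign-* options."""
    for combo in product(values, repeat=len(OPTIONS)):
        yield " ".join(f"-falign-{opt}={val}" for opt, val in zip(OPTIONS, combo))


def top_n(results: List[Tuple[float, str]], n: int = 10) -> List[Tuple[float, str]]:
    """Return the n highest-scoring (score, flags) pairs, lowest score first."""
    return sorted(results)[-n:]


if __name__ == "__main__":
    combos = list(flag_sets())
    print(len(combos), "flag sets, for example:", combos[0])
```

With five values per option the sweep covers 5^4 = 625 builds, which is consistent with the remark in comment #5 that testing many combinations takes a long time.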
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

--- Comment #2 from Xi Ruoyao ---
On LA464:

13095 with GCC 13.2.0

The best I've got is:

12639 with GCC 14.0.0 + -falign-loops=8 -falign-labels=4 -falign-jumps=4 -falign-functions=16

and I cannot really explain why this is the best. With the default:

12592 with GCC 14.0.0

So on LA464 the default seems not so bad...
[Bug target/112919] LoongArch: Alignments in tune parameters are not precise and they regress performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112919

Xi Ruoyao changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://github.com/loongson-community/discussions/issues/23
                 CC|                            |chenglulu at loongson dot cn,
                   |                            |xen0n at gentoo dot org
             Target|                            |loongarch64-*-*

--- Comment #1 from Xi Ruoyao ---
Jia Jie reported a huge performance regression running Coremarks from GCC 13
to 14, and I can confirm it on LA664. It seems a part of the regression is
caused by over-aligning the labels.

On a LA664 with different configurations I get Coremarks Iterations/Sec
values (the larger the better):

21120 with GCC 13.2.0
18320 with GCC 14.0.0 (with the default: -falign-labels=16 -falign-functions=32)
19972 with GCC 14.0.0 + -falign-loops=32 -falign-labels=4 -falign-jumps=4 -falign-functions=32 (the best I've got)
19938 with GCC 14.0.0 + -falign-loops=32 -falign-labels=4 -falign-jumps=4 -falign-functions=16
19964 with GCC 14.0.0 + -falign-loops=32 -falign-labels=4 -falign-jumps=4 -falign-functions=64
19276 with GCC 14.0.0 + -falign-loops=32 -falign-labels=8 -falign-jumps=4 -falign-functions=32
19674 with GCC 14.0.0 + -falign-loops=32 -falign-labels=4 -falign-jumps=8 -falign-functions=32
19752 with GCC 14.0.0 + -falign-loops=16 -falign-labels=4 -falign-jumps=4 -falign-functions=32
19922 with GCC 14.0.0 + -falign-loops=64 -falign-labels=4 -falign-jumps=4 -falign-functions=32

Lulu: can you help to run some other benchmarks like SPEC (I don't have an
access to it) and update these values for LA464 and LA664?
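To put the LA664 numbers from comment #1 in proportion, the short Python sketch below expresses each score as a change relative to the GCC 13.2.0 baseline. The scores are copied from the comment; the function name `relative_change` and the shortened labels are illustrative only.

```python
# Coremark iterations/sec on LA664, copied from comment #1 (higher is better).
scores = {
    "GCC 13.2.0": 21120,
    "GCC 14.0.0 default (labels=16 functions=32)": 18320,
    "GCC 14.0.0 best (loops=32 labels=4 jumps=4 functions=32)": 19972,
    "GCC 14.0.0 same as best but labels=8": 19276,
}


def relative_change(new: float, baseline: float) -> float:
    """Return the change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


baseline = scores["GCC 13.2.0"]
for name, score in scores.items():
    print(f"{relative_change(score, baseline):+7.2f}%  {name}")
```

Read this way, the GCC 14 default is about 13% behind GCC 13.2.0 on this benchmark, and the best flag set found here recovers a little more than half of that gap, which matches the statement that only a part of the regression comes from over-aligned labels.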