RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Mekhanoshin, Stanislav
[AMD Official Use Only]

I assume some of the newly rematerialized instructions caused the perf drops, 
probably some very specific ones. I would appreciate it if you could point them 
out to me.
In addition, I believe I would need linked or optimized bitcode to feed into 
llc.

Stas

-Original Message-
From: Maxim Kuvyrkov 
Sent: Wednesday, September 22, 2021 12:06
To: Mekhanoshin, Stanislav 
Cc: linaro-toolchain 
Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
rematerialization of virtual reg uses

[CAUTION: External Email]

Hi Stanislav,

That's fair; I or someone from Linaro will try to analyze this and follow up 
here.

On a more general note, what info would you like to see in these benchmarking 
regression reports?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav 
>  wrote:
>
> [AMD Official Use Only]
>
> Hm... I'd really like to help, but I do not think I can do anything with 
> megabytes of code in an asm which I do not understand and tons of differences 
> in 48 asm files.
> What I can see there is overall less spilling code which was the intent in 
> the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 
> less of them.
> I doubt I could say much more without someone pointing to the actual root 
> cause.
>
> Stas
>
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 22, 2021 5:16
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Hi Stanislav,
>
> Attached is a tarball with -save-temps output (pre-processed source and 
> generated assembly) for first-bad run (your commit) and last-good run 
> (immediate parent of your commit).
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>>  wrote:
>>
>> [AMD Official Use Only]
>>
>> Thanks for letting me know. Some regressions are inevitable, however do you 
>> happen to have any analysis and dumps? I myself do not understand ARM ISA 
>> well...
>>
>> Stas
>>
>> -----Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Wednesday, September 15, 2021 5:52
>> To: Mekhanoshin, Stanislav 
>> Cc: linaro-toolchain 
>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
>> rematerialization of virtual reg uses
>>
>> [CAUTION: External Email]
>>
>> Hi Stanislav,
>>
>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
>> ARM at -O2 and -O3 optimization levels.
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>
>

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain


RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Mekhanoshin, Stanislav
[AMD Official Use Only]

Hm... I'd really like to help, but I do not think I can do anything with 
megabytes of assembly that I do not understand and tons of differences 
across 48 asm files.
What I can see there is less spilling code overall, which was the intent in the 
first place: hmmer has 4 fewer spill opcodes overall and sphinx has 27 fewer.
I doubt I could say much more without someone pointing to the actual root cause.

Stas
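
Editorial sketch: one way to get per-file spill numbers like the ones quoted 
above is to count the spill annotations in the generated assembly. This is not 
necessarily how the figures in this message were obtained. It assumes that 
LLVM tags spill code with comments such as "4-byte Spill" and "4-byte Folded 
Reload"; the function name and the directory layout in the commented loop are 
hypothetical.

```shell
# Count spill and reload annotations in one LLVM-generated assembly file.
# Assumption: spill code carries comments like "4-byte Spill",
# "4-byte Reload", "4-byte Folded Spill" or "4-byte Folded Reload".
count_spill_code() {
    # $1: path to a .s file; prints "<spills> <reloads>"
    spills=$(grep -c -E 'byte (Folded )?Spill' "$1" || true)
    reloads=$(grep -c -E 'byte (Folded )?Reload' "$1" || true)
    echo "$spills $reloads"
}

# Hypothetical layout: first_bad/ and last_good/ hold the two -save-temps trees.
# for f in first_bad/*.s; do
#     name=$(basename "$f")
#     echo "$name bad: $(count_spill_code "$f")"
#     echo "$name good: $(count_spill_code "last_good/$name")"
# done
```

A lower total does not rule out a regression: a single extra reload inside a 
hot loop can outweigh many spills removed from cold code, so the per-file 
breakdown is more useful than the sum.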

-Original Message-
From: Maxim Kuvyrkov 
Sent: Wednesday, September 22, 2021 5:16
To: Mekhanoshin, Stanislav 
Cc: linaro-toolchain 
Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
rematerialization of virtual reg uses

[CAUTION: External Email]

Hi Stanislav,

Attached is a tarball with -save-temps output (pre-processed source and 
generated assembly) for first-bad run (your commit) and last-good run 
(immediate parent of your commit).

--
Maxim Kuvyrkov
https://www.linaro.org

> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>  wrote:
>
> [AMD Official Use Only]
>
> Thanks for letting me know. Some regressions are inevitable, however do you 
> happen to have any analysis and dumps? I myself do not understand ARM ISA 
> well...
>
> Stas
>
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 15, 2021 5:52
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Hi Stanislav,
>
> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
> ARM at -O2 and -O3 optimization levels.
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>




RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Mekhanoshin, Stanislav
[AMD Official Use Only]

There are actually a couple of things worth trying, if that is easy:

https://reviews.llvm.org/D109077
https://reviews.llvm.org/differential/diff/374324/

Both may slightly change spill weights and, in turn, the spilling pattern.

Stas

-Original Message-
From: Mekhanoshin, Stanislav
Sent: Wednesday, September 22, 2021 12:09
To: Maxim Kuvyrkov 
Cc: linaro-toolchain 
Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
rematerialization of virtual reg uses

I assume some of the newly rematerialized instructions caused perf drops. 
Probably some very specific ones. I would appreciate if you could point them to 
me.
In addition I believe I would need to have a linked or optimized bitcode to 
feed into llc.

Stas

-Original Message-
From: Maxim Kuvyrkov 
Sent: Wednesday, September 22, 2021 12:06
To: Mekhanoshin, Stanislav 
Cc: linaro-toolchain 
Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
rematerialization of virtual reg uses

[CAUTION: External Email]

Hi Stanislav,

That's fair; I or someone from Linaro will try to analyze this and follow up 
here.

On a more general note, what info would you like to see in these benchmarking 
regression reports?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav 
>  wrote:
>
> [AMD Official Use Only]
>
> Hm... I'd really like to help, but I do not think I can do anything with 
> megabytes of code in an asm which I do not understand and tons of differences 
> in 48 asm files.
> What I can see there is overall less spilling code which was the intent in 
> the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 
> less of them.
> I doubt I could say much more without someone pointing to the actual root 
> cause.
>
> Stas
>
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 22, 2021 5:16
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
>
> [CAUTION: External Email]
>
> Hi Stanislav,
>
> Attached is a tarball with -save-temps output (pre-processed source and 
> generated assembly) for first-bad run (your commit) and last-good run 
> (immediate parent of your commit).
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>>  wrote:
>>
>> [AMD Official Use Only]
>>
>> Thanks for letting me know. Some regressions are inevitable, however do you 
>> happen to have any analysis and dumps? I myself do not understand ARM ISA 
>> well...
>>
>> Stas
>>
>> -----Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Wednesday, September 15, 2021 5:52
>> To: Mekhanoshin, Stanislav 
>> Cc: linaro-toolchain 
>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
>> rematerialization of virtual reg uses
>>
>> [CAUTION: External Email]
>>
>> Hi Stanislav,
>>
>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
>> ARM at -O2 and -O3 optimization levels.
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>
>



RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Mekhanoshin, Stanislav
[AMD Official Use Only]

Thanks for letting me know. Some regressions are inevitable; however, do you 
happen to have any analysis and dumps? I myself do not understand the ARM ISA 
well...

Stas

-Original Message-
From: Maxim Kuvyrkov 
Sent: Wednesday, September 15, 2021 5:52
To: Mekhanoshin, Stanislav 
Cc: linaro-toolchain 
Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
rematerialization of virtual reg uses

[CAUTION: External Email]

Hi Stanislav,

FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
ARM at -O2 and -O3 optimization levels.

--
Maxim Kuvyrkov
https://www.linaro.org

> On 15 Sep 2021, at 12:54, ci_not...@linaro.org wrote:
>
> After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin 
>
>Allow rematerialization of virtual reg uses
>
> the following benchmarks slowed down by more than 2%:
> - 456.hmmer slowed down by 6%
> - 482.sphinx3 slowed down by 3%
>
> Benchmark:
> Toolchain: Clang + Glibc + LLVM Linker
> Version: all components were built from their tip of trunk
> Target: arm-linux-gnueabihf
> Compiler flags: -O3 -marm
> Hardware: NVidia TK1 4x Cortex-A15
>
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3
>
> First_bad build:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-92c1fd19abb15bc68b1127a26137a69e033cdb39/
> Last_good build:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-1d02a8bcd393ea9c50f0212797059888efc78002/
> Baseline build:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-baseline/
> Even more details:
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/
>
> Reproduce builds:
> 
> mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
> cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
>
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o arti

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Maxim Kuvyrkov
Thanks, Stanislav,

FWIW, it will probably be easier for you to just rebuild the compiler; it is 
an x86_64-linux-gnu -> arm-linux-gnueabihf cross.  The build log is at [1].

cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' \
  -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True \
  -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=ARM

Then compile the pre-processed source with plain -O2 or -O3 optimisation 
settings.

[1] 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-92c1fd19abb15bc68b1127a26137a69e033cdb39/09-build_llvm-true/
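
Editorial sketch: with a compiler built as above, the optimized IR that llc 
needs could be produced from the pre-processed source roughly as follows. The 
file name fast_algorithms.i is an assumption based on the usual -save-temps 
naming, and the function name is hypothetical. The sketch only prints the 
command lines, so it works without the cross compiler installed; pipe its 
output to sh to actually run them.

```shell
# Print the commands that turn a pre-processed source file into optimized
# LLVM IR and then into ARM assembly via llc.
# Assumption: clang and llc come from the cross build configured above.
ir_commands() {
    src=$1                 # pre-processed source, e.g. fast_algorithms.i (name assumed)
    base=${src%.i}         # strip the .i suffix to derive the output names
    # -S -emit-llvm makes clang stop after the optimizer and write textual IR
    echo "clang --target=arm-linux-gnueabihf -O3 -marm -S -emit-llvm $src -o $base.ll"
    # llc then runs only the code generator, including register allocation
    echo "llc -O3 $base.ll -o $base.s"
}

ir_commands fast_algorithms.i
```

Running llc on the same .ll file with the first-bad and last-good compilers 
keeps the input fixed, so any difference in the output comes from the code 
generator alone.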

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 24 Sep 2021, at 20:30, Mekhanoshin, Stanislav 
>  wrote:
> 
> [AMD Official Use Only]
> 
> I have reverted the whole change. There was yet another perf regression 
> report.
>  
> Stas
>  
> From: Mekhanoshin, Stanislav 
> Sent: Thursday, September 23, 2021 11:48
> To: Maxim Kuvyrkov 
> Cc: linaro-toolchain 
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
>  
> Thanks. I see the reload. There should not be extra pressure, since the whole 
> idea is to reduce pressure. However, I see more spills in that specific file, 
> fast_algorithms.s, if I get it right.
> Can I get the IR for it? Something to feed to llc.
>  
> Stas
>  
> From: Maxim Kuvyrkov  
> Sent: Thursday, September 23, 2021 2:31
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
>  
> [CAUTION: External Email]
> 
> Thanks, Stanislav.
> 
> I’ve looked into the profile dumps, and 456.hmmer’s hot loop gets several 
> additional reloads.  E.g., "ldr r1, [sp, #84]" generates 203 additional 
> samples, which translates into 20 seconds of time just for that one 
> instruction.
> 
> See the attached profile dumps and the screenshot with the hot loop 
> highlighted.
> 
> Maybe your patch increases register pressure too much?
> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> > On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav 
> >  wrote:
> >
> > [AMD Official Use Only]
> >
> > There are actually couple things worth to try if that is easy:
> >
> > https://reviews.llvm.org/D109077
> > https://reviews.llvm.org/differential/diff/374324/
> >
> > Both may slightly change spill weights and then spilling pattern.
> >
> > Stas
> >
> > -----Original Message-
> > From: Mekhanoshin, Stanislav
> > Sent: Wednesday, September 22, 2021 12:09
> > To: Maxim Kuvyrkov 
> > Cc: linaro-toolchain 
> > Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> > rematerialization of virtual reg uses
> >
> > I assume some of the newly rematerialized instructions caused perf drops. 
> > Probably some very specific ones. I would appreciate if you could point 
> > them to me.
> > In addition I believe I would need to have a linked or optimized bitcode to 
> > feed into llc.
> >
> > Stas
> >
> > -Original Message-
> > From: Maxim Kuvyrkov 
> > Sent: Wednesday, September 22, 2021 12:06
> > To: Mekhanoshin, Stanislav 
> > Cc: linaro-toolchain 
> > Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> > rematerialization of virtual reg uses
> >
> > [CAUTION: External Email]
> >
> > Hi Stanislav,
> >
> > That's fair; I or someone from Linaro will try to analyze this and follow 
> > up here.
> >
> > On a more general note, what info would you like to see in these 
> > benchmarking regression reports?
> >
> > Thanks,
> >
> > --
> > Maxim Kuvyrkov
> > https://www.linaro.org
> >
> >
> >> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav 
> >>  wrote:
> >>
> >> [AMD Official Use Only]
> >>
> >> Hm... I'd really like to help, but I do not think I can do anything with 
> >> megabytes of code in an asm which I do not understand and tons of 
> >> differences in 48 asm files.
> >> What I can see there is 

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Maxim Kuvyrkov
Thanks, Stanislav.

I’ve looked into the profile dumps, and 456.hmmer’s hot loop gets several 
additional reloads.  E.g., "ldr r1, [sp, #84]" generates 203 additional samples, 
which translates into 20 seconds of time just for that one instruction.

See the attached profile dumps and the screenshot with the hot loop 
highlighted.

Maybe your patch increases register pressure too much?
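
Editorial sketch: the two figures in this message (203 samples, 20 seconds) 
imply a per-sample cost that can be used to scale other sample counts from the 
same profile. This assumes samples are taken at a roughly constant interval; 
the function name is hypothetical.

```shell
# Seconds represented by one profile sample, from a (seconds, samples) pair.
# Assumption: time scales linearly with the number of samples.
seconds_per_sample() {
    # $1: seconds attributed to the instruction, $2: number of samples
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.3f\n", secs / n }'
}

seconds_per_sample 20 203
```

For the numbers above this gives roughly 0.1 seconds per sample.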

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 22 Sep 2021, at 22:35, Mekhanoshin, Stanislav 
>  wrote:
> 
> [AMD Official Use Only]
> 
> There are actually couple things worth to try if that is easy:
> 
> https://reviews.llvm.org/D109077
> https://reviews.llvm.org/differential/diff/374324/
> 
> Both may slightly change spill weights and then spilling pattern.
> 
> Stas
> 
> -Original Message-
> From: Mekhanoshin, Stanislav
> Sent: Wednesday, September 22, 2021 12:09
> To: Maxim Kuvyrkov 
> Cc: linaro-toolchain 
> Subject: RE: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
> 
> I assume some of the newly rematerialized instructions caused perf drops. 
> Probably some very specific ones. I would appreciate if you could point them 
> to me.
> In addition I believe I would need to have a linked or optimized bitcode to 
> feed into llc.
> 
> Stas
> 
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 22, 2021 12:06
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
> 
> [CAUTION: External Email]
> 
> Hi Stanislav,
> 
> That's fair; I or someone from Linaro will try to analyze this and follow up 
> here.
> 
> On a more general note, what info would you like to see in these benchmarking 
> regression reports?
> 
> Thanks,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> 
>> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav 
>>  wrote:
>> 
>> [AMD Official Use Only]
>> 
>> Hm... I'd really like to help, but I do not think I can do anything with 
>> megabytes of code in an asm which I do not understand and tons of 
>> differences in 48 asm files.
>> What I can see there is overall less spilling code which was the intent in 
>> the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 
>> less of them.
>> I doubt I could say much more without someone pointing to the actual root 
>> cause.
>> 
>> Stas
>> 
>> -Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Wednesday, September 22, 2021 5:16
>> To: Mekhanoshin, Stanislav 
>> Cc: linaro-toolchain 
>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
>> rematerialization of virtual reg uses
>> 
>> [CAUTION: External Email]
>> 
>> Hi Stanislav,
>> 
>> Attached is a tarball with -save-temps output (pre-processed source and 
>> generated assembly) for first-bad run (your commit) and last-good run 
>> (immediate parent of your commit).
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>>>  wrote:
>>> 
>>> [AMD Official Use Only]
>>> 
>>> Thanks for letting me know. Some regressions are inevitable, however do you 
>>> happen to have any analysis and dumps? I myself do not understand ARM ISA 
>>> well...
>>> 
>>> Stas
>>> 
>>> -Original Message-
>>> From: Maxim Kuvyrkov 
>>> Sent: Wednesday, September 15, 2021 5:52
>>> To: Mekhanoshin, Stanislav 
>>> Cc: linaro-toolchain 
>>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
>>> rematerialization of virtual reg uses
>>> 
>>> [CAUTION: 

Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-30 Thread Maxim Kuvyrkov
Hi Stanislav,

Attached is a tarball with -save-temps output (pre-processed source and 
generated assembly) for first-bad run (your commit) and last-good run 
(immediate parent of your commit).

--
Maxim Kuvyrkov
https://www.linaro.org

> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>  wrote:
> 
> [AMD Official Use Only]
> 
> Thanks for letting me know. Some regressions are inevitable, however do you 
> happen to have any analysis and dumps? I myself do not understand ARM ISA 
> well...
> 
> Stas
> 
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 15, 2021 5:52
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
> 
> [CAUTION: External Email]
> 
> Hi Stanislav,
> 
> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
> ARM at -O2 and -O3 optimization levels.
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 




Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-22 Thread Maxim Kuvyrkov
Hi Stanislav,

That's fair; I or someone from Linaro will try to analyze this and follow up 
here.

On a more general note, what info would you like to see in these benchmarking 
regression reports?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Sep 22, 2021, at 9:40 PM, Mekhanoshin, Stanislav 
>  wrote:
> 
> [AMD Official Use Only]
> 
> Hm... I'd really like to help, but I do not think I can do anything with 
> megabytes of code in an asm which I do not understand and tons of differences 
> in 48 asm files.
> What I can see there is overall less spilling code which was the intent in 
> the first place: hmmer has 4 less spill opcodes overall and sphinx has 27 
> less of them.
> I doubt I could say much more without someone pointing to the actual root 
> cause.
> 
> Stas
> 
> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Wednesday, September 22, 2021 5:16
> To: Mekhanoshin, Stanislav 
> Cc: linaro-toolchain 
> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
> rematerialization of virtual reg uses
> 
> [CAUTION: External Email]
> 
> Hi Stanislav,
> 
> Attached is a tarball with -save-temps output (pre-processed source and 
> generated assembly) for first-bad run (your commit) and last-good run 
> (immediate parent of your commit).
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On 20 Sep 2021, at 23:15, Mekhanoshin, Stanislav 
>>  wrote:
>> 
>> [AMD Official Use Only]
>> 
>> Thanks for letting me know. Some regressions are inevitable, however do you 
>> happen to have any analysis and dumps? I myself do not understand ARM ISA 
>> well...
>> 
>> Stas
>> 
>> -----Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Wednesday, September 15, 2021 5:52
>> To: Mekhanoshin, Stanislav 
>> Cc: linaro-toolchain 
>> Subject: Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow 
>> rematerialization of virtual reg uses
>> 
>> [CAUTION: External Email]
>> 
>> Hi Stanislav,
>> 
>> FYI, your patch seems to be slowing down two of SPEC CPU2006 tests on 32-bit 
>> ARM at -O2 and -O3 optimization levels.
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
> 
> 



Re: [TCWG CI] 456.hmmer slowed down by 6% after llvm: Allow rematerialization of virtual reg uses

2021-09-15 Thread Maxim Kuvyrkov
Hi Stanislav,

FYI, your patch seems to be slowing down two of the SPEC CPU2006 tests on 
32-bit ARM at the -O2 and -O3 optimization levels.

--
Maxim Kuvyrkov
https://www.linaro.org

> On 15 Sep 2021, at 12:54, ci_not...@linaro.org wrote:
> 
> After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin 
> 
>Allow rematerialization of virtual reg uses
> 
> the following benchmarks slowed down by more than 2%:
> - 456.hmmer slowed down by 6%
> - 482.sphinx3 slowed down by 3%
> 
> Benchmark: 
> Toolchain: Clang + Glibc + LLVM Linker
> Version: all components were built from their tip of trunk
> Target: arm-linux-gnueabihf
> Compiler flags: -O3 -marm
> Hardware: NVidia TK1 4x Cortex-A15
> 
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
> - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3
> 
> First_bad build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-92c1fd19abb15bc68b1127a26137a69e033cdb39/
> Last_good build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-1d02a8bcd393ea9c50f0212797059888efc78002/
> Baseline build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/build-baseline/
> Even more details: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/
> 
> Reproduce builds:
> 
> mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
> cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
> 
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/manifests/build-baseline.sh --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/manifests/build-parameters.sh --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-arm-spec2k6-O3/18/artifact/artifacts/test.sh --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
> 
> cd llvm
> 
> # Reproduce first_bad build
> git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002
> ../artifacts/test.sh
> 
> cd ..
> 
> 
> Full commit (up to 1000 lines):
> 
> commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
> Author: Stanislav Mekhanoshin 
> Date:   Thu Aug 19 11:42:09 2021 -0700
> 
>Allow rematerialization of virtual reg uses
> 
>Currently isReallyTriviallyReMaterializableGeneric() implementation
>prevents rematerialization on any virtual register use on the grounds
>that is not a trivial rematerialization and that we do not want to
>extend liveranges.
> 
>It appears that LRE logic does not attempt to extend a liverange of
>a source register for rematerialization so that is not an issue.
>That is checked in the LiveRangeEdit::allUsesAvailableAt().
> 
>The only non-trivial aspect of it is accounting for tied-defs which
>normally represent a read-modify-write operation and not rematerializable.
> 
>The test for a tied-def situation already exists in the
>/CodeGen/AMDGPU/remat-vop.mir,
>test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
> 
>The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
>where I more or less understand the asm it seems to reduce spilling
>(as expected) or be neutral. However, it needs a review by all targets'
>specialists.
> 
>Differential Revision: https://reviews.llvm.org/D106408
> ---
> llvm/include/llvm/CodeGen/TargetInstrInfo.h|   12 +-
> llvm/lib/CodeGen/TargetInstrInfo.cpp   |9 +-
> llvm/test/CodeGen/AMDGPU/remat-sop.mir |   60 +
> llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll |   28 +-
> llvm/test/CodeGen/ARM/funnel-shift-rot.ll  |   32 +-
> llvm/test/CodeGen/ARM/funnel-shift.ll  |   30 +-
> .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll |   30 +-
> llvm/test/CodeGen/ARM/neon-copy.ll |   10 +-
> llvm/test/CodeGen/Mips/llvm-ir/ashr.ll |  227 +-
> llvm/test/CodeGen/Mips/llvm-ir/lshr.ll |  206 +-
>