[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added a comment.

Please retitle it without AMDGPU and remove the changes to pass ORE to targets. It is not a part of this change; it is a part of the follow-up target-specific change.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366301.
gandhi21299 added a comment.

- rebased against main branch
- cleaned up code

Files:
  clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll

Index: llvm/test/CodeGen/X86/opt-pipeline.ll
===
--- llvm/test/CodeGen/X86/opt-pipeline.ll
+++ llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -16,15 +16,20 @@
 ; CHECK-NEXT: Target Pass Configuration
 ; CHECK-NEXT: Machine Module Information
 ; CHECK-NEXT: Target Transform Information
+; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Type-Based Alias Analysis
 ; CHECK-NEXT: Scoped NoAlias Alias Analysis
 ; CHECK-NEXT: Assumption Cache Tracker
-; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Create Garbage Collector Module Metadata
 ; CHECK-NEXT: Machine Branch Probability Analysis
 ; CHECK-NEXT: ModulePass Manager
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
+; CHECK-NEXT: Dominator Tree Construction
+; CHECK-NEXT: Natural Loop Information
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Expand Atomic instructions
 ; CHECK-NEXT: Lower AMX intrinsics
 ; CHECK-NEXT: Lower AMX type for load/store
Index: llvm/test/CodeGen/X86/O0-pipeline.ll
===
--- llvm/test/CodeGen/X86/O0-pipeline.ll
+++ llvm/test/CodeGen/X86/O0-pipeline.ll
@@ -10,13 +10,18 @@
 ; CHECK-NEXT: Target Pass Configuration
 ; CHECK-NEXT: Machine Module Information
 ; CHECK-NEXT: Target Transform Information
+; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Create Garbage Collector Module Metadata
 ; CHECK-NEXT: Assumption Cache Tracker
-; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Machine Branch Probability Analysis
 ; CHECK-NEXT: ModulePass Manager
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
+; CHECK-NEXT: Dominator Tree Construction
+; CHECK-NEXT: Natural Loop Information
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Expand Atomic instructions
 ; CHECK-NEXT: Lower AMX intrinsics
 ; CHECK-NEXT: Lower AMX type for load/store
Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -44,6 +44,11 @@
 ; GCN-O0-NEXT:Lower OpenCL enqueued blocks
 ; GCN-O0-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT:FunctionPass Manager
+; GCN-O0-NEXT: Dominator Tree Construction
+; GCN-O0-NEXT: Natural Loop Information
+; GCN-O0-NEXT: Lazy Branch Probability Analysis
+; GCN-O0-NEXT: Lazy Block Frequency Analysis
+; GCN-O0-NEXT: Optimization Remark Emitter
 ; GCN-O0-NEXT: Expand Atomic instructions
 ; GCN-O0-NEXT: Lower constant intrinsics
 ; GCN-O0-NEXT: Remove unreachable blocks from the CFG
@@ -180,6 +185,11 @@
 ; GCN-O1-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT:FunctionPass Manager
 ; GCN-O1-NEXT: Infer address spaces
+; GCN-O1-NEXT: Dominator Tree Construction
+; GCN-O1-NEXT: Natural Loop Information
+; GCN-O1-NEXT: Lazy Branch Probability Analysis
+; GCN-O1-NEXT: Lazy Block Frequency Analysis
+; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Expand Atomic instructions
 ; GCN-O1-NEXT: AMDGPU Promote Alloca
 ; GCN-O1-NEXT: Dominator Tree Construction
@@ -431,6 +441,11 @@
 ; GCN-O1-OPTS-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT:FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Infer address spaces
+; GCN-O1-OPTS-NEXT: Dominator Tree Construction
+; GCN-O1-OPTS-NEXT: Natural Loop Information
+; GCN-O1-OPTS-NEXT: Lazy Branch Probability Analysis
+; GCN-O1-OPTS-NEXT: Lazy Block Frequency Analysis
+; GCN-O1-OPTS-NEXT: Optimization Remark Emitter
 ; GCN-O1-OPTS-NEXT: Expand
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366294.
gandhi21299 added a comment.

- added clang/test/CodeGenCUDA/fp-atomics-optremarks.cu back
- moved `Remark` declaration into the `else` block

Files:
  clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 marked an inline comment as done.
gandhi21299 added inline comments.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:585
+      TLI->shouldExpandAtomicRMWInIR(AI, ORE);
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", AI->getFunction());
+  switch (Kind) {

gandhi21299 wrote:
> rampitec wrote:
> > What should this "Passed" do and why wouldn't just declare it where you use
> > it?
> https://llvm.org/docs/Remarks.html
>
> Since this is an informative pass and not that pass failed to optimize, the
> "Passed" argument is used. I will move it downwards, I thought it might be
> useful in the future for other operations. It's better below for now anyways.
Actually, I am getting a runtime error at the line where I declare Remark when I bring it down.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 marked an inline comment as done.
gandhi21299 added inline comments.


Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

rampitec wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > gandhi21299 wrote:
> > > > rampitec wrote:
> > > > > gandhi21299 wrote:
> > > > > > rampitec wrote:
> > > > > > > Need tests for all scopes.
> > > > > > `__atomic_fetch_add` does not take scope as an argument, how could
> > > > > > I add tests with different scopes?
> > > > > At least in the IR test.
> > > > What do you mean by that?
> > > You need to test all of that. If you cannot write a proper .cu test, then
> > > write an IR test and run llc.
> > Should I discard this test then since the test fp-atomics-remarks-gfx90a.ll
> > already satisfies?
> CU test is still needed. You also need it in the .cl test below.
Alright, I am not sure how I can test for the other scopes though.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:585
+      TLI->shouldExpandAtomicRMWInIR(AI, ORE);
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", AI->getFunction());
+  switch (Kind) {

rampitec wrote:
> What should this "Passed" do and why wouldn't just declare it where you use
> it?
https://llvm.org/docs/Remarks.html

Since this is an informative pass and not that pass failed to optimize, the "Passed" argument is used. I will move it downwards, I thought it might be useful in the future for other operations. It's better below for now anyways.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366132.
gandhi21299 marked 3 inline comments as done.
gandhi21299 added a comment.

no way to pass memory_scope in `__atomic_fetch_add(...)`, discarded the test.

Files:
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.


Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > Need tests for all scopes.
> > > > > `__atomic_fetch_add` does not take scope as an argument, how could I
> > > > > add tests with different scopes?
> > > > At least in the IR test.
> > > What do you mean by that?
> > You need to test all of that. If you cannot write a proper .cu test, then
> > write an IR test and run llc.
> Should I discard this test then since the test fp-atomics-remarks-gfx90a.ll
> already satisfies?
CU test is still needed. You also need it in the .cl test below.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:585
+      TLI->shouldExpandAtomicRMWInIR(AI, ORE);
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", AI->getFunction());
+  switch (Kind) {

What should this "Passed" do and why wouldn't just declare it where you use it?
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366127.
gandhi21299 added a comment.

- corrected remarks by replacing the operation name and updated tests accordingly
- code format

Files:
  clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366131.
gandhi21299 added a comment.

- corrected atomics-remarks-gfx90a.cl test to emit remark as well

Files:
  clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 marked 4 inline comments as done.
gandhi21299 added inline comments.


Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

rampitec wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > gandhi21299 wrote:
> > > > rampitec wrote:
> > > > > Need tests for all scopes.
> > > > `__atomic_fetch_add` does not take scope as an argument, how could I
> > > > add tests with different scopes?
> > > At least in the IR test.
> > What do you mean by that?
> You need to test all of that. If you cannot write a proper .cu test, then
> write an IR test and run llc.
Should I discard this test then since the test fp-atomics-remarks-gfx90a.ll already satisfies?
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.


Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > Need tests for all scopes.
> > > `__atomic_fetch_add` does not take scope as an argument, how could I add
> > > tests with different scopes?
> > At least in the IR test.
> What do you mean by that?
You need to test all of that. If you cannot write a proper .cu test, then write an IR test and run llc.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618
     expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun);
+    Ctx.getSyncScopeNames(SSNs);
+    auto MemScope = SSNs[AI->getSyncScopeID()].empty()

gandhi21299 wrote:
> rampitec wrote:
> > Only if SSNs.empty().
> Sorry, what do you mean?
SSNs will be empty at that point. I thought you wanted to cache it. But really, just declare it here.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:624
+    Remark << "A compare and swap loop was generated for an "
+           << AI->getOpcodeName() << "operation at " << MemScope
+           << " memory scope";

gandhi21299 wrote:
> rampitec wrote:
> > I believe getOpcodeName() will return "atomicrmw" instead of the operation.
> > Also missing space after it.
> getOpcodeName() returns `atomicrmwoperation`, as per the tests the spacing
> looks correct to me.
The operation to report is AI->getOperation(). Spacing is wrong, "operation" is your text.
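
The spacing problem discussed above can be reproduced without LLVM. The following is a minimal standalone sketch, not the patch's code: `buildRemarkUnspaced` and `buildRemarkSpaced` are hypothetical helpers, `std::ostringstream` stands in for the remark stream, and the strings `atomicrmw` and `fadd` are sample inputs standing in for an opcode name and an operation name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds the message the way the reviewed snippet does.
// The literal that follows the name starts with "operation", with no leading
// space, so the name and the word "operation" are fused together.
std::string buildRemarkUnspaced(const std::string &name,
                                const std::string &memScope) {
  assert(!name.empty() && "expected a non-empty name");
  std::ostringstream os;
  os << "A compare and swap loop was generated for an " << name
     << "operation at " << memScope << " memory scope";
  return os.str();
}

// Hypothetical helper: same message with the separating space restored.
std::string buildRemarkSpaced(const std::string &name,
                              const std::string &memScope) {
  assert(!name.empty() && "expected a non-empty name");
  std::ostringstream os;
  os << "A compare and swap loop was generated for an " << name
     << " operation at " << memScope << " memory scope";
  return os.str();
}
```

Passing `atomicrmw` to the first helper gives the fused `atomicrmwoperation` text quoted in the thread; the second helper keeps the name and the word "operation" apart.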
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 added inline comments.


Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

rampitec wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > Need tests for all scopes.
> > `__atomic_fetch_add` does not take scope as an argument, how could I add
> > tests with different scopes?
> At least in the IR test.
What do you mean by that?
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 added inline comments.


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618
     expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun);
+    Ctx.getSyncScopeNames(SSNs);
+    auto MemScope = SSNs[AI->getSyncScopeID()].empty()

rampitec wrote:
> Only if SSNs.empty().
Sorry, what do you mean?


Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:624
+    Remark << "A compare and swap loop was generated for an "
+           << AI->getOpcodeName() << "operation at " << MemScope
+           << " memory scope";

rampitec wrote:
> I believe getOpcodeName() will return "atomicrmw" instead of the operation.
> Also missing space after it.
getOpcodeName() returns `atomicrmwoperation`, as per the tests the spacing looks correct to me.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.

Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > Need tests for all scopes.
> `__atomic_fetch_add` does not take scope as an argument, how could I add
> tests with different scopes?

At least in the IR test.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618
 expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun);
+ Ctx.getSyncScopeNames(SSNs);
+ auto MemScope = SSNs[AI->getSyncScopeID()].empty()

Only if SSNs.empty().

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:624
+Remark << "A compare and swap loop was generated for an "
+ << AI->getOpcodeName() << "operation at " << MemScope
+ << " memory scope";

I believe getOpcodeName() will return "atomicrmw" instead of the operation. Also missing space after it.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 added inline comments.

Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

rampitec wrote:
> Need tests for all scopes.

`__atomic_fetch_add` does not take scope as an argument, how could I add tests with different scopes?

Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:25
+
+// GFX90A-CAS-LABEL: @atomic_cas_system
+// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic

For some reason, remarks are not emitted here. The command to run looks right above...
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 updated this revision to Diff 366112.
gandhi21299 added a comment.

requested changes from reviewer
- added memory scope tests and updated remarks and tests accordingly
- still working on clang/test/CodeGenCUDA/fp-atomics-optremarks.cu and clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891

Files:
  clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
  clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/AtomicExpandPass.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.h
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.h
  llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll

Index: llvm/test/CodeGen/X86/opt-pipeline.ll
===
--- llvm/test/CodeGen/X86/opt-pipeline.ll
+++ llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -16,15 +16,20 @@
 ; CHECK-NEXT: Target Pass Configuration
 ; CHECK-NEXT: Machine Module Information
 ; CHECK-NEXT: Target Transform Information
+; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Type-Based Alias Analysis
 ; CHECK-NEXT: Scoped NoAlias Alias Analysis
 ; CHECK-NEXT: Assumption Cache Tracker
-; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Create Garbage Collector Module Metadata
 ; CHECK-NEXT: Machine Branch Probability Analysis
 ; CHECK-NEXT: ModulePass Manager
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
+; CHECK-NEXT: Dominator Tree Construction
+; CHECK-NEXT: Natural Loop Information
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Expand Atomic instructions
 ; CHECK-NEXT: Lower AMX intrinsics
 ; CHECK-NEXT: Lower AMX type for load/store
Index: llvm/test/CodeGen/X86/O0-pipeline.ll
===
--- llvm/test/CodeGen/X86/O0-pipeline.ll
+++ llvm/test/CodeGen/X86/O0-pipeline.ll
@@ -10,13 +10,18 @@
 ; CHECK-NEXT: Target Pass Configuration
 ; CHECK-NEXT: Machine Module Information
 ; CHECK-NEXT: Target Transform Information
+; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Create Garbage Collector Module Metadata
 ; CHECK-NEXT: Assumption Cache Tracker
-; CHECK-NEXT: Profile summary info
 ; CHECK-NEXT: Machine Branch Probability Analysis
 ; CHECK-NEXT: ModulePass Manager
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
+; CHECK-NEXT: Dominator Tree Construction
+; CHECK-NEXT: Natural Loop Information
+; CHECK-NEXT: Lazy Branch Probability Analysis
+; CHECK-NEXT: Lazy Block Frequency Analysis
+; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Expand Atomic instructions
 ; CHECK-NEXT: Lower AMX intrinsics
 ; CHECK-NEXT: Lower AMX type for load/store
Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -44,6 +44,11 @@
 ; GCN-O0-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O0-NEXT: Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT: FunctionPass Manager
+; GCN-O0-NEXT: Dominator Tree Construction
+; GCN-O0-NEXT: Natural Loop Information
+; GCN-O0-NEXT: Lazy Branch Probability Analysis
+; GCN-O0-NEXT: Lazy Block Frequency Analysis
+; GCN-O0-NEXT: Optimization Remark Emitter
 ; GCN-O0-NEXT: Expand Atomic instructions
 ; GCN-O0-NEXT: Lower constant intrinsics
 ; GCN-O0-NEXT: Remove unreachable blocks from the CFG
@@ -180,6 +185,11 @@
 ; GCN-O1-NEXT: Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Infer address spaces
+; GCN-O1-NEXT: Dominator Tree Construction
+; GCN-O1-NEXT: Natural Loop Information
+; GCN-O1-NEXT: Lazy Branch Probability Analysis
+; GCN-O1-NEXT: Lazy Block Frequency Analysis
+; GCN-O1-NEXT: Optimization Remark Emitter
 ; GCN-O1-NEXT: Expand Atomic instructions
 ; GCN-O1-NEXT: AMDGPU Promote Alloca
 ; GCN-O1-NEXT: Dominator Tree Construction
@@ -431,6 +441,11 @@
 ; GCN-O1-OPTS-NEXT: Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Infer address spaces
+; GCN-O1-OPTS-NEXT: Dominator Tree Construction
+; GCN-O1-OPTS-NEXT: Natural Loop Information
+; GCN-O1-OPTS-NEXT:
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > That does not help with target defined scope names, such as our
> > > > > > "one-as" for example.
> > > > > How can I get target defined scope names?
> > > > It is right on the instruction:
> > > > %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0
> > > > syncscope("one-as") seq_cst
> > > >
> > > Sorry, I meant from the LLVM API.
> > LLVMContext::getSyncScopeNames()
> I think that gives me all sync scopes available for the target. If not, which
> sync scope in the vector corresponds to the instruction I am dealing with?

https://llvm.org/doxygen/MachineOperand_8cpp_source.html#l00474
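Note on the question of which entry in the vector belongs to the instruction: the following is a minimal standalone sketch (plain C++, no LLVM headers) of the lookup pattern used elsewhere in this thread, `SSNs[AI->getSyncScopeID()]`, where the sync scope ID is the index into the name table. `ScopeTable` and `scopeNameFor` are hypothetical stand-ins, not LLVM API, and printing the empty default name as "system" is an assumption that mirrors the `.empty()` check in the reviewed code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the vector filled by
// LLVMContext::getSyncScopeNames(): index = sync scope ID, value = name.
using ScopeTable = std::vector<std::string>;

// Look up the printable name for one instruction's scope ID.
// An empty name denotes the default scope, printed here as "system".
std::string scopeNameFor(const ScopeTable &names, unsigned id) {
  assert(id < names.size() && "unknown sync scope ID");
  return names[id].empty() ? std::string("system") : names[id];
}
```

For an illustrative table {"singlethread", "", "one-as"} (the position of "one-as" is made up for the example), scopeNameFor(names, 1) gives "system" and scopeNameFor(names, 2) gives "one-as".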
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

rampitec wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > gandhi21299 wrote:
> > > > rampitec wrote:
> > > > > That does not help with target defined scope names, such as our
> > > > > "one-as" for example.
> > > > How can I get target defined scope names?
> > > It is right on the instruction:
> > > %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0
> > > syncscope("one-as") seq_cst
> > >
> > Sorry, I meant from the LLVM API.
> LLVMContext::getSyncScopeNames()

I think that gives me all sync scopes available for the target. If not, which sync scope in the vector corresponds to the instruction I am dealing with?
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > That does not help with target defined scope names, such as our
> > > > "one-as" for example.
> > > How can I get target defined scope names?
> > It is right on the instruction:
> > %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0
> > syncscope("one-as") seq_cst
> >
> Sorry, I meant from the LLVM API.

LLVMContext::getSyncScopeNames()
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

rampitec wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > That does not help with target defined scope names, such as our "one-as"
> > > for example.
> > How can I get target defined scope names?
> It is right on the instruction:
> %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0
> syncscope("one-as") seq_cst
>

Sorry, I meant from the LLVM API.
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
rampitec added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > That does not help with target defined scope names, such as our "one-as"
> > for example.
> How can I get target defined scope names?

It is right on the instruction:
%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop
gandhi21299 marked 2 inline comments as done.
gandhi21299 added inline comments.

Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

rampitec wrote:
> That does not help with target defined scope names, such as our "one-as" for
> example.

How can I get target defined scope names?