[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
arsenm updated this revision to Diff 558095.
arsenm added a comment.

Drop bitcode auto upgrade handling

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141700/new/

https://reviews.llvm.org/D141700

Files:
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel-linking.cl
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/IR/CMakeLists.txt
  llvm/lib/Target/AMDGPU/AMDGPU.h
  llvm/lib/Target/AMDGPU/AMDGPUExportKernelRuntimeHandles.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h
  llvm/lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/lib/Target/AMDGPU/CMakeLists.txt
  llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll
  llvm/test/CodeGen/AMDGPU/enqueue-kernel.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -37,7 +37,7 @@
 ; GCN-O0-NEXT:Dominator Tree Construction
 ; GCN-O0-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O0-NEXT:Function Alias Analysis Results
-; GCN-O0-NEXT:Lower OpenCL enqueued blocks
+; GCN-O0-NEXT:Externalize enqueued block runtime handles
 ; GCN-O0-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT:FunctionPass Manager
 ; GCN-O0-NEXT: Expand Atomic instructions
@@ -178,7 +178,7 @@
 ; GCN-O1-NEXT:Dominator Tree Construction
 ; GCN-O1-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT:Function Alias Analysis Results
-; GCN-O1-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT:AMDGPU Attributor
 ; GCN-O1-NEXT: FunctionPass Manager
@@ -445,7 +445,7 @@
 ; GCN-O1-OPTS-NEXT:Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-OPTS-NEXT:Function Alias Analysis Results
-; GCN-O1-OPTS-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-OPTS-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-OPTS-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT:AMDGPU Attributor
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
@@ -736,7 +736,7 @@
 ; GCN-O2-NEXT:Dominator Tree Construction
 ; GCN-O2-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT:Function Alias Analysis Results
-; GCN-O2-NEXT:Lower OpenCL enqueued blocks
+; GCN-O2-NEXT:Externalize enqueued block runtime handles
 ; GCN-O2-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O2-NEXT:AMDGPU Attributor
 ; GCN-O2-NEXT: FunctionPass Manager
@@ -1037,7 +1037,7 @@
 ; GCN-O3-NEXT:Dominator Tree Construction
 ; GCN-O3-NEXT:Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT:Function Alias Analysis Results
-; GCN-O3-NEXT:Lower OpenCL enqueued blocks
+; GCN-O3-NEXT:Externalize enqueued block runtime handles
 ; GCN-O3-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O3-NEXT:AMDGPU Attributor
 ; GCN-O3-NEXT: FunctionPass Manager
Index: llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
===
--- llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
+++ llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
@@ -14,7 +14,8 @@
 %struct.B = type { ptr addrspace(1) }
 %opencl.clk_event_t = type opaque
-@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
+@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1), section ".amdgpu.kernel.runtime.handle"
+@not.a.handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
 ; CHECK: ---
 ; CHECK-NEXT: amdhsa.kernels:
@@ -1678,7 +1679,7 @@
 ; CHECK: .name: __test_block_invoke_kernel
 ; CHECK: .symbol: __test_block_invoke_kernel.kd
 define amdgpu_kernel void @__test_block_invoke_kernel(
-<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1
+<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1 !associated !112
 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !110 !kernel_arg_base_type !110 !kernel_arg_type_qual !4 {
 ret void
@@ -1734,6 +1735,29 @@
 ret void
 }
+; Make sure the device_enqueue_symbol is not reported
+; CHECK: - .args: []
+; CHECK-NEXT: .group_segment_fixed_size: 0
+; CHECK-NEXT: .kernarg
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
arsenm added inline comments.

Comment at: llvm/lib/IR/CMakeLists.txt:84
 Demangle
+ TransformUtils
+

This introduces a circular dependency between LLVMCore and TransformUtils. Options are:

1. Move appendToUsed into Module
2. Don't bother with bitcode compatibility for this
3. Avoid depending on llvm.used. I know I tried to do this, but it was so long ago that I don't remember how I ended up on this solution.

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
barannikov88 added inline comments.

Comment at: clang/lib/CodeGen/Targets/AMDGPU.cpp:520
+static llvm::StructType *getAMDGPUKernelDescriptorType(llvm::LLVMContext &C) {
+ llvm::Type *Int8 = llvm::IntegerType::getInt8Ty(C);
+ llvm::Type *Int16 = llvm::IntegerType::getInt16Ty(C);

Minor suggestion: you can get these types from CGF / CGM (Int8Ty etc.).
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
arsenm added inline comments.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp:299
+
+ Attrs.mRuntimeHandle = getEnqueuedBlockSymbolName(TM, Func);
 }

kzhuravl wrote:
> Do we really need/want to update code object v2?

As long as the code is here, yes. Not updating it would mean maintaining two paths in the implementation. This is just changing the internal representation.
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
kzhuravl added a comment.

Overall looks good.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp:299
+
+ Attrs.mRuntimeHandle = getEnqueuedBlockSymbolName(TM, Func);
 }

Do we really need/want to update code object v2?
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
sameerds added a comment.

LGTM, to the extent that I can see that the change does what is advertised, and the ultimately emitted HSA metadata preserves the current contract with the runtime. A couple of tests could use a few more explanatory comments, as noted.

Comment at: clang/lib/CodeGen/TargetInfo.cpp:12581
+ Mod, HandleTy,
+ /*isConstant=*/true, llvm::GlobalValue::InternalLinkage,
+ /*Initializer=*/RuntimeHandleInitializer, RuntimeHandleName,

jmmartinez wrote:
> Just a cosmetic remark: is there any reason to keep the `/*isConstant=*/`,
> `/*Initializer=*/`, ... comments? I think it would be better to avoid them.

FWIW, I find these comments very helpful when spelunking through code. I could sympathise with not needing `Initializer=` because the value name makes it clear. But an undecorated constant literal like "true" or "10" or "nullptr" works best when accompanied by a comment.

Comment at: llvm/test/Bitcode/amdgpu-autoupgrade-enqueued-block.ll:69
+
+; __enqueue_kernel* functions may get inlined
+define amdgpu_kernel void @inlined_caller(ptr addrspace(1) %a, i8 %b, ptr addrspace(1) %c, i64 %d) {

I did not understand what is being tested here.

Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -amdgpu-export-kernel-runtime-handles < %s | FileCheck %s
+

Is there any visible effect of the pass being tested? Or is the intention simply to check that the output is the same as the input, and that there is no error?
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
jmmartinez added inline comments.

Comment at: clang/lib/CodeGen/TargetInfo.cpp:12581
+ Mod, HandleTy,
+ /*isConstant=*/true, llvm::GlobalValue::InternalLinkage,
+ /*Initializer=*/RuntimeHandleInitializer, RuntimeHandleName,

Just a cosmetic remark: is there any reason to keep the `/*isConstant=*/`, `/*Initializer=*/`, ... comments? I think it would be better to avoid them.
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
arsenm updated this revision to Diff 489394.
arsenm added a comment.

Rename

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel-linking.cl
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/IR/CMakeLists.txt
  llvm/lib/Target/AMDGPU/AMDGPU.h
  llvm/lib/Target/AMDGPU/AMDGPUExportKernelRuntimeHandles.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h
  llvm/lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/lib/Target/AMDGPU/CMakeLists.txt
  llvm/test/Bitcode/amdgpu-autoupgrade-enqueued-block.ll
  llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll
  llvm/test/CodeGen/AMDGPU/enqueue-kernel.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -41,7 +41,7 @@
 ; GCN-O0-NEXT:Call Graph SCC Pass Manager
 ; GCN-O0-NEXT: Inliner for always_inline functions
 ; GCN-O0-NEXT:A No-Op Barrier Pass
-; GCN-O0-NEXT:Lower OpenCL enqueued blocks
+; GCN-O0-NEXT:Externalize enqueued block runtime handles
 ; GCN-O0-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT:FunctionPass Manager
 ; GCN-O0-NEXT: Expand Atomic instructions
@@ -186,7 +186,7 @@
 ; GCN-O1-NEXT:Call Graph SCC Pass Manager
 ; GCN-O1-NEXT: Inliner for always_inline functions
 ; GCN-O1-NEXT:A No-Op Barrier Pass
-; GCN-O1-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT:FunctionPass Manager
 ; GCN-O1-NEXT: Infer address spaces
@@ -454,7 +454,7 @@
 ; GCN-O1-OPTS-NEXT:Call Graph SCC Pass Manager
 ; GCN-O1-OPTS-NEXT: Inliner for always_inline functions
 ; GCN-O1-OPTS-NEXT:A No-Op Barrier Pass
-; GCN-O1-OPTS-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-OPTS-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-OPTS-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT:FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Infer address spaces
@@ -754,7 +754,7 @@
 ; GCN-O2-NEXT:Call Graph SCC Pass Manager
 ; GCN-O2-NEXT: Inliner for always_inline functions
 ; GCN-O2-NEXT:A No-Op Barrier Pass
-; GCN-O2-NEXT:Lower OpenCL enqueued blocks
+; GCN-O2-NEXT:Externalize enqueued block runtime handles
 ; GCN-O2-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O2-NEXT:FunctionPass Manager
 ; GCN-O2-NEXT: Infer address spaces
@@ -1057,7 +1057,7 @@
 ; GCN-O3-NEXT:Call Graph SCC Pass Manager
 ; GCN-O3-NEXT: Inliner for always_inline functions
 ; GCN-O3-NEXT:A No-Op Barrier Pass
-; GCN-O3-NEXT:Lower OpenCL enqueued blocks
+; GCN-O3-NEXT:Externalize enqueued block runtime handles
 ; GCN-O3-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O3-NEXT:FunctionPass Manager
 ; GCN-O3-NEXT: Infer address spaces
Index: llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
===
--- llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
+++ llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
@@ -14,7 +14,8 @@
 %struct.B = type { ptr addrspace(1)}
 %opencl.clk_event_t = type opaque
-@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
+@__test_block_invoke_kernel_runtime_handle = external addrspace(1) externally_initialized constant ptr addrspace(1), section ".amdgpu.kernel.runtime.handle"
+@not.a.handle = external addrspace(1) externally_initialized constant ptr addrspace(1)
 ; CHECK: ---
 ; CHECK: Version: [ 1, 0 ]
@@ -1808,7 +1809,7 @@
 ; CHECK-NEXT: ValueKind: HiddenMultiGridSyncArg
 ; CHECK-NEXT: AddrSpaceQual: Global
 define amdgpu_kernel void @__test_block_invoke_kernel(
-<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1
+<{ i32, i32, ptr, ptr addrspace(1), i8 }> %arg) #1 !associated !112
 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_type !110 !kernel_arg_base_type !110 !kernel_arg_type_qual !4 {
 ret void
@@ -1866,9 +1867,30 @@
 ret void
 }
+; Make sure the RuntimeHandle is not reported
+; CHECK: - Name:associated_global_not_handle
+; CHECK-NEXT: SymbolName: 'associated_global_not_handle@kd'
+; CHECK-NEXT: Language:OpenCL C
+; CHECK-NEXT:
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
Anastasia added inline comments.

Comment at: clang/lib/CodeGen/TargetInfo.cpp:12440
+/// AMDHSAKernelDescriptor.h)
+static llvm::StructType *getKernelDescriptorType(llvm::LLVMContext &C) {
+ llvm::Type *Int8 = llvm::IntegerType::getInt8Ty(C);

Is this AMDGPU target specific? If so, perhaps it's better to reflect this in the name.
[PATCH] D141700: AMDGPU: Move enqueued block handling into clang
arsenm created this revision.
arsenm added reviewers: yaxunl, t-tye, b-sumner, rampitec, AMDGPU, Anastasia, JonChesterfield, jhuber6.
Herald added subscribers: kosarev, foad, kerbowa, hiraditya, tpr, dstuttard, jvesely, kzhuravl.
Herald added a project: All.
arsenm requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

The previous implementation wasn't maintaining a faithful IR representation of how this really works. The value returned by createEnqueuedBlockKernel wasn't actually used as a function, and was hacked up later to be a pointer to the runtime handle global variable. In reality, the enqueued block is a struct whose first field is a pointer to the kernel descriptor, not to the kernel itself.

We were also relying on passing around a reference to a global through a string attribute containing its name. It's better to base this on a proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels, and avoids using the additional "runtime-handle" attribute to populate the final metadata. Instead, the kernel references its runtime handle through !associated global metadata. We can then get the final, correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior using a combination of comdats and aliases, so this still leaves an IR pass to externalize the runtime handles for codegen. If anything breaks, it's most likely this, so avoiding the pass is left for a later step. A special section name marks the runtime handles and enables this behavior. This also means it's possible to declare enqueuable kernels in source without going through the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the compiler/linker. I have a working patch where the linker sets up the first field of the handle, avoiding the need to export the block kernel symbol for the runtime. We would need new relocations to get the private and group sizes, but that would avoid the runtime's special-case handling that requires the device_enqueue_symbol metadata field.

Handle autoupgrade from the old kernel attribute. I'm not sure where to put the code shared with clang (maybe AMDGPUEmitPrintf could be renamed to AMDGPUUtils).

https://reviews.llvm.org/D141700

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel-linking.cl
  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
  llvm/docs/AMDGPUUsage.rst
  llvm/lib/IR/AutoUpgrade.cpp
  llvm/lib/IR/CMakeLists.txt
  llvm/lib/Target/AMDGPU/AMDGPU.h
  llvm/lib/Target/AMDGPU/AMDGPUExportKernelRuntimeHandles.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp
  llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.h
  llvm/lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/lib/Target/AMDGPU/CMakeLists.txt
  llvm/test/Bitcode/amdgpu-autoupgrade-enqueued-block.ll
  llvm/test/CodeGen/AMDGPU/amdgpu-export-kernel-runtime-handles.ll
  llvm/test/CodeGen/AMDGPU/enqueue-kernel.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full-v3.ll
  llvm/test/CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
===
--- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -41,7 +41,7 @@
 ; GCN-O0-NEXT:Call Graph SCC Pass Manager
 ; GCN-O0-NEXT: Inliner for always_inline functions
 ; GCN-O0-NEXT:A No-Op Barrier Pass
-; GCN-O0-NEXT:Lower OpenCL enqueued blocks
+; GCN-O0-NEXT:Externalize enqueued block runtime handles
 ; GCN-O0-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT:FunctionPass Manager
 ; GCN-O0-NEXT: Expand Atomic instructions
@@ -186,7 +186,7 @@
 ; GCN-O1-NEXT:Call Graph SCC Pass Manager
 ; GCN-O1-NEXT: Inliner for always_inline functions
 ; GCN-O1-NEXT:A No-Op Barrier Pass
-; GCN-O1-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-NEXT:FunctionPass Manager
 ; GCN-O1-NEXT: Infer address spaces
@@ -454,7 +454,7 @@
 ; GCN-O1-OPTS-NEXT:Call Graph SCC Pass Manager
 ; GCN-O1-OPTS-NEXT: Inliner for always_inline functions
 ; GCN-O1-OPTS-NEXT:A No-Op Barrier Pass
-; GCN-O1-OPTS-NEXT:Lower OpenCL enqueued blocks
+; GCN-O1-OPTS-NEXT:Externalize enqueued block runtime handles
 ; GCN-O1-OPTS-NEXT:Lower uses of LDS variables from non-kernel functions
 ; GCN-O1-OPTS-NEXT:FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Infer address spaces
@@ -754,7 +754,7 @@
 ; GCN-O2-NEXT:Call Graph SCC Pass Manager
 ; GCN-O2-NEXT: Inliner for always_inline functions
 ; GCN-O2-NEXT:A No-Op Barrier Pass
-;