[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -705,28 +740,7 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase , }; SmallVector dds; - if (!taskOp.getDependVars().empty() && taskOp.getDepends()) { -for (auto dep : - llvm::zip(taskOp.getDependVars(), taskOp.getDepends()->getValue())) { - llvm::omp::RTLDependenceKindTy type; - switch ( - cast(std::get<1>(dep)).getValue()) { - case mlir::omp::ClauseTaskDepend::taskdependin: -type = llvm::omp::RTLDependenceKindTy::DepIn; -break; - // The OpenMP runtime requires that the codegen for 'depend' clause for - // 'out' dependency kind must be the same as codegen for 'depend' clause - // with 'inout' dependency. - case mlir::omp::ClauseTaskDepend::taskdependout: - case mlir::omp::ClauseTaskDepend::taskdependinout: -type = llvm::omp::RTLDependenceKindTy::DepInOut; -break; - }; - llvm::Value *depVal = moduleTranslation.lookupValue(std::get<0>(dep)); - llvm::OpenMPIRBuilder::DependData dd(type, depVal->getType(), depVal); - dds.emplace_back(dd); -} - } + buildDependData(taskOp.getOperation(), moduleTranslation, dds); skatrak wrote: Suggestion to how this call could look if following my comment above: ```suggestion buildDependData(moduleTranslation, taskOp.getDepends(), taskOp.getDependVars(), dds); ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -682,6 +682,41 @@ convertOmpTeams(omp::TeamsOp op, llvm::IRBuilderBase , return bodyGenStatus; } +static void +buildDependData(Operation *op, LLVM::ModuleTranslation , skatrak wrote: I'd suggest taking `std::optional depends` and `OperandRange dependVars` as arguments instead of `Operation *op` here. This way you don't need to try casting `op` to each supported operation below, but rather make the body of this function what the `processDepends` lambda inside currently has. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak approved this pull request. Thank you Pranav, LGTM. I have just one more comment but no need to wait for another review by me before merging this. Just make sure @Meinersbur is happy with it as well! https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
bhandarkar-pranav wrote: @skatrak - I have updated the PR with changes based on your feedback. Could you please take a look? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -3088,10 +3102,13 @@ convertOmpTarget(Operation , llvm::IRBuilderBase , if (!mapData.IsDeclareTarget[i] && !mapData.IsAMember[i]) kernelInput.push_back(mapData.OriginalValue[i]); } + SmallVector dds; + buildDependData(targetOp.getOperation(), moduleTranslation, dds); bhandarkar-pranav wrote: @skatrak - Apologies for missing your previous comment about moving the empty depends list check into `buildDependData`. I have done that now in this iteration of the PR. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5382,284 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. + // void kernel_launch_function(thread_id, + // structArg) alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB = TargetTaskAllocaBB; + OI.OuterAllocaBB = AllocaIP.getBlock(); + + // Add the thread
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -0,0 +1,83 @@ +! Offloading test checking the use of the depend clause on +! the target construct +! REQUIRES: flang, amdgcn-amd-amdhsa +! UNSUPPORTED: nvptx64-nvidia-cuda +! UNSUPPORTED: nvptx64-nvidia-cuda-LTO +! UNSUPPORTED: aarch64-unknown-linux-gnu +! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +! UNSUPPORTED: x86_64-pc-linux-gnu +! UNSUPPORTED: x86_64-pc-linux-gnu-LTO + +! RUN: %libomptarget-compile-fortran-run-and-check-generic +program main + implicit none + integer :: a = 0 + INTERFACE + FUNCTION omp_get_device_num() BIND(C) + USE, INTRINSIC :: iso_c_binding, ONLY: C_INT + integer :: omp_get_device_num + END FUNCTION omp_get_device_num + END INTERFACE + + call foo(5, a) + print*, "=== FORTRAN Test passed! ===" + print*, "foo(5) returned ", a, ", expected 6\n" + + ! stop 0 + contains +subroutine foo(N, r) + integer, intent(in) :: N + integer, intent(out) :: r + integer :: z, i, j, k, accumulator + z = 1 + accumulator = 0 + ! Spawn 3 threads + !$omp parallel num_threads(3) + + ! A single thread will then create two tasks skatrak wrote: Nit: It looks like comments in this file are split very early on, can you extend to 80 characters? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -705,28 +728,9 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase , }; SmallVector dds; - if (!taskOp.getDependVars().empty() && taskOp.getDepends()) { -for (auto dep : - llvm::zip(taskOp.getDependVars(), taskOp.getDepends()->getValue())) { - llvm::omp::RTLDependenceKindTy type; - switch ( - cast(std::get<1>(dep)).getValue()) { - case mlir::omp::ClauseTaskDepend::taskdependin: -type = llvm::omp::RTLDependenceKindTy::DepIn; -break; - // The OpenMP runtime requires that the codegen for 'depend' clause for - // 'out' dependency kind must be the same as codegen for 'depend' clause - // with 'inout' dependency. - case mlir::omp::ClauseTaskDepend::taskdependout: - case mlir::omp::ClauseTaskDepend::taskdependinout: -type = llvm::omp::RTLDependenceKindTy::DepInOut; -break; - }; - llvm::Value *depVal = moduleTranslation.lookupValue(std::get<0>(dep)); - llvm::OpenMPIRBuilder::DependData dd(type, depVal->getType(), depVal); - dds.emplace_back(dd); -} - } + if (!taskOp.getDependVars().empty() && taskOp.getDepends()) skatrak wrote: It looks like this comment hasn't been addressed. Let me know if you just missed it, if you disagree with it or if it isn't very clear and I should try to explain better. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -0,0 +1,83 @@ +! Offloading test checking the use of the depend clause on +! the target construct +! REQUIRES: flang, amdgcn-amd-amdhsa +! UNSUPPORTED: nvptx64-nvidia-cuda +! UNSUPPORTED: nvptx64-nvidia-cuda-LTO +! UNSUPPORTED: aarch64-unknown-linux-gnu +! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +! UNSUPPORTED: x86_64-pc-linux-gnu +! UNSUPPORTED: x86_64-pc-linux-gnu-LTO + +! RUN: %libomptarget-compile-fortran-run-and-check-generic +program main + implicit none + integer :: a = 0 + INTERFACE + FUNCTION omp_get_device_num() BIND(C) + USE, INTRINSIC :: iso_c_binding, ONLY: C_INT + integer :: omp_get_device_num + END FUNCTION omp_get_device_num + END INTERFACE + + call foo(5, a) + print*, "=== FORTRAN Test passed! ===" + print*, "foo(5) returned ", a, ", expected 6\n" + + ! stop 0 + contains +subroutine foo(N, r) + integer, intent(in) :: N + integer, intent(out) :: r + integer :: z, i, j, k, accumulator + z = 1 + accumulator = 0 + ! Spawn 3 threads + !$omp parallel num_threads(3) + + ! A single thread will then create two tasks + ! One is the 'producer' and potentially slower + ! task that updates 'z' to 'N'. The second is an + ! offloaded target task that increments 'z'. + ! If the depend clauses work properly, the + ! target task should wait for the 'producer' + ! task to complete before incrementing z + ! We use !$omp single here because only + ! the depend clause establishes dependencies + ! between sibling tasks only. This is the easiest + ! way of creating two sibling tasks. + !$omp single + !$omp task depend(out: z) shared(z) + do while (k < 32000) skatrak wrote: Couldn't this loop be written more concisely as `do k=1, 32000 ... end do`? Same for the other two loops nested in it. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -0,0 +1,83 @@ +! Offloading test checking the use of the depend clause on +! the target construct +! REQUIRES: flang, amdgcn-amd-amdhsa +! UNSUPPORTED: nvptx64-nvidia-cuda +! UNSUPPORTED: nvptx64-nvidia-cuda-LTO +! UNSUPPORTED: aarch64-unknown-linux-gnu +! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +! UNSUPPORTED: x86_64-pc-linux-gnu +! UNSUPPORTED: x86_64-pc-linux-gnu-LTO + +! RUN: %libomptarget-compile-fortran-run-and-check-generic +program main + implicit none + integer :: a = 0 + INTERFACE + FUNCTION omp_get_device_num() BIND(C) + USE, INTRINSIC :: iso_c_binding, ONLY: C_INT + integer :: omp_get_device_num + END FUNCTION omp_get_device_num + END INTERFACE + + call foo(5, a) + print*, "=== FORTRAN Test passed! ===" + print*, "foo(5) returned ", a, ", expected 6\n" + + ! stop 0 + contains +subroutine foo(N, r) + integer, intent(in) :: N + integer, intent(out) :: r + integer :: z, i, j, k, accumulator + z = 1 + accumulator = 0 + ! Spawn 3 threads + !$omp parallel num_threads(3) + + ! A single thread will then create two tasks + ! One is the 'producer' and potentially slower + ! task that updates 'z' to 'N'. The second is an + ! offloaded target task that increments 'z'. + ! If the depend clauses work properly, the + ! target task should wait for the 'producer' + ! task to complete before incrementing z + ! We use !$omp single here because only + ! the depend clause establishes dependencies + ! between sibling tasks only. This is the easiest + ! way of creating two sibling tasks. + !$omp single + !$omp task depend(out: z) shared(z) + do while (k < 32000) + do while (j < 32766) +do while (i < 32766) + ! dumb loop nest to slow down the update of + ! z + i = i + 1 + ! Adding a function call slows down the producer + ! to the point that removing the depend clause + ! from the target construct below frequently + ! results in the wrong answer. + accumulator = accumulator + omp_get_device_num() +end do +j = j +1 + end do + k = k + 1 + end do + z = N + !$omp end task + + ! z is 5 now. Increment z to 6. + !$omp target map(tofrom: z) depend(in:z) + z = z + 1 + !$omp end target + !$omp end single + !$omp end parallel + ! Use 'accumulator' so it is not optimized away + ! by the compiler. + print *, accumulator + r = z +end subroutine foo skatrak wrote: ```suggestion end subroutine foo ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5382,284 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. + // void kernel_launch_function(thread_id, + // structArg) alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB = TargetTaskAllocaBB; + OI.OuterAllocaBB = AllocaIP.getBlock(); + + // Add the thread
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5272,36 +5682,53 @@ static void emitTargetCall(OpenMPIRBuilder , IRBuilderBase , Value *DynCGGroupMem = Builder.getInt32(0); bool HasNoWait = false; + bool HasDependencies = Dependencies.size() > 0; + bool RequiresOuterTargetTask = HasNoWait || HasDependencies; OpenMPIRBuilder::TargetKernelArgs KArgs(NumTargetItems, RTArgs, NumIterations, NumTeamsVal, NumThreadsVal, DynCGGroupMem, HasNoWait); - Builder.restoreIP(OMPBuilder.emitKernelLaunch( - Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, KArgs, - DeviceID, RTLoc, AllocaIP)); + // The presence of certain clauses on the target directive require the + // explicit generation of the target task. + if (RequiresOuterTargetTask) { +OMPBuilder.emitTargetTask(OutlinedFn, OutlinedFnID, skatrak wrote: I agree with you, since I have noticed this redundant pattern as well, though in a way I think it makes sense from a uniformity point of view. There are several codegen functions that return an insertion point, which should be used by the caller to proceed with any further codegen. I think that the fact that oftentimes the insertion point that's being returned is indeed already the current one shouldn't be relied on by the caller unless we make this a contract followed by all such functions. To be consistent, I think these codegen function must either always return an insertion point which must be used by the caller (the current approach) or all codegen functions must not return an insertion point and set it themselves to the spot follow-up codegen should go. In my opinion, I believe it's currently better to keep the current approach even though it results in some redundant setting of the insertion point. Skipping the call to `Builder.restoreIP()` in the caller could result in unintended problems in the future, since the current contract is that the returned insertion point is the one that should be used and not the current one. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak commented: Thanks for working on my previous comments. I just have a couple more minor ones. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5272,36 +5682,53 @@ static void emitTargetCall(OpenMPIRBuilder , IRBuilderBase , Value *DynCGGroupMem = Builder.getInt32(0); bool HasNoWait = false; + bool HasDependencies = Dependencies.size() > 0; + bool RequiresOuterTargetTask = HasNoWait || HasDependencies; OpenMPIRBuilder::TargetKernelArgs KArgs(NumTargetItems, RTArgs, NumIterations, NumTeamsVal, NumThreadsVal, DynCGGroupMem, HasNoWait); - Builder.restoreIP(OMPBuilder.emitKernelLaunch( - Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, KArgs, - DeviceID, RTLoc, AllocaIP)); + // The presence of certain clauses on the target directive require the + // explicit generation of the target task. + if (RequiresOuterTargetTask) { +OMPBuilder.emitTargetTask(OutlinedFn, OutlinedFnID, bhandarkar-pranav wrote: Thanks for point this out. I want to use this opportunity to understand this idiom I see all around in `OMPIRBuilder.cpp` which is ``Builder.restoreIP(FunctThatReturnsInsertPointTy())``. Now, if the only return from `FuncThatReturnsInsertPointTy` is `Builder.saveIP()`, then do I need a call to `Builder.restoreIP()` on return ? Because `emitTargetTask()` returns `Builder.saveIP()`, which, in other words, means it is returning the state of the Builder's insert point. We then restore the same `Builder` to that insert point which sounds redundant. However, I see this in a number of places in `OMPIRBuilder.cpp` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
bhandarkar-pranav wrote: @Meinersbur, @ergawy, @skatrak - Could you please give this another look. I have updated the testcase `target-depend.f90` because the previous version was flaky; I had used `depend` incorrectly which I have fixed now. (Previously, `depend` was used on tasks that were not sibling tasks) https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/bhandarkar-pranav edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/ergawy approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); + +AllocaInst *NewArgStructAlloca = +Builder.CreateAlloca(ArgStructType, nullptr, "structArg"); +Value *TaskT = ProxyFn->getArg(1); +Value *ThreadId = ProxyFn->getArg(0); +LLVM_DEBUG(dbgs() << "TaskT = " << *TaskT << "\n"); +Value *SharedsSize = +Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType)); + +Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0); +LoadInst *LoadShared = +Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds); + +// TODO: Are these alignment values correct? bhandarkar-pranav wrote: Thanks https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; bhandarkar-pranav wrote: These are the values that are shared between the encountering task and the `target task` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1762,6 +1762,26 @@ class OpenMPIRBuilder { EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP); + /// Generate a target-task for the target construct + /// + /// \param OutlinedFn The outlined device/target kernel function. + /// \param OutlinedFnID The ooulined function ID. + /// \param EmitTargetCallFallbackCB Call back function to generate host + ///fallback code. + /// \param Args Data structure holding information about the kernel arguments. + /// \param DeviceID Identifier for the device via the 'device' clause. + /// \param RTLoc Source location identifier + /// \param AllocaIP The insertion point to be used for alloca instructions. + /// \param Dependencies Vector of DependData objects holding information of + ///dependencies as specified by the 'depend' clause. + /// \param HasNoWait True if the target construct had 'nowait' on it, false + ///otherwise + InsertPointTy emitTargetTask( + Function *OutlinedFn, Value *OutlinedFnID, + EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , + Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP, + SmallVector , bool HasNoWait); bhandarkar-pranav wrote: I don't think this is safe. I capture Dependencies in a lambda so storing ArrayRef which doesnt manage the lifetime of the data pointer isn't safe. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -681,7 +681,30 @@ convertOmpTeams(omp::TeamsOp op, llvm::IRBuilderBase , ompLoc, bodyCB, numTeamsLower, numTeamsUpper, threadLimit, ifExpr)); return bodyGenStatus; } - +static void +buildDependData(std::optional depends, OperandRange dependVars, +LLVM::ModuleTranslation , +SmallVector ) { Meinersbur wrote: ```suggestion SmallVectorImpl ) { ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); Meinersbur wrote:
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -681,7 +681,30 @@ convertOmpTeams(omp::TeamsOp op, llvm::IRBuilderBase , ompLoc, bodyCB, numTeamsLower, numTeamsUpper, threadLimit, ifExpr)); return bodyGenStatus; } - +static void Meinersbur wrote: [nit] ```suggestion static void ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); Meinersbur wrote: ```suggestion auto *ArgStructAlloca = dyn_cast(StaleCI->getArgOperand(1)); ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. Meinersbur wrote: Could be a Doxygen comment ```suggestion /// Create an entry point for a target task with the following. /// It'll have the following signature /// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) /// This function is called from emitTargetTask once the /// code to launch the target kernel has been outlined already. ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); + +AllocaInst *NewArgStructAlloca = +Builder.CreateAlloca(ArgStructType, nullptr, "structArg"); +Value *TaskT = ProxyFn->getArg(1); +Value *ThreadId = ProxyFn->getArg(0); +LLVM_DEBUG(dbgs() << "TaskT = " << *TaskT << "\n"); +Value *SharedsSize = +Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType)); + +Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0); +LoadInst *LoadShared = +Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds); + +// TODO: Are these alignment values correct? Meinersbur wrote: I think `NewArgStructAlloca->getAlign()` shold be sufficient. If the alloca/load doesn't have the align set explicitly, `memcpy` should apply pointer alignment itself. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; Meinersbur wrote: What is "Shareds"? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur commented: Conceptually, looks quite good. Just some style comments. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, Meinersbur wrote: [nit] line break https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1762,6 +1762,26 @@ class OpenMPIRBuilder { EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP); + /// Generate a target-task for the target construct + /// + /// \param OutlinedFn The outlined device/target kernel function. + /// \param OutlinedFnID The ooulined function ID. + /// \param EmitTargetCallFallbackCB Call back function to generate host + ///fallback code. + /// \param Args Data structure holding information about the kernel arguments. + /// \param DeviceID Identifier for the device via the 'device' clause. + /// \param RTLoc Source location identifier + /// \param AllocaIP The insertion point to be used for alloca instructions. + /// \param Dependencies Vector of DependData objects holding information of + ///dependencies as specified by the 'depend' clause. + /// \param HasNoWait True if the target construct had 'nowait' on it, false + ///otherwise + InsertPointTy emitTargetTask( + Function *OutlinedFn, Value *OutlinedFnID, + EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , + Value *DeviceID, Value *RTLoc, InsertPointTy AllocaIP, + SmallVector , bool HasNoWait); Meinersbur wrote: ```suggestion ArrayRef Dependencies, bool HasNoWait); ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); Meinersbur wrote: ```suggestion Builder.SetInsertPoint( ()->getParent()->getEntryBlock().getTerminator()); ``` just makes it more explicit. Did you consider explicitly passing the AllocaIP, could be needed with combined constructs. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); + +Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size()); +DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr"); + +unsigned P = 0; +for (const OpenMPIRBuilder::DependData : Dependencies) { + Value *Base = + Builder.CreateConstInBoundsGEP2_64(DepArrayTy, DepArray, 0, P); + // Store the pointer to the variable + Value *Addr = Builder.CreateStructGEP( + DependInfo, Base, + static_cast(RTLDependInfoFields::BaseAddr)); + Value *DepValPtr = + Builder.CreatePtrToInt(Dep.DepVal, Builder.getInt64Ty()); + Builder.CreateStore(DepValPtr, Addr); + // Store the size of the variable + Value *Size = Builder.CreateStructGEP( + DependInfo, Base, + static_cast(RTLDependInfoFields::Len)); + Builder.CreateStore(Builder.getInt64(M.getDataLayout().getTypeStoreSize( + Dep.DepValueType)), + Size); + // Store the dependency kind + Value *Flags = Builder.CreateStructGEP( + DependInfo, Base, + static_cast(RTLDependInfoFields::Flags)); + Builder.CreateStore( + ConstantInt::get(Builder.getInt8Ty(), + static_cast(Dep.DepKind)), + Flags); Meinersbur wrote: Consider adding a comment with a mock source of what is generate here: ``` DepArray[P].BaseAddre = ... ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -2253,7 +2275,8 @@ class OpenMPIRBuilder { SmallVectorImpl , GenMapInfoCallbackTy GenMapInfoCB, TargetBodyGenCallbackTy BodyGenCB, - TargetGenArgAccessorsCallbackTy ArgAccessorFuncCB); + TargetGenArgAccessorsCallbackTy ArgAccessorFuncCB, + SmallVector Dependencies = {}); Meinersbur wrote: ```suggestion ArrayRef Dependencies = {}); ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) Meinersbur wrote: ```suggestion if (Dependencies.empty()) ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/Meinersbur commented: Not a full a review, but the first notes that I started with https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -0,0 +1,37 @@ +! Offloading test checking the use of the depend clause on +! the target construct +! REQUIRES: flang, amdgcn-amd-amdhsa +! UNSUPPORTED: nvptx64-nvidia-cuda +! UNSUPPORTED: nvptx64-nvidia-cuda-LTO +! UNSUPPORTED: aarch64-unknown-linux-gnu +! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +! UNSUPPORTED: x86_64-pc-linux-gnu +! UNSUPPORTED: x86_64-pc-linux-gnu-LTO + +! RUN: %libomptarget-compile-fortran-run-and-check-generic +program main + integer :: a = 0 + call foo(5, a) + print*, "=== FORTRAN Test passed! ===" + print*, "foo(5) returned ", a, ", expected 6\n" + ! stop 0 +end program main +subroutine foo(N, r) + integer, intent(in) :: N + integer, intent(out) :: r + integer :: z + + z = 1 + !$omp task depend(out: z) shared(z) + z = N bhandarkar-pranav wrote: Yes, that makes sense. I'll add a omp parallel also so that if I remove the depend there is a chance that i'd get the wrong result. That'll really prove this works. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5572,6 +5999,8 @@ void OpenMPIRBuilder::emitOffloadingArrays( return; Builder.restoreIP(AllocaIP); + LLVM_DEBUG(dbgs() << "Basicblock before emitOffloadingArrays\n" +<< *(Builder.GetInsertBlock()) << "\n"); bhandarkar-pranav wrote: No, this should go :( https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5252,6 +5660,8 @@ static void emitTargetCall(OpenMPIRBuilder , IRBuilderBase , // emitKernelLaunch auto & = [&](OpenMPIRBuilder::InsertPointTy IP) -> OpenMPIRBuilder::InsertPointTy { +LLVM_DEBUG(dbgs() << "EmitTargetCallFallbackCB::Builder = " << + << "\n"); bhandarkar-pranav wrote: No, this should go. sorry about this. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); + +AllocaInst *NewArgStructAlloca = +Builder.CreateAlloca(ArgStructType, nullptr, "structArg"); +Value *TaskT = ProxyFn->getArg(1); +Value *ThreadId = ProxyFn->getArg(0); +LLVM_DEBUG(dbgs() << "TaskT = " << *TaskT << "\n"); +Value *SharedsSize = +Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType)); + +Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0); +LoadInst *LoadShared = +Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds); + +// TODO: Are these alignment values correct? +Builder.CreateMemCpy( +NewArgStructAlloca, +NewArgStructAlloca->getPointerAlignment(M.getDataLayout()), LoadShared, +LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize); + +Builder.CreateCall(CalledFunction, {ThreadId, NewArgStructAlloca}); + } + ProxyFn->getArg(0)->setName("thread.id"); + ProxyFn->getArg(1)->setName("task"); bhandarkar-pranav wrote: Sure, will do. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); bhandarkar-pranav wrote: Yes, this should have been removed before opening the PR even. I accidentally left something I had put in place when i was debugging my work. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; bhandarkar-pranav wrote: Got it, yes. will do that. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); bhandarkar-pranav wrote: Yes, that's a better name. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , bhandarkar-pranav wrote: Good point @ergawy and @skatrak I'll pick a different name `emitTargetTaskProxyFunction` sounds appropriate. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); + +Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size()); +DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr"); + +unsigned P = 0; +for (const OpenMPIRBuilder::DependData : Dependencies) { bhandarkar-pranav wrote: Actually, my intention was to switch `OpenMPIRBuilder::createTask` to use this function but i had a couple of other changes to `OpenMPIRBuilder::createTask` in mind so I decided to leave that out and roll it into a different PR. If you prefer, I can make the switch in this PR itself though. Just a preference, no strong opinions on this. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); + +Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size()); +DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr"); + +unsigned P = 0; +for (const OpenMPIRBuilder::DependData : Dependencies) { bhandarkar-pranav wrote: Will do, thanks. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { bhandarkar-pranav wrote: Aah nice catch. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , bhandarkar-pranav wrote: Good point. I'll rename this function. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -792,6 +792,9 @@ void OpenMPIRBuilder::finalize(Function *Fn) { if (!OffloadInfoManager.empty()) createOffloadEntriesAndInfoMetadata(ErrorReportFn); + + LLVM_DEBUG(dbgs() << "Module after OMPIRBuilder::finalize\n"); + LLVM_DEBUG(dbgs() << M << "\n"); bhandarkar-pranav wrote: Sorry for having left these here. This should go. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -705,28 +728,9 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase , }; SmallVector dds; - if (!taskOp.getDependVars().empty() && taskOp.getDepends()) { -for (auto dep : - llvm::zip(taskOp.getDependVars(), taskOp.getDepends()->getValue())) { - llvm::omp::RTLDependenceKindTy type; - switch ( - cast(std::get<1>(dep)).getValue()) { - case mlir::omp::ClauseTaskDepend::taskdependin: -type = llvm::omp::RTLDependenceKindTy::DepIn; -break; - // The OpenMP runtime requires that the codegen for 'depend' clause for - // 'out' dependency kind must be the same as codegen for 'depend' clause - // with 'inout' dependency. - case mlir::omp::ClauseTaskDepend::taskdependout: - case mlir::omp::ClauseTaskDepend::taskdependinout: -type = llvm::omp::RTLDependenceKindTy::DepInOut; -break; - }; - llvm::Value *depVal = moduleTranslation.lookupValue(std::get<0>(dep)); - llvm::OpenMPIRBuilder::DependData dd(type, depVal->getType(), depVal); - dds.emplace_back(dd); -} - } + if (!taskOp.getDependVars().empty() && taskOp.getDepends()) skatrak wrote: Nit: I'd suggest moving this check into `buildDependData`, since it always needs to be done. It would just return without adding anything to `dds` if the condition is not met. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , skatrak wrote: ```suggestion emitTargetTaskDependenciesArray(OpenMPIRBuilder , ``` Nit: I think it's best to try avoiding arbitrary abbreviations in function names for readability. Perhaps the name I'm suggesting can be shortened if it's also intended for general tasks (`emitTaskDependenciesArray`), but it looks like it's only used for target tasks at the moment. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -792,6 +792,9 @@ void OpenMPIRBuilder::finalize(Function *Fn) { if (!OffloadInfoManager.empty()) createOffloadEntriesAndInfoMetadata(ErrorReportFn); + + LLVM_DEBUG(dbgs() << "Module after OMPIRBuilder::finalize\n"); + LLVM_DEBUG(dbgs() << M << "\n"); skatrak wrote: Nit: Is this addition intentional? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5272,36 +5682,53 @@ static void emitTargetCall(OpenMPIRBuilder , IRBuilderBase , Value *DynCGGroupMem = Builder.getInt32(0); bool HasNoWait = false; + bool HasDependencies = Dependencies.size() > 0; + bool RequiresOuterTargetTask = HasNoWait || HasDependencies; OpenMPIRBuilder::TargetKernelArgs KArgs(NumTargetItems, RTArgs, NumIterations, NumTeamsVal, NumThreadsVal, DynCGGroupMem, HasNoWait); - Builder.restoreIP(OMPBuilder.emitKernelLaunch( - Builder, OutlinedFn, OutlinedFnID, EmitTargetCallFallbackCB, KArgs, - DeviceID, RTLoc, AllocaIP)); + // The presence of certain clauses on the target directive require the + // explicit generation of the target task. + if (RequiresOuterTargetTask) { +OMPBuilder.emitTargetTask(OutlinedFn, OutlinedFnID, skatrak wrote: Is a call to `Builder.restoreIP()` missing? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -0,0 +1,37 @@ +! Offloading test checking the use of the depend clause on +! the target construct +! REQUIRES: flang, amdgcn-amd-amdhsa +! UNSUPPORTED: nvptx64-nvidia-cuda +! UNSUPPORTED: nvptx64-nvidia-cuda-LTO +! UNSUPPORTED: aarch64-unknown-linux-gnu +! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +! UNSUPPORTED: x86_64-pc-linux-gnu +! UNSUPPORTED: x86_64-pc-linux-gnu-LTO + +! RUN: %libomptarget-compile-fortran-run-and-check-generic +program main + integer :: a = 0 + call foo(5, a) + print*, "=== FORTRAN Test passed! ===" + print*, "foo(5) returned ", a, ", expected 6\n" + ! stop 0 +end program main +subroutine foo(N, r) + integer, intent(in) :: N + integer, intent(out) :: r + integer :: z + + z = 1 + !$omp task depend(out: z) shared(z) + z = N skatrak wrote: Would it make sense to perhaps add a relatively long-running loop before setting the value of `z` here? Just to make sure that the target task below has had to wait for the first to finish (rather than being able to assume it did because it possibly runs so fast that it always finishes before the next task starts running). https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , skatrak wrote: I agree with the comment by @ergawy, though I'd rather suggest `emitTargetTaskProxyFunction`, since that's what the emitted function is named. Feel free to ignore if you disagree. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5572,6 +5999,8 @@ void OpenMPIRBuilder::emitOffloadingArrays( return; Builder.restoreIP(AllocaIP); + LLVM_DEBUG(dbgs() << "Basicblock before emitOffloadingArrays\n" +<< *(Builder.GetInsertBlock()) << "\n"); skatrak wrote: Nit: Is this addition intentional? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5252,6 +5660,8 @@ static void emitTargetCall(OpenMPIRBuilder , IRBuilderBase , // emitKernelLaunch auto & = [&](OpenMPIRBuilder::InsertPointTy IP) -> OpenMPIRBuilder::InsertPointTy { +LLVM_DEBUG(dbgs() << "EmitTargetCallFallbackCB::Builder = " << + << "\n"); skatrak wrote: Nit: Is this addition intentional? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak commented: Thank you Pranav, I think this looks good. I just have a few minor comments. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); ergawy wrote: nit: This will be printed out-of-context and might be confusing. I think this can be removed before merging. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); + +Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size()); +DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr"); + +unsigned P = 0; +for (const OpenMPIRBuilder::DependData : Dependencies) { ergawy wrote: I see there is a similar loop in `OpenMPIRBuilder::createTask`. Can we outline this to a shared util used in both locations? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been ergawy wrote: Really appreciate this block of comments. Paints the picture of what happens clearly. Thanks :)! https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { +OpenMPIRBuilder::InsertPointTy OldIP = Builder.saveIP(); +Builder.SetInsertPoint( +()->getParent()->getEntryBlock().back()); + +Type *DepArrayTy = ArrayType::get(DependInfo, Dependencies.size()); +DepArray = Builder.CreateAlloca(DepArrayTy, nullptr, ".dep.arr.addr"); + +unsigned P = 0; +for (const OpenMPIRBuilder::DependData : Dependencies) { ergawy wrote: Using `llvm::enumerate` will save us having to declare and increment the idx var. ```suggestion for (const auto&[DepIdx, Dep] : enumerate(Dependencies)) { ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -1698,6 +1701,64 @@ void OpenMPIRBuilder::createTaskyield(const LocationDescription ) { emitTaskyieldImpl(Loc); } +// Processes the dependencies in Dependencies and does the following +// - Allocates space on the stack of an array of DependInfo objects +// - Populates each DependInfo object with relevant information of +// the corresponding dependence. +// - All code is inserted in the entry block of the current function. +static Value * +emitDepArray(OpenMPIRBuilder , + SmallVector ) { + // Early return if we have no dependencies to process + if (!Dependencies.size()) +return nullptr; + + IRBuilderBase = OMPBuilder.Builder; + Type *DependInfo = OMPBuilder.DependInfo; + Module = OMPBuilder.M; + + Value *DepArray = nullptr; + if (Dependencies.size()) { ergawy wrote: You already checked that `Dependencies` is not empty above. ```suggestion ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; + // TODO: This is a temporary assert to prove to ourselves that + // the outlined target launch function is always going to have + // atmost two arguments if there is any data shared between + // host and device. + assert((!HasShareds || (StaleCI->arg_size() == 2)) && + "StaleCI with shareds should have exactly two arguments."); + if (HasShareds) { +AllocaInst *ArgStructAlloca = +dyn_cast(StaleCI->getArgOperand(1)); +assert(ArgStructAlloca && + "Unable to find the alloca instruction corresponding to arguments " + "for extracted function"); +StructType *ArgStructType = +dyn_cast(ArgStructAlloca->getAllocatedType()); +LLVM_DEBUG(dbgs() << "ArgStructType = " << *ArgStructType << "\n"); + +AllocaInst *NewArgStructAlloca = +Builder.CreateAlloca(ArgStructType, nullptr, "structArg"); +Value *TaskT = ProxyFn->getArg(1); +Value *ThreadId = ProxyFn->getArg(0); +LLVM_DEBUG(dbgs() << "TaskT = " << *TaskT << "\n"); +Value *SharedsSize = +Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType)); + +Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0); +LoadInst *LoadShared = +Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds); + +// TODO: Are these alignment values correct? +Builder.CreateMemCpy( +NewArgStructAlloca, +NewArgStructAlloca->getPointerAlignment(M.getDataLayout()), LoadShared, +LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize); + +Builder.CreateCall(CalledFunction, {ThreadId, NewArgStructAlloca}); + } + ProxyFn->getArg(0)->setName("thread.id"); + ProxyFn->getArg(1)->setName("task"); ergawy wrote: nit: move these closer to where `ProxyFn` is created? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , ergawy wrote: nit: Just to express more clearly the intent of the function. ```suggestion static Function *emitTargetProxyTaskFunction(OpenMPIRBuilder , ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); + OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(), +StaleCI->getIterator()); + LLVMContext = StaleCI->getParent()->getContext(); + Type *ThreadIDTy = Type::getInt32Ty(Ctx); + Type *TaskPtrTy = OMPBuilder.TaskPtr; + Type *TaskTy = OMPBuilder.Task; + auto ProxyFnTy = + FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy}, +/* isVarArg */ false); + auto ProxyFn = Function::Create(ProxyFnTy, GlobalValue::InternalLinkage, + ".omp_target_task_proxy_func", + Builder.GetInsertBlock()->getModule()); + + BasicBlock *EntryBB = + BasicBlock::Create(Builder.getContext(), "entry", ProxyFn); + Builder.SetInsertPoint(EntryBB); + + bool HasShareds = StaleCI->arg_size() > 1; ergawy wrote: Can you document what `StaleCI` will look like in case there are shared values and in case there aren't? https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5212,6 +5273,78 @@ static Function *createOutlinedFunction( return Func; } +// Create an entry point for a target task with the following. +// It'll have the following signature +// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task) +// This function is called from emitTargetTask once the +// code to launch the target kernel has been outlined already. +static Function *emitProxyTaskFunction(OpenMPIRBuilder , + IRBuilderBase , + CallInst *StaleCI) { + Module = OMPBuilder.M; + // CalledFunction is the target launch function, i.e. + // the function that sets up kernel arguments and calls + // __tgt_target_kernel to launch the kernel on the device. + Function *CalledFunction = StaleCI->getCalledFunction(); ergawy wrote: nit: ```suggestion Function *KernelLaunchFunction = StaleCI->getCalledFunction(); ``` https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
@@ -5229,13 +5362,288 @@ static void emitTargetOutlinedFunction( OMPBuilder.emitTargetRegionFunction(EntryInfo, GenerateOutlinedFunction, true, OutlinedFn, OutlinedFnID); } +OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::emitTargetTask( +Function *OutlinedFn, Value *OutlinedFnID, +EmitFallbackCallbackTy EmitTargetCallFallbackCB, TargetKernelArgs , +Value *DeviceID, Value *RTLoc, OpenMPIRBuilder::InsertPointTy AllocaIP, +SmallVector , +bool HasNoWait) { + + // When we arrive at this function, the target region itself has been + // outlined into the function OutlinedFn. + // So at ths point, for + // -- + // void user_code_that_offloads(...) { + // omp target depend(..) map(from:a) map(to:b, c) + //a = b + c + // } + // + // -- + // + // we have + // + // -- + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // ;; target region has been outlined and now we need to + // ;; offload to it via a target task. + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + // We have to now do the following + // (i) Make an offloading call to outlined_device_function using the OpenMP + // RTL. See 'kernel_launch_function' in the pseudo code below. This is + // emitted by emitKernelLaunch + // (ii) Create a task entry point function that calls kernel_launch_function + // and is the entry point for the target task. See + // '@.omp_target_task_proxy_func in the pseudocode below. + // (iii) Create a task with the task entry point created in (ii) + // + // That is we create the following + // + // void user_code_that_offloads(...) { + // %.offload_baseptrs = alloca [3 x ptr], align 8 + // %.offload_ptrs = alloca [3 x ptr], align 8 + // %.offload_mappers = alloca [3 x ptr], align 8 + // + // %structArg = alloca { ptr, ptr, ptr }, align 8 + // %strucArg[0] = %.offload_baseptrs + // %strucArg[1] = %.offload_ptrs + // %strucArg[2] = %.offload_mappers + // proxy_target_task = @__kmpc_omp_task_alloc(..., + // @.omp_target_task_proxy_func) + // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg)) + // dependencies_array = ... + // ;; if nowait not present + // call @__kmpc_omp_wait_deps(..., dependencies_array) + // call @__kmpc_omp_task_begin_if0(...) + // call @ @.omp_target_task_proxy_func(i32 thread_id, ptr + // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...) + // } + // + // define internal void @.omp_target_task_proxy_func(i32 %thread.id, + // ptr %task) { + // %structArg = alloca {ptr, ptr, ptr} + // %shared_data = load (getelementptr %task, 0, 0) + // mempcy(%structArg, %shared_data, sizeof(structArg)) + // kernel_launch_function(%thread.id, %structArg) + // } + // + // We need the proxy function because the signature of the task entry point + // expected by kmpc_omp_task is always the same and will be different from + // that of the kernel_launch function. + // + // kernel_launch_function is generated by emitKernelLaunch and has the + // always_inline attribute. void kernel_launch_function(thread_id, + //structArg) + //alwaysinline { + // %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8 + // offload_baseptrs = load(getelementptr structArg, 0, 0) + // offload_ptrs = load(getelementptr structArg, 0, 1) + // offload_mappers = load(getelementptr structArg, 0, 2) + // ; setup kernel_args using offload_baseptrs, offload_ptrs and + // ; offload_mappers + // call i32 @__tgt_target_kernel(..., + // outlined_device_function, + // ptr %kernel_args) + // } + // void outlined_device_function(ptr a, ptr b, ptr c) { + // *a = *b + *c + // } + // + BasicBlock *TargetTaskBodyBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.body"); + BasicBlock *TargetTaskAllocaBB = + splitBB(Builder, /*CreateBranch=*/true, "target.task.alloca"); + + InsertPointTy TargetTaskAllocaIP = + InsertPointTy(TargetTaskAllocaBB, TargetTaskAllocaBB->begin()); + InsertPointTy TargetTaskBodyIP = + InsertPointTy(TargetTaskBodyBB, TargetTaskBodyBB->begin()); + + OutlineInfo OI; + OI.EntryBB =
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
https://github.com/ergawy commented: Thanks @pranavb-ca! I did a first round and have a few comments. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [OMPIRBuilder] - Handle dependencies in `createTarget` (PR #93977)
bhandarkar-pranav wrote: ping! Gentle reminder, I'd appreciate reviews on this. https://github.com/llvm/llvm-project/pull/93977 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits