llvmorg-github-actions[bot] wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-clang-codegen
Author: Julian Brown (jtb20)
<details>
<summary>Changes</summary>
OpenMP 6.0 lets a taskgraph region be recorded once and replayed many
times. Each replay creates a fresh instance of the 'args' pointer
block passed to __kmpc_taskgraph (and may execute at a different stack
location, or even on a different stack), so by-reference captures inside a
recorded task must be re-pointed at the live host objects of the current
invocation; otherwise the recorded tasks would dereference stale memory
from the stack frame of the initial call to __kmpc_taskgraph.
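The hazard can be illustrated with a toy model that is independent of the real libomp data structures. In the sketch below, `RecordedTask`, `ArgsBlock`, `record_task`, `replay_task` and `invoke_region` are hypothetical names chosen for illustration, not runtime entry points. The 'args' block is modeled as a small struct rebuilt on every invocation.

```cpp
#include <cassert>

// Toy model only: none of these names exist in libomp.
// A "recorded task" keeps one pointer per by-reference capture.
struct RecordedTask {
  int *captured_counter = nullptr; // by-reference capture of a caller local
  void run() const { ++*captured_counter; }
};

// Stand-in for the per-invocation 'args' pointer block: it is rebuilt on
// every call, so it always holds the addresses of the live objects.
struct ArgsBlock {
  int *counter;
};

// Recording: remember the capture as seen by the first invocation.
inline void record_task(RecordedTask &task, const ArgsBlock &args) {
  task.captured_counter = args.counter;
}

// Replay: re-point the capture at the current invocation's object before
// running. Skipping this step would leave the pointer aimed at the
// recording-time frame, which may no longer exist.
inline void replay_task(RecordedTask &task, const ArgsBlock &args) {
  task.captured_counter = args.counter;
  task.run();
}

// One invocation of the region, optionally from a deeper stack frame.
inline int invoke_region(RecordedTask &task, bool first, int extra_depth) {
  if (extra_depth > 0)
    return invoke_region(task, first, extra_depth - 1);
  int counter = 0;          // fresh local in this frame
  ArgsBlock args{&counter}; // fresh pointer block for this invocation
  if (first) {
    record_task(task, args);
    task.run();
  } else {
    replay_task(task, args);
  }
  return counter; // 1 if the task touched this frame's local
}
```

Each call to `invoke_region` returns 1 only because the replay path re-points the capture first; the recorded pointer from the previous call is never dereferenced.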
This patch introduces the small amount of infrastructure needed to do that
and wires it up for the explicit 'task' construct. A subsequent patch
extends the same scheme to 'taskloop'.
On the compiler side (CGOpenMPRuntime.cpp), a new helper
emitTaskRelocationFunction emits a per-task thunk:
    void __omp_taskgraph_relocate.NN(kmp_task_t *task, void *outer_captures);
The thunk walks the task's captures and overwrites each entry of
task->shareds with the address of the corresponding field projected from
the freshly reconstructed outer pointer block. Two classes of capture do
not need updating and are treated as no-ops by the thunk: captures that
correspond to a firstprivate list item (the body reads from the per-task
'.kmp_privates.t' snapshot, populated when the task is allocated and
-- for non-trivial types -- reset on each replay by the clone helper
introduced later), and captures of variables with static storage duration
(their address is link-time fixed). Reductions of a local-stack variable
are intentionally not in this set: the taskred state is keyed on the
recording-time taskgroup hierarchy and is not yet usable on replay. For
that case the patch preserves today's relocate-returns-null /
runtime-aborts behaviour, so the limitation surfaces as a diagnostic
rather than as silent misbehaviour.
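The per-capture decision described above can be sketched in plain C++. This is a toy model under stated assumptions, not the emitted thunk: the real helper is generated as LLVM IR for each task, and `CaptureKind`, `CaptureInfo`, `relocate` and `demo_relocate` are hypothetical names. The reduction case is deliberately left out of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; the real thunk is emitted as LLVM IR per task.
enum class CaptureKind { SharedLocal, Firstprivate, StaticStorage };

struct CaptureInfo {
  CaptureKind kind;
  std::size_t outer_index; // field index in the outer pointer block
};

// Refresh the shareds table in place from the current outer block.
// The slot index always advances, so skipped captures keep their position.
inline void relocate(std::vector<void *> &shareds,
                     const std::vector<void *> &outer_captures,
                     const std::vector<CaptureInfo> &captures) {
  assert(shareds.size() == captures.size());
  for (std::size_t slot = 0; slot < captures.size(); ++slot) {
    const CaptureInfo &cap = captures[slot];
    if (cap.kind != CaptureKind::SharedLocal)
      continue; // firstprivate or static storage: nothing to refresh
    shareds[slot] = outer_captures[cap.outer_index];
  }
}

// Small driver: returns true if only the shared-local slot was rewritten.
inline bool demo_relocate() {
  static int global_like = 0;
  int stale_local = 1, live_local = 2, fp_original = 3;
  std::vector<CaptureInfo> captures = {{CaptureKind::StaticStorage, 0},
                                       {CaptureKind::Firstprivate, 0},
                                       {CaptureKind::SharedLocal, 0}};
  std::vector<void *> shareds = {&global_like, &fp_original, &stale_local};
  std::vector<void *> outer = {&live_local};
  relocate(shareds, outer, captures);
  return shareds[0] == &global_like && shareds[1] == &fp_original &&
         shareds[2] == &live_local;
}
```

Advancing the slot index even for skipped captures keeps the table aligned with the layout fixed at task allocation.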
emitTaskCall now emits such a thunk for each taskgraph-recorded task
and passes it as the new trailing argument of __kmpc_taskgraph_task.
The redundant 'shareds' parameter is dropped, since relocation now
provides the supported mechanism for refreshing that pointer.
On the runtime side (kmp.h, kmp_tasking.cpp, OMPKinds.def),
introduce a new typedef kmp_task_relocate_t and store the callback
on each recorded task in kmp_taskgraph_node_t::relocate, together
with the outer-record pointer captured at __kmpc_taskgraph entry in
kmp_taskgraph_record_t::taskgraph_args. __kmp_omp_tg_task invokes
the callback on replay, and aborts with a new fatal diagnostic
(OmpTaskgraphBadCapture, i18n/en_US.txt) when a recorded task has a
non-null shareds payload but no relocation callback. There is also a
fix for a pre-existing bug in __kmp_taskgraph_clone_task -- the cloned
task's shareds pointer was left referring to the original's payload --
which becomes observable as soon as the relocation thunk writes through
that pointer.
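Both runtime-side changes can be modeled in a few lines. The sketch below is illustrative only: `Task`, `Node`, `prepare_for_replay` and `clone_task` are hypothetical stand-ins for the libomp structures, an exception stands in for the fatal diagnostic, and the task plus its trailing payload are modeled as one byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Toy model only: field and function names are illustrative.
struct Task {
  void *shareds; // points at the payload that trails this task
};
using RelocateFn = void (*)(Task *, void *);

struct Node {
  Task *task;
  RelocateFn relocate;
};

// Replay-time rule: call the callback if present; a missing callback is
// only an error when there is a shareds payload that might need remapping.
inline void prepare_for_replay(Node &node, void *taskgraph_args) {
  if (node.relocate)
    node.relocate(node.task, taskgraph_args);
  else if (node.task->shareds != nullptr)
    throw std::runtime_error("cannot locate captured shared variable");
}

// Clone a task and its trailing payload held in one contiguous buffer.
// A bitwise copy duplicates the payload but leaves 'shareds' aimed at the
// original buffer, so it must be re-based onto the copy.
inline std::vector<unsigned char>
clone_task(const std::vector<unsigned char> &orig, std::size_t shareds_offset) {
  std::vector<unsigned char> copy(orig);
  Task t;
  std::memcpy(&t, copy.data(), sizeof(Task));
  if (t.shareds != nullptr) {
    t.shareds = copy.data() + shareds_offset;
    std::memcpy(copy.data(), &t, sizeof(Task));
  }
  return copy;
}

// Driver: true if the clone's pointer refers to the clone's own payload.
inline bool demo_clone() {
  const std::size_t offset = sizeof(Task);
  std::vector<unsigned char> orig(offset + sizeof(void *));
  Task t{orig.data() + offset};
  std::memcpy(orig.data(), &t, sizeof(Task));
  std::vector<unsigned char> copy = clone_task(orig, offset);
  Task c;
  std::memcpy(&c, copy.data(), sizeof(Task));
  return c.shareds == copy.data() + offset && c.shareds != t.shareds;
}
```

Without the re-basing step, a relocation callback applied to the clone would write through `shareds` into the original task's payload, which is why the bug only becomes visible once relocation is in place.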
New libomp tests cover lexical and non-lexical shared captures,
pointer captures, non-trivial types, recursive recordings,
stack-depth differences across replays, and the saved/expired-graph
cases.
Assisted-By: Claude Opus 4.7
---
Patch is 47.65 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/200404.diff
17 Files Affected:
- (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+177-8)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+1-1)
- (modified) openmp/runtime/src/i18n/en_US.txt (+1)
- (modified) openmp/runtime/src/kmp.h (+5-2)
- (modified) openmp/runtime/src/kmp_tasking.cpp (+32-11)
- (added) openmp/runtime/test/taskgraph/taskgraph_firstprivate_stack_depth.cpp
(+111)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_mixed_capture.cpp
(+44)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_nontrivial_type.cpp
(+58)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_nontrivial_type_recursive.cpp
(+86)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_pointer.cpp
(+42)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_pointer_recursive_frameid.cpp
(+75)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_recursive.cpp
(+44)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_lexical_shared_works.cpp
(+41)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_nonlexical_shared_fails_1.cpp
(+47)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_nonlexical_shared_fails_2.cpp
(+66)
- (added)
openmp/runtime/test/taskgraph/taskgraph_replayable_saved_stack_depth.cpp (+115)
- (added) openmp/runtime/test/taskgraph/taskgraph_shared_stack_depth.cpp (+93)
``````````diff
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 9f00545cd0839..9f342038f2285 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2241,6 +2241,169 @@ void CGOpenMPRuntime::emitTaskyieldCall(CodeGenFunction
&CGF,
Region->emitUntiedSwitch(CGF);
}
+/// Emit a helper with the runtime relocation signature (kmp_task_relocate_t):
+/// void relocate(kmp_task_t *task, void *outer_captures);
+///
+/// On taskgraph replay the runtime invokes this helper to refresh the task's
+/// shared-pointer table. Each capture (a shared-by-ref variable or \c this)
+/// that the task body actually dereferences at execution time is
+/// re-projected from the freshly reconstructed outer record passed as
+/// \p outer_captures and stored back into \c task->shareds.
+///
+/// Captures that the body cannot observe a changed address for across
+/// replays are skipped here:
+///
+/// * captures of a variable that appears as a firstprivate list item
+/// -- the body sources the value from the per-task '.kmp_privates.t'
+/// snapshot rather than from the shareds slot, so the (potentially
+/// stale) original address in the shareds entry is harmless;
+///
+/// * captures of a variable with static (global / namespace-scope /
+/// static-local / static-data-member) storage duration -- the
+/// captured pointer is the variable's link-time-fixed address, which
+/// is identical at recording and on every replay, so no re-projection
+/// is meaningful.
+///
+/// The relocate helper is therefore only ever called upon to refresh
+/// shareds slots that the body genuinely depends on at execution time
+/// (shared-by-ref to a local variable, captured \c this on a heap or
+/// stack object, etc.). When every capture falls in one of the
+/// skip-eligible categories the helper is emitted as a (still non-null)
+/// no-op: today's runtime only inspects null-vs-non-null, and a non-null
+/// no-op is the right signal that there is nothing the body actually
+/// needs the shareds table refreshed for.
+///
+/// Reduction captures of a local-stack variable still keep the existing
+/// null-relocate-and-abort behaviour: the taskred runtime state is keyed
+/// off the recording-time taskgroup hierarchy and is not currently usable
+/// on replay, so it is preferable to fail loudly (#302) than to silently
+/// misbehave. Reduction captures of a static-storage variable do not run
+/// into this hazard at the relocate layer -- the captured pointer is
+/// stable -- and are no-op-skipped via the static-storage rule above;
+/// whether the reduction body itself then succeeds on replay is a
+/// separate concern.
+///
+/// Returns null only when at least one capture is genuinely shared (none
+/// of the skip-eligible categories apply) AND cannot be resolved in
+/// \p OuterCSI; in that case the caller passes a null relocation function
+/// to the runtime and the runtime fails fast at replay.
+static llvm::Function *
+emitTaskRelocationFunction(CodeGenModule &CGM, SourceLocation Loc,
+ const CapturedStmt &CS,
+ const CodeGenFunction::CGCapturedStmtInfo *OuterCSI,
+ const OMPTaskDataTy &Data) {
+ ASTContext &C = CGM.getContext();
+
+ // Variables that don't need their shareds slot refreshed across replays
+ // because the body sources them from the per-task '.kmp_privates.t'
+ // snapshot. Today this is the set of firstprivate list items (snapshot
+ // is taken at task allocation and reused unchanged by every replay).
+ llvm::SmallPtrSet<const VarDecl *, 8> NoRelocateFirstprivateVars;
+ for (const Expr *E : Data.FirstprivateVars) {
+ if (!E)
+ continue;
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(E->IgnoreParenImpCasts()))
+ if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl()))
+ NoRelocateFirstprivateVars.insert(VD->getCanonicalDecl());
+ }
+
+ // A capture is "no-op-safe" with respect to taskgraph replay when
+ // refreshing its shareds slot is provably unnecessary - either because
+ // the body never reads from that slot (firstprivate) or because the
+ // captured pointer is a link-time-fixed address and is therefore
+ // identical at every replay (static storage duration).
+ auto IsNoOpRelocate = [&](const CapturedStmt::Capture &Cap) {
+ if (Cap.capturesThis() || !Cap.capturesVariable())
+ return false;
+ const VarDecl *VD = Cap.getCapturedVar();
+ if (VD->hasGlobalStorage())
+ return true;
+ return NoRelocateFirstprivateVars.contains(VD->getCanonicalDecl());
+ };
+
+ auto LookupOuterField =
+ [&](const CapturedStmt::Capture &Cap) -> const FieldDecl * {
+ if (!OuterCSI)
+ return nullptr;
+ return Cap.capturesThis() ? OuterCSI->getThisFieldDecl()
+ : OuterCSI->lookup(Cap.getCapturedVar());
+ };
+
+ // Bail out before emitting any IR if a genuinely-shared capture cannot
+ // be resolved in the containing context. No-op-safe captures (see the
+ // function-level comment) don't participate in this preflight; they
+ // simply cause the helper to skip their slot below.
+ if (llvm::any_of(CS.captures(), [&](const CapturedStmt::Capture &Cap) {
+ assert((Cap.capturesThis() || Cap.capturesVariable()) &&
+ "OpenMP task capture must be shared-by-ref or 'this'");
+ return !IsNoOpRelocate(Cap) && !LookupOuterField(Cap);
+ }))
+ return nullptr;
+
+ // void relocate(void *task, void *outer_captures)
+ auto *TaskArg =
+ ImplicitParamDecl::Create(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
+ C.VoidPtrTy, ImplicitParamKind::Other);
+ auto *OuterArg =
+ ImplicitParamDecl::Create(C, /*DC=*/nullptr, Loc, /*Id=*/nullptr,
+ C.VoidPtrTy, ImplicitParamKind::Other);
+ FunctionArgList Args{TaskArg, OuterArg};
+ const CGFunctionInfo &FnInfo =
+ CGM.getTypes().arrangeBuiltinFunctionDeclaration(C.VoidTy, Args);
+
+ std::string Name =
+ CGM.getOpenMPRuntime().getName({"omp", "taskgraph", "relocate", ""});
+ auto *Fn = llvm::Function::Create(CGM.getTypes().GetFunctionType(FnInfo),
+ llvm::GlobalValue::InternalLinkage, Name,
+ &CGM.getModule());
+ CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, FnInfo);
+ if (!CGM.getCodeGenOpts().SampleProfileFile.empty())
+ Fn->addFnAttr("sample-profile-suffix-elision-policy", "selected");
+ Fn->setDoesNotRecurse();
+
+ CodeGenFunction CGF(CGM);
+ CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, FnInfo, Args, Loc, Loc);
+
+ CGBuilderTy &Bld = CGF.Builder;
+ CharUnits PtrAlign = CGF.getPointerAlign();
+
+ // Base of the reconstructed outer record for this replay.
+ llvm::Value *OuterRaw = Bld.CreateLoad(CGF.GetAddrOfLocalVar(OuterArg));
+
+ // kmp_task_t::shareds is the first field of the runtime task descriptor;
+ // load it to obtain the void* shared table that we will refresh in place.
+ // The table holds one void* per by-ref capture.
+ llvm::Value *TaskRaw = Bld.CreateLoad(CGF.GetAddrOfLocalVar(TaskArg));
+ llvm::Value *SharedRaw =
+ Bld.CreateLoad(Address(TaskRaw, CGF.VoidPtrTy, PtrAlign));
+ Address SharedTable(SharedRaw, CGF.VoidPtrTy, PtrAlign);
+
+ unsigned Index = 0;
+ for (const CapturedStmt::Capture &Cap : CS.captures()) {
+ // Always advance the slot index so that we stay aligned with the
+ // shareds-table layout established at task allocation.
+ unsigned ThisIndex = Index++;
+ if (IsNoOpRelocate(Cap))
+ continue;
+ // Project the capture's referent from the freshly reconstructed outer
+ // record. EmitLValueForField auto-loads the outer reference field, so
+ // the resulting pointer is the live referent address (not the slot).
+ const FieldDecl *OuterField = LookupOuterField(Cap);
+ assert(OuterField && "preflight should have rejected this capture");
+ QualType OuterTy =
+ C.getCanonicalTagType(cast<RecordDecl>(OuterField->getDeclContext()));
+ LValue OuterBase = CGF.MakeAddrLValue(
+ Address(OuterRaw, CGF.ConvertTypeForMem(OuterTy), PtrAlign), OuterTy);
+ llvm::Value *Mapped =
+ CGF.EmitLValueForField(OuterBase, OuterField).getPointer(CGF);
+ Mapped = Bld.CreatePointerBitCastOrAddrSpaceCast(Mapped, CGM.VoidPtrTy);
+ Bld.CreateStore(Mapped, Bld.CreateConstGEP(SharedTable, ThisIndex));
+ }
+
+ CGF.FinishFunction();
+ return Fn;
+}
+
void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
SourceLocation Loc,
const OMPExecutableDirective &D,
@@ -4800,22 +4963,28 @@ void CGOpenMPRuntime::emitTaskCall(
TGTaskArgs[2] = Result.NewTask;
TGTaskArgs[3] = TaskAllocArgs[0]; // TaskFlags
TGTaskArgs[4] = TaskAllocArgs[1]; // KmpTaskTWithPrivatesTySize
- TGTaskArgs[5] = Shareds.emitRawPointer(CGF);
- TGTaskArgs[6] = TaskAllocArgs[2]; // SharedsSize
+ TGTaskArgs[5] = TaskAllocArgs[2]; // SharedsSize
if (auto RecType = dyn_cast<RecordType>(SharedsTy)) {
auto *RD = RecType->getAsRecordDecl();
if (RD->fields().empty()) {
// FIXME: The condition might not be precisely correct here.
- TGTaskArgs[6] = CGF.Builder.getSize(0);
+ TGTaskArgs[5] = CGF.Builder.getSize(0);
}
}
if (Data.Dependences.size() == 0) {
- TGTaskArgs[7] = CGF.Builder.getInt32(0);
- TGTaskArgs[8] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+ TGTaskArgs[6] = CGF.Builder.getInt32(0);
+ TGTaskArgs[7] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
} else {
- TGTaskArgs[7] = NumOfElements;
- TGTaskArgs[8] = DependenciesArray.emitRawPointer(CGF);
- }
+ TGTaskArgs[6] = NumOfElements;
+ TGTaskArgs[7] = DependenciesArray.emitRawPointer(CGF);
+ }
+ const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
+ llvm::Function *RelocFn =
+ emitTaskRelocationFunction(CGM, Loc, *CS, CGF.CapturedStmtInfo, Data);
+ TGTaskArgs[8] = RelocFn
+ ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ RelocFn, CGM.VoidPtrTy)
+ : llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_taskgraph_task),
TGTaskArgs);
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index fc24280eaa077..e32308df74cae 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -360,7 +360,7 @@ __OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, VoidPtrPtr, SizeTy,
Int32, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_taskgraph_task, false, Int32, IdentPtr, Int32, VoidPtr, Int32,
- SizeTy, VoidPtr, SizeTy, Int32, VoidPtr)
+ SizeTy, SizeTy, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_taskgraph_taskloop, false, Int32, IdentPtr, Int32, VoidPtr,
Int32, SizeTy, VoidPtr, SizeTy, Int32, Int64Ptr, Int64Ptr, Int64,
Int32, Int32, Int64, Int32, VoidPtr)
diff --git a/openmp/runtime/src/i18n/en_US.txt
b/openmp/runtime/src/i18n/en_US.txt
index 08e837d3dea11..3cd852abd66c6 100644
--- a/openmp/runtime/src/i18n/en_US.txt
+++ b/openmp/runtime/src/i18n/en_US.txt
@@ -482,6 +482,7 @@ AffHWSubsetIgnoringAttr "KMP_HW_SUBSET: ignoring %1$s
attribute. This machi
TargetMemNotAvailable "Target memory not available, will use default
allocator."
AffIgnoringNonHybrid "%1$s ignored: This machine is not a hybrid
architecutre. Using \"%2$s\" instead."
AffIgnoringNotAvailable "%1$s ignored: %2$s is not available. Using
\"%3$s\" instead."
+OmpTaskgraphBadCapture "Cannot locate captured shared variable reference for taskgraph replay"
#
--------------------------------------------------------------------------------------------------
-*- HINTS -*-
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index d660c4e191d13..befca12786e70 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2482,6 +2482,7 @@ extern kmp_uint64 __kmp_taskloop_min_tasks;
/*!
*/
typedef kmp_int32 (*kmp_routine_entry_t)(kmp_int32, void *);
+typedef void (*kmp_task_relocate_t)(struct kmp_task *, void *);
typedef union kmp_cmplrdata {
kmp_int32 priority; /**< priority specified by user for the task */
@@ -2692,6 +2693,7 @@ typedef struct kmp_taskgraph_region_dep {
typedef struct kmp_taskgraph_node {
kmp_task_t *task;
bool taskloop_task;
+ kmp_task_relocate_t relocate;
kmp_taskgraph_reduce_input_data_t *reduce_input;
union {
// Valid when KMP_TDG_RECORDING in parent taskgraph record.
@@ -2777,6 +2779,7 @@ typedef struct kmp_taskgraph_record {
struct kmp_taskgraph_exec_descr *exec_descrs;
kmp_size_t exec_descr_size;
kmp_lock_t replay_lock;
+ void *taskgraph_args = nullptr;
// We need a taskgroup structure to keep track of recorded tasks. This is
// set to TRUE if the user requested "nogroup" on the taskgraph directive
// (then we can avoid blocking at the end of the taskgraph region on replay,
@@ -4507,8 +4510,8 @@ KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref,
kmp_int32 gtid,
void *args);
KMP_EXPORT kmp_uint32 __kmpc_taskgraph_task(
ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
- size_t sizeof_kmp_task_t, void *shareds, size_t sizeof_shareds,
- kmp_int32 ndeps, kmp_depend_info_t *dep_list);
+ size_t sizeof_kmp_task_t, size_t sizeof_shareds,
+ kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_task_relocate_t reloc);
KMP_EXPORT kmp_uint32 __kmpc_taskgraph_taskloop(
ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task, kmp_int32 flags,
size_t sizeof_kmp_task_t, void *shareds, size_t sizeof_shareds,
diff --git a/openmp/runtime/src/kmp_tasking.cpp
b/openmp/runtime/src/kmp_tasking.cpp
index 2f73a75f11e7c..d595c555a72c0 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -2352,10 +2352,11 @@ static void
__kmp_exec_descr_link_instances(kmp_taskgraph_exec_descr_t *descrs,
/// Reset, reparent and regroup the recorded task TASK and re-invoke it.
-static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_task_t *task,
+static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_taskgraph_node_t *node,
kmp_taskgroup_t *taskgroup,
kmp_taskdata_t *parent_taskdata,
bool serialize_immediate) {
+ kmp_task_t *task = node->task;
kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
taskdata->td_parent = parent_taskdata;
@@ -2378,6 +2379,18 @@ static void __kmp_omp_tg_task(kmp_int32 gtid, kmp_task_t
*task,
if (parent_taskdata->td_flags.tasktype == TASK_EXPLICIT)
KMP_ATOMIC_INC(&parent_taskdata->td_allocated_child_tasks);
+ if (node->relocate) {
+ // Call the task's relocation function with the incoming args from the owning
+ // taskgraph. This rewrites capture-by-reference variables to point to the
+ // correct location on the replayed taskgraph's stack (which may not be the
+ // same as the location from the initial recorded taskgraph).
+ node->relocate(task, taskdata->owning_taskgraph->taskgraph_args);
+ } else if (task->shareds != NULL) {
+ // A missing relocation callback is only fatal when there is a non-empty
+ // shareds payload that may contain by-reference captures needing remap.
+ KMP_FATAL(OmpTaskgraphBadCapture);
+ }
+
__kmp_omp_task(gtid, task, false);
}
@@ -2404,9 +2417,9 @@ static void __kmp_taskgraph_exec_descr_start(kmp_int32
gtid, kmp_info_t *thread,
kmp_int32 nblocks = KMP_ATOMIC_DEC(&lowest_descr->nblocks);
if (nblocks <= 0) {
if (descr->region->type == TASKGRAPH_REGION_NODE) {
- kmp_task_t *task = descr->region->task.node->task;
+ kmp_taskgraph_node_t *node = descr->region->task.node;
kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
- __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, false);
+ __kmp_omp_tg_task(gtid, node, taskgroup, current_taskdata, false);
} else {
// There's no task for a 'taskwait', so start successors immediately.
kmp_taskgraph_exec_descr_t *walk = descr;
@@ -2447,9 +2460,9 @@ static void __kmp_taskgraph_exec_descr_start(kmp_int32
gtid, kmp_info_t *thread,
kmp_taskgraph_exec_descr_t *item = head;
do {
assert(item->region->type == TASKGRAPH_REGION_NODE);
- kmp_task_t *task = item->region->task.node->task;
+ kmp_taskgraph_node_t *node = item->region->task.node;
kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
- __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, true);
+ __kmp_omp_tg_task(gtid, node, taskgroup, current_taskdata, true);
item = item->sibling;
} while (item != head);
break;
@@ -5023,6 +5036,7 @@ __kmp_taskgraph_node_alloc(kmp_taskgraph_record_t *rec,
kmp_task_t *task,
new_task->task = task;
new_task->taskloop_task = false;
+ new_task->relocate = nullptr;
new_task->reduce_input = nullptr;
new_task->u.unresolved.ndeps = 0;
new_task->u.unresolved.dep_list = nullptr;
@@ -5755,6 +5769,7 @@ static void __kmp_taskgraph_reset(kmp_taskgraph_record_t
*rec, kmp_int32 gtid,
rec->num_mutexes = 0;
rec->exec_descrs = nullptr;
rec->exec_descr_size = 0;
+ rec->taskgraph_args = nullptr;
rec->next = nullptr;
}
@@ -5852,10 +5867,6 @@ static kmp_task_t *__kmp_taskgraph_clone_task(kmp_info_t
*thread,
// FIXME: This should use a "taskdup" function like taskloops in cases where
// private variables are not trivially copyable. For now, do it by plain
// bitwise copy.
- // FIXME 2: It's intended that this copy be persistent, and can be
- // re-executed on taskgraph replay. Make sure that works (for shared
- // variables) if stack addresses change (i.e. a task-generating function is
- // called from different call stack depths).
kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(orig);
size_t shareds_offset = sizeof(kmp_taskdata_t) + sizeof_kmp_task_t;
shareds_offset = __kmp_round_up_to_val(shareds_offset, sizeof(kmp_uint64));
@@ -5864,6 +5875,11 @@ static kmp_task_t *__kmp_taskgraph_clone_task(kmp_info_t
*thread,
KMP_MEMCPY(copy_td, taskdata, shareds_offset + sizeof_shareds);
// Tasks cloned for a taskgraph always have this field set.
copy_td->owning_taskgraph = taskgraph;
+ kmp_task_t *copy_task = KMP_TASKDATA_TO_TASK(copy_td);
+ if (orig->shareds) {
+ // New task's shared data has now moved. Update the pointer.
+ copy_task->shareds = (void*) ((char*) copy_td + shareds_offset);
+ }
KMP_ATOMIC_ST_RLX(&copy_td->td_incomplete_child_tasks, 0);
return KMP_TASKDATA_TO_TASK(copy_td);
}
@@ -5972,6 +5988,9 @@ void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
// taskgroup.
KMP_ATOMIC_ST_REL(&taskgroup->taskgraph.recording, record);
}
+ // Keep the current taskgraph invocation's outlined-entry args for
+ // replay-time relocation of by-reference captures.
+ record->taskgraph_args = args;
__kmp_release_lock(&header->header_lock, gtid);
kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&record->status);
@@ -6000,9 +6019,10 @@ void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
kmp_uint32 __kmpc_taskgraph_task(ident_t *loc_ref, kmp_int32 gtid,
kmp_task_t *new_task, kmp_int32 flags,
- size_t sizeof_kmp_task_t, void *shareds,
+ size_t sizeof_kmp_task_t,
size_t sizeof_shareds, kmp_int32 ndeps,
- kmp_depend_info_t *dep_list) {
+ kmp_depend_info_t *dep_list,
+ kmp_task_relocate_t relocate) {
kmp_info_t *thread = __kmp_threads[gtid];
kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
@@ -6038,6 +6058,7 @@ kmp_uint...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/200404
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits