================
@@ -0,0 +1,180 @@
+===============================
+ AMDGPU Asynchronous Operations
+===============================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Asynchronous operations are memory transfers (usually between the global memory
+and LDS) that are completed independently at an unspecified scope. A thread that
+requests one or more asynchronous transfers can use *async markers* to track
+their completion. The thread waits for each marker to be *completed*, which
+indicates that requests initiated in program order before this marker have also
+completed.
+
+Operations
+==========
+
+``async_load_to_lds``
+---------------------
+
+.. code-block:: llvm
+
+  ; Legacy "LDS DMA" operations
+  void @llvm.amdgcn.load.to.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.global.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.raw.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.raw.ptr.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.struct.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.struct.ptr.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+
+Requests an async operation that copies the specified number of bytes from the
+global/buffer pointer ``%src`` to the LDS pointer ``%dst``.
+
+The optional parameter `ASYNC` is a bit in the auxiliary argument to those
+intrinsics, as documented in :ref:`LDS DMA operations<amdgpu-lds-dma-bits>`.
+When set, it indicates that the compiler should not automatically track the
+completion of this operation.
+
+``@llvm.amdgcn.asyncmark()``
+----------------------------
+
+Creates an *async marker* to track all the async operations that are program
+ordered before this call. A marker M is said to be *completed* only when all
+async operations program ordered before M are reported by the implementation as
+having finished, and it is said to be *outstanding* otherwise.
+
+Thus we have the following sufficient condition:
+
+  An async operation X is *completed* at a program point P if there exists a
+  marker M such that X is program ordered before M, M is program ordered before
+  P, and M is completed. X is said to be *outstanding* at P otherwise.
+
+``@llvm.amdgcn.wait.asyncmark(i32 %N)``
+---------------------------------------
+
+Waits until the ``N+1`` th predecessor marker M in program order before this
+call is completed, if M exists.
+
+N is an unsigned integer; the ``N+1`` th predecessor marker of point X is a
+marker M such that there are `N` markers in program order from M to X, not
+including M.
+
+Memory Consistency Model
+========================
+
+Each asynchronous operation consists of a non-atomic read on the source and a
+non-atomic write on the destination. Legacy "LDS DMA" intrinsics result in async
+accesses that guarantee visibility relative to other memory operations as
+follows:
+
+  The side-effects of an asynchronous operation `A` program ordered before any
+  memory operation `X` are visible to `X` if `A` is completed before `X`.
+
+  The side-effects of any memory operation `X` program ordered before an
+  asynchronous operation `A` are visible to `A`.
+
+Function calls in LLVM
+======================
+
+The underlying abstract machine does not implicitly track the completion of
+async operations while entering or returning from a function call.
+
+.. note::
+
+   As long as the caller uses sufficient wait's to track its own async
----------------
arsenm wrote:
```suggestion
   As long as the caller uses sufficient waits to track its own async
```

https://github.com/llvm/llvm-project/pull/173259
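
Side note on the ``wait.asyncmark`` wording quoted above: the "``N+1`` th predecessor marker" rule can be easier to check against a toy model. The sketch below is not LLVM code and is not part of the patch. It assumes a hypothetical program-order trace for a single thread, and the function name `ops_completed_by_wait` and the `("op", name)` / `("mark", None)` encoding are made up for illustration. It only models which async operations a wait would guarantee as completed under the quoted definitions.

```python
def ops_completed_by_wait(events, n):
    """Return the async ops that a wait.asyncmark(n) placed after `events`
    guarantees are completed, under the definitions quoted in the patch.

    `events` is a hypothetical program-order trace for one thread:
    ("op", name) models an async operation, ("mark", None) models a call
    to asyncmark. Both encodings are assumptions made for this sketch.
    """
    # Positions of all markers that precede the wait in program order.
    mark_positions = [i for i, (kind, _) in enumerate(events) if kind == "mark"]

    # If the (N+1)th predecessor marker does not exist, the wait
    # guarantees nothing.
    if n >= len(mark_positions):
        return []

    # The (N+1)th predecessor marker M: exactly N markers lie between M
    # and the wait, not counting M itself.
    m = mark_positions[-(n + 1)]

    # M is completed only when every async op program ordered before M
    # has finished, so those are the ops the wait covers.
    return [name for kind, name in events[:m] if kind == "op"]


# Trace: op a, marker, op b, marker, op c (no marker after c).
trace = [("op", "a"), ("mark", None), ("op", "b"), ("mark", None), ("op", "c")]

print(ops_completed_by_wait(trace, 0))  # ['a', 'b']
print(ops_completed_by_wait(trace, 1))  # ['a']
print(ops_completed_by_wait(trace, 2))  # []
```

In this model, operation ``c`` is never covered because no marker follows it, which matches the "sufficient condition" paragraph: an operation counts as completed at a point only if some completed marker sits between it and that point.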
