================
@@ -0,0 +1,180 @@
+===============================
+ AMDGPU Asynchronous Operations
+===============================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Asynchronous operations are memory transfers (usually between the global memory
+and LDS) that are completed independently at an unspecified scope. A thread that
+requests one or more asynchronous transfers can use *async markers* to track
+their completion. The thread waits for each marker to be *completed*, which
+indicates that requests initiated in program order before this marker have also
+completed.
+
+Operations
+==========
+
+``async_load_to_lds``
+---------------------
+
+.. code-block:: llvm
+
+  ; Legacy "LDS DMA" operations
+  void @llvm.amdgcn.load.to.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.global.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.raw.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.raw.ptr.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.struct.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+  void @llvm.amdgcn.struct.ptr.buffer.load.lds(ptr %src, ptr %dst, ASYNC)
+
+Requests an async operation that copies the specified number of bytes from the
+global/buffer pointer ``%src`` to the LDS pointer ``%dst``.
+
+The optional parameter `ASYNC` is a bit in the auxiliary argument to those
+intrinsics, as documented in :ref:`LDS DMA operations<amdgpu-lds-dma-bits>`.
+When set, it indicates that the compiler should not automatically track the
+completion of this operation.
+
+``@llvm.amdgcn.asyncmark()``
+----------------------------
+
+Creates an *async marker* to track all the async operations that are program
+ordered before this call. A marker M is said to be *completed* only when all
+async operations program ordered before M are reported by the implementation as
+having finished, and it is said to be *outstanding* otherwise.
+
+Thus we have the following sufficient condition:
+
+  An async operation X is *completed* at a program point P if there exists a
+  marker M such that X is program ordered before M, M is program ordered before
+  P, and M is completed. X is said to be *outstanding* at P otherwise.
----------------
ssahasra wrote:

The combination of `asyncmark` and `wait_asyncmark` is a software abstraction over hardware counters. So yes, it does feel a lot like a typical waitcnt. But it frees the programmer from worrying about the semantics of individual async instructions and lets them work purely in terms of marks placed at various program points.

Note that global/LDS transfers are not the only async operations. There are others in GFX1250 that will benefit from this same abstraction.

https://github.com/llvm/llvm-project/pull/173259
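
For intuition, the marker semantics in the quoted hunk can be mimicked with a small Python toy model. This is only a sketch: the class `AsyncModel` and all of its method names are hypothetical, `finish()` stands in for whatever the implementation reports as finished, and none of it reflects how the PR actually lowers the intrinsics.

```python
class AsyncModel:
    """Toy model of one thread's async operations and markers (hypothetical)."""

    def __init__(self):
        self.issued = 0        # number of async operations requested so far
        self.finished = set()  # operations reported as finished
        self.marks = []        # marks[m] = number of ops issued before marker m

    def async_op(self):
        """Request an async operation and return its index in program order."""
        self.issued += 1
        return self.issued - 1

    def asyncmark(self):
        """Create a marker covering every operation issued so far."""
        self.marks.append(self.issued)
        return len(self.marks) - 1

    def finish(self, op):
        """Stand-in for the implementation reporting that `op` has finished."""
        self.finished.add(op)

    def mark_completed(self, m):
        """A marker is completed only when all ops before it have finished."""
        return all(op in self.finished for op in range(self.marks[m]))

    def op_known_completed(self, x):
        """Sufficient condition: some completed marker is ordered after x."""
        return any(
            x < covered and self.mark_completed(m)
            for m, covered in enumerate(self.marks)
        )


t = AsyncModel()
a = t.async_op()
m0 = t.asyncmark()
b = t.async_op()
m1 = t.asyncmark()

t.finish(b)  # b finishes first, but no marker after b is completed yet
b_known_early = t.op_known_completed(b)
t.finish(a)  # now both markers are completed
```

In this model the program never inspects `a` or `b` directly; it only asks whether a marker placed after them is completed, which is the point of the abstraction.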
