**Issue:** 179278
**Summary:** [CIR][TRACKING] CUDA/HIP Support for ClangIR
**Labels:** ClangIR
**Assignees:** koparasy, RiverDave
**Reporter:** RiverDave
Related: https://github.com/llvm/llvm-project/issues/175871
This issue tracks the effort to bring CUDA and HIP compilation to ClangIR, targeting NVPTX and AMDGPU backends. Our initial scope aims to pass PolyBench, matching the current state achieved in the incubator.
## Reference Material
The following resources contain patches merged in the incubator to establish GPU support:
* [OpenCL/GPU GSoC Tracking Issue](https://github.com/llvm/clangir/issues/689) \- Address space design and OpenCL infrastructure
* [CUDA PRs in incubator](https://github.com/llvm/clangir/pulls?q=is%3Apr+is%3Aclosed+CUDA+sort%3Acreated-asc)
* [HIP PRs in incubator](https://github.com/llvm/clangir/pulls?q=is%3Apr+is%3Aclosed+HIP+sort%3Acreated-asc)
---
## Overview
From a ClangIR perspective, CUDA/HIP compilation involves three key areas:
1. **Target-specific infrastructure** \- ABIs, address space mapping, calling conventions
2. **Kernel launches and device code execution** \- Stub emission, kernel calls, device-side codegen
3. **Variable registration** \- Handling of device, shared, surface, texture, constant, and managed variables, plus fatbin support
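As a minimal orienting sketch (all names hypothetical), a single CUDA translation unit touches all three areas:

```cu
// Hypothetical example touching the three areas above.
__constant__ float scale;   // (3) device variable registration
__device__   float bias;    // (3) device variable registration

__global__ void saxpy(float *x, float *y, int n) {  // (1) target ABI, calling convention
  __shared__ float tile[256];                       // (1) address space mapping
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
  __syncthreads();
  if (i < n)
    y[i] = scale * tile[threadIdx.x] + bias;
}

void host_launch(float *x, float *y, int n) {
  saxpy<<<(n + 255) / 256, 256>>>(x, y, n);         // (2) stub emission + kernel launch
}
```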
---
## Goal 1: Target-Specific Infrastructure
Status: **In Progress**
This establishes the foundational support for NVPTX and AMDGPU targets, including ABI information, address space handling, and calling conventions.
### NVPTX Target Info
Status: **In Progress**
* https://github.com/llvm/llvm-project/pull/177827
High-Level Details:
* NVPTXABIInfo and NVPTXTargetCIRGenInfo classes
* Wire up nvptx/nvptx64 triples in getTargetCIRGenInfo()
* Complete ABI lowering for NVPTX
Incubator references:
* https://github.com/llvm/clangir/pull/1303
* https://github.com/llvm/clangir/pull/1358
* https://github.com/llvm/clangir/pull/1445
### AMDGPU Target Info
Status: **In Progress**
* https://github.com/llvm/llvm-project/pull/179084
High-Level Details:
* AMDGPUABIInfo and AMDGPUTargetCIRGenInfo classes
* Wire up amdgcn triples
* AMDGPU-specific function attributes for HIP/OpenCL kernels
Incubator references:
* https://github.com/llvm/clangir/pull/2076
* https://github.com/llvm/clangir/pull/2078
* https://github.com/llvm/clangir/pull/2087
* https://github.com/llvm/clangir/pull/2091
### Address Spaces
Status: **In Progress** (target-specific address spaces supported upstream)
CIR uses two separate address space attributes by design, based on feedback from MLIR core dialect maintainers to align with other upstream dialects (specifically the ptr dialect):
**TargetAddressSpaceAttr**: Represents target-specific numeric address spaces. Already upstream:
* https://github.com/llvm/llvm-project/pull/161028
**LangAddressSpaceAttr**: Represents language-specific address spaces (CUDA/OpenCL qualifiers like \_\_shared\_\_, \_\_device\_\_, \_\_constant\_\_). Backported to incubator:
* https://github.com/llvm/clangir/pull/1986
High-Level Details:
Both attributes implement MemorySpaceAttrInterface to share a common interface. They remain distinct in CIR but converge during lowering to the LLVM IR dialect (which only supports numeric address spaces).
* TargetAddressSpaceAttr for numeric address spaces
* LangAddressSpaceAttr for language-specific address spaces
* MemorySpaceAttrInterface implementation
* Address space handling for CUDA/HIP qualifiers (\_\_shared\_\_, \_\_device\_\_, \_\_constant\_\_)
* Address space attribute on cir.global ops
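For illustration, a sketch of how the CUDA qualifiers relate to the two attributes, assuming the standard NVPTX numeric address spaces (global = 1, shared = 3, constant = 4, generic = 0):

```cu
// Sketch: CUDA qualifiers carry LangAddressSpaceAttr in CIR; lowering to the
// LLVM IR dialect converges on the target's numeric TargetAddressSpaceAttr
// values (NVPTX numbering shown).
__device__   int g;  // language: device   -> addrspace(1) (global) on NVPTX
__constant__ int c;  // language: constant -> addrspace(4) (constant) on NVPTX

__global__ void k(int *out) {
  __shared__ int s;  // language: shared   -> addrspace(3) (shared) on NVPTX
  s = g + c;
  *out = s;          // unqualified pointers are generic, addrspace(0)
}
```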
Incubator references:
* https://github.com/llvm/clangir/pull/1986
### Calling Conventions
Status: **Partial**
* CUDA kernel calling convention (PTX\_Kernel)
* SPIR kernel calling convention (SPIR\_KERNEL)
* Proper calling convention propagation to cir.call ops
(Note: Calling conventions are likely to be redesigned in the future based on: https://github.com/llvm/llvm-project/issues/175968)
Incubator references:
* https://github.com/llvm/clangir/pull/1344
* https://github.com/llvm/clangir/pull/760
* https://github.com/llvm/clangir/pull/772
---
## Goal 2: Kernel Launches and Device Code Generation
Status: **In Progress**
This covers how kernels are launched from host code and how device code is generated.
### CUDA/HIP Global Emission Filtering
Status: **In Progress**
- https://github.com/llvm/llvm-project/pull/177827
High-Level Details:
* Skip host-only functions when compiling for device (\-fcuda-is-device)
* Skip device-only functions when compiling for host
* Always emit \_\_global\_\_ kernels on both sides
* Always emit \_\_host\_\_ \_\_device\_\_ functions on both sides
* Handle implicit host/device templates and lambda call operators
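The filtering rules above can be summarized with a small sketch (hypothetical function names):

```cu
// Sketch of emission filtering; -fcuda-is-device selects the device side.
__host__   void host_only()     {}  // emitted only in the host compilation
__device__ void device_only()   {}  // emitted only in the device compilation
__global__ void kernel()        {}  // both sides: device body + host stub
__host__ __device__ void both() {}  // emitted on both sides
```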
Incubator references:
* https://github.com/llvm/clangir/pull/1309
* https://github.com/llvm/clangir/pull/1311
### Device Stub Emission
Status: **In Progress**
- https://github.com/llvm/llvm-project/pull/177790
High-Level Details:
Device stubs are host-side placeholder functions that set up and launch kernels.
* Generate device stub functions for \_\_global\_\_ kernels
* Emit a kernel-name attribute on stubs; it is consumed during the lowering pass
* Generate global storing CUDA/HIP stub function pointer
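A hedged sketch of what a generated stub conceptually does (the stub name and the internal runtime entry points shown are illustrative, following the current CUDA runtime scheme):

```cu
// Conceptual shape of the host-side stub generated for:
//   __global__ void saxpy(float *x, float *y, int n);
void __device_stub__saxpy(float *x, float *y, int n) {
  void *args[] = {&x, &y, &n};  // pack kernel arguments by address
  dim3 grid, block;
  size_t shmem;
  cudaStream_t stream;
  // Retrieve the launch configuration pushed at the <<<...>>> call site.
  __cudaPopCallConfiguration(&grid, &block, &shmem, &stream);
  // The stub's own address identifies the kernel to the runtime.
  cudaLaunchKernel((void *)__device_stub__saxpy, grid, block, args, shmem, stream);
}
```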
Incubator references:
* https://github.com/llvm/clangir/pull/1317
* https://github.com/llvm/clangir/pull/1332
* https://github.com/llvm/clangir/pull/1341
### Kernel Launch Calls
Status: **Not Started**
* Generate kernel launch configuration (\<\<\<...\>\>\> syntax)
* Emit cudaLaunchKernel / hipLaunchKernel calls
* Support for stream-per-thread API
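Conceptually, the \<\<\<...\>\>\> syntax expands on the host into a configuration push followed by a call to the device stub (names illustrative; HIP uses the analogous hipLaunchKernel path):

```cu
// Sketch: conceptual host-side expansion of
//   saxpy<<<grid, block, shmem, stream>>>(x, y, n);
__cudaPushCallConfiguration(grid, block, shmem, stream);
__device_stub__saxpy(x, y, n);  // pops the config and calls cudaLaunchKernel
```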
Incubator references:
* https://github.com/llvm/clangir/pull/1348
* https://github.com/llvm/clangir/pull/1952
* https://github.com/llvm/clangir/pull/1997
---
## Goal 3: Variable Registration and Device Variables
Status: **Not Started**
This covers how device-side variables are declared, initialized, and registered with the runtime.
### Device Variable Handling
Status: **Not Started**
* Handle \_\_device\_\_ variables
* Handle \_\_shared\_\_ variables (static local memory)
* Handle \_\_constant\_\_ variables
* Mark device variables as externally\_initialized
* Add CUDADeviceVarAttr for device variable identification
Incubator references:
* https://github.com/llvm/clangir/pull/1368
* https://github.com/llvm/clangir/pull/1394
* https://github.com/llvm/clangir/pull/1436
* https://github.com/llvm/clangir/pull/1438
* https://github.com/llvm/clangir/pull/1444
### Shadow Variables
Status: **Not Started**
On the host, each device variable corresponds to a “shadow” variable that must be registered with the runtime.
* Generate shadow variables for device globals
* Add CUDAShadowNameAttr for shadow variable tracking
* During CodeGen, we may need to generate attributes that are consumed in the lowering pass to perform proper registration
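A hedged sketch of the registration a shadow variable ultimately needs (the registration-function name is hypothetical, and the `__cudaRegisterVar` argument list is illustrative of the current runtime scheme):

```cu
// Conceptual host-side shadow and registration for:
//   __device__ float bias;
float bias;  // shadow: same name as the device global, no device storage itself

static void __cuda_register_globals(void **fatbin_handle) {
  // Links the host shadow to the device symbol by name.
  __cudaRegisterVar(fatbin_handle, (char *)&bias, "bias", "bias",
                    /*ext=*/0, sizeof(bias), /*constant=*/0, /*global=*/0);
}
```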
Incubator references:
* https://github.com/llvm/clangir/pull/1467
* https://github.com/llvm/clangir/pull/2111
### Fat Binary and Registration
Status: **Not Started**
* Add attribute for CUDA fat binary name
* Generate registration function for kernels
* Register \_\_global\_\_ functions with runtime
* Register global variables with runtime
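For orientation, a hedged sketch of the module constructor this registration machinery amounts to (wrapper, ctor, and stub names are illustrative; the exact `__cudaRegisterFunction` argument list varies by runtime version):

```cu
// Sketch: module constructor registering the fat binary and a kernel.
static void **__cuda_fatbin_handle;

static void __cuda_module_ctor() {
  __cuda_fatbin_handle = __cudaRegisterFatBinary(&__cuda_fatbin_wrapper);
  __cudaRegisterFunction(__cuda_fatbin_handle,
                         (const char *)__device_stub__saxpy,  // host stub address
                         "saxpy", "saxpy",                    // device-side name
                         -1, nullptr, nullptr, nullptr, nullptr, nullptr);
  __cudaRegisterFatBinaryEnd(__cuda_fatbin_handle);
}
```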
Incubator references:
* https://github.com/llvm/clangir/pull/1377
* https://github.com/llvm/clangir/pull/1415
* https://github.com/llvm/clangir/pull/1441
* https://github.com/llvm/clangir/pull/1978
* https://github.com/llvm/clangir/pull/1980
* https://github.com/llvm/clangir/pull/1977
---
## Goal 4: Built-in Types and Intrinsics
Status: **Not Started**
### Surface and Texture Types
* Support for built-in CUDA surface type
* Support for built-in CUDA texture type
* TBAA handling for surface/texture types
Incubator references:
* https://github.com/llvm/clangir/pull/2009
### AMDGPU Intrinsics
(TODO)
### PTX Intrinsics
(TODO)
---
## Target Test Coverage
The following OG (original, non-CIR) CodeGen tests serve as validation targets for the features to be supported:
### Core Functionality
* [address-spaces.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/address-spaces.cu)
* [const-var.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/const-var.cu)
* [device-var-init.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-var-init.cu)
* [global-initializers.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/global-initializers.cu)
### Kernel Infrastructure
* [device-stub.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-stub.cu) *(partial \- tests older CUDA versions and RDC linkage, which are not implemented in the incubator)*
* [device-init-fun.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-init-fun.cu)
* [kernel-args.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-args.cu)
* [kernel-call.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-call.cu) *(includes per-thread API \- partially implemented in incubator)*
* [kernel-stub-name.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-stub-name.cu)
* [ptx-kernels.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/ptx-kernels.cu)
### Static Variables
* [static-device-var-no-rdc.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/static-device-var-no-rdc.cu)
### Nice to Have
* [launch-bounds.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/launch-bounds.cu) *(not in incubator \- adds kernel attributes/metadata for performance)*
* [surface.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/surface.cu) *(basic infrastructure recently added in incubator)*
* [texture.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/texture.cu) *(same as surface)*
* [lambda.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/lambda.cu) *(GPU dispatch through lambda capturing \- used by Kokkos/RAJA portability layers; untested in incubator)*
---