**Issue:** 179278
**Summary:** [CIR][TRACKING] CUDA/HIP Support for ClangIR
**Labels:** ClangIR
**Assignees:** koparasy, RiverDave
**Reporter:** RiverDave
Related: https://github.com/llvm/llvm-project/issues/175871
This issue tracks the effort to bring CUDA and HIP compilation to ClangIR, targeting NVPTX and AMDGPU backends. Our initial scope aims to pass PolyBench, matching the current state achieved in the incubator.
## Reference Material
The following resources contain patches merged in the incubator to establish GPU support:
* [OpenCL/GPU GSoC Tracking Issue](https://github.com/llvm/clangir/issues/689) \- Address space design and OpenCL infrastructure
* [CUDA PRs in incubator](https://github.com/llvm/clangir/pulls?q=is%3Apr+is%3Aclosed+CUDA+sort%3Acreated-asc)
* [HIP PRs in incubator](https://github.com/llvm/clangir/pulls?q=is%3Apr+is%3Aclosed+HIP+sort%3Acreated-asc)
---
## Overview
From a ClangIR perspective, CUDA/HIP compilation involves three key areas:
1. **Target-specific infrastructure** \- ABIs, address space mapping, calling conventions
2. **Kernel launches and device code execution** \- Stub emission, kernel calls, device-side codegen
3. **Variable registration** \- Handling of device, shared, surface, texture, constant, and managed variables, plus fatbin support
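As a minimal orienting sketch (all names hypothetical), a single CUDA translation unit touches all three areas:

```cu
// Hypothetical example touching the three areas above.
__constant__ float scale;   // (3) device variable registration
__device__   float bias;    // (3) device variable registration

__global__ void saxpy(float *x, float *y, int n) {  // (1) target ABI, calling convention
  __shared__ float tile[256];                       // (1) address space mapping
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
  __syncthreads();
  if (i < n)
    y[i] = scale * tile[threadIdx.x] + bias;
}

void host_launch(float *x, float *y, int n) {
  saxpy<<<(n + 255) / 256, 256>>>(x, y, n);         // (2) stub emission + kernel launch
}
```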
---
## Goal 1: Target-Specific Infrastructure
Status: **In Progress**
This establishes the foundational support for NVPTX and AMDGPU targets, including ABI information, address space handling, and calling conventions.
### NVPTX Target Info
Status: **In Progress**
* https://github.com/llvm/llvm-project/pull/177827
High-Level Details:
* NVPTXABIInfo and NVPTXTargetCIRGenInfo classes
* Wire up nvptx/nvptx64 triples in getTargetCIRGenInfo()
* Complete ABI lowering for NVPTX
Incubator references:
* https://github.com/llvm/clangir/pull/1303
* https://github.com/llvm/clangir/pull/1358
* https://github.com/llvm/clangir/pull/1445
### AMDGPU Target Info
Status: **In Progress**
* https://github.com/llvm/llvm-project/pull/179084
High-Level Details:
* AMDGPUABIInfo and AMDGPUTargetCIRGenInfo classes
* Wire up amdgcn triples
* AMDGPU-specific function attributes for HIP/OpenCL kernels
Incubator references:
* https://github.com/llvm/clangir/pull/2076
* https://github.com/llvm/clangir/pull/2078
* https://github.com/llvm/clangir/pull/2087
* https://github.com/llvm/clangir/pull/2091
### Address Spaces
Status: **In Progress** (target-specific address spaces supported upstream)
CIR uses two separate address space attributes by design, based on feedback from MLIR core dialect maintainers to align with other upstream dialects (specifically the ptr dialect):
**TargetAddressSpaceAttr**: Represents target-specific numeric address spaces. Already upstream:
* https://github.com/llvm/llvm-project/pull/161028
**LangAddressSpaceAttr**: Represents language-specific address spaces (CUDA/OpenCL qualifiers like \_\_shared\_\_, \_\_device\_\_, \_\_constant\_\_). Backported to incubator:
* https://github.com/llvm/clangir/pull/1986
High-Level Details:
Both attributes implement MemorySpaceAttrInterface to share a common interface. They remain distinct in CIR but converge during lowering to the LLVM IR dialect (which only supports numeric address spaces).
* TargetAddressSpaceAttr for numeric address spaces
* LangAddressSpaceAttr for language-specific address spaces
* MemorySpaceAttrInterface implementation
* Address space handling for CUDA/HIP qualifiers (\_\_shared\_\_, \_\_device\_\_, \_\_constant\_\_)
* Address space attribute on cir.global ops
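For illustration, a sketch of how the CUDA qualifiers relate to the two attributes, assuming the standard NVPTX numeric address spaces (global = 1, shared = 3, constant = 4, generic = 0):

```cu
// Sketch: CUDA qualifiers carry LangAddressSpaceAttr in CIR; lowering to the
// LLVM IR dialect converges on the target's numeric TargetAddressSpaceAttr
// values (NVPTX numbering shown).
__device__   int g;  // language: device   -> addrspace(1) (global) on NVPTX
__constant__ int c;  // language: constant -> addrspace(4) (constant) on NVPTX

__global__ void k(int *out) {
  __shared__ int s;  // language: shared   -> addrspace(3) (shared) on NVPTX
  s = g + c;
  *out = s;          // unqualified pointers are generic, addrspace(0)
}
```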
Incubator references:
* https://github.com/llvm/clangir/pull/1986
### Calling Conventions
Status: **Partial**
* CUDA kernel calling convention (PTX\_Kernel)
* SPIR kernel calling convention (SPIR\_KERNEL)
* Proper calling convention propagation to cir.call ops
(Note: Calling conventions are likely to be redesigned in the future based on: https://github.com/llvm/llvm-project/issues/175968)
Incubator references:
* https://github.com/llvm/clangir/pull/1344
* https://github.com/llvm/clangir/pull/760
* https://github.com/llvm/clangir/pull/772
---
## Goal 2: Kernel Launches and Device Code Generation
Status: **In Progress**
This covers how kernels are launched from host code and how device code is generated.
### CUDA/HIP Global Emission Filtering
Status: **In Progress**
- https://github.com/llvm/llvm-project/pull/177827
High-Level Details:
* Skip host-only functions when compiling for device (\-fcuda-is-device)
* Skip device-only functions when compiling for host
* Always emit \_\_global\_\_ kernels on both sides
* Always emit \_\_host\_\_ \_\_device\_\_ functions on both sides
* Handle implicit host/device templates and lambda call operators
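The filtering rules above can be summarized with a small sketch (hypothetical function names):

```cu
// Sketch of emission filtering; -fcuda-is-device selects the device side.
__host__   void host_only()     {}  // emitted only in the host compilation
__device__ void device_only()   {}  // emitted only in the device compilation
__global__ void kernel()        {}  // both sides: device body + host stub
__host__ __device__ void both() {}  // emitted on both sides
```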
Incubator references:
* https://github.com/llvm/clangir/pull/1309
* https://github.com/llvm/clangir/pull/1311
### Device Stub Emission
Status: **In Progress**
- https://github.com/llvm/llvm-project/pull/177790
High-Level Details:
Device stubs are host-side placeholder functions that set up and launch kernels.
* Generate device stub functions for \_\_global\_\_ kernels
* Emit a kernel-name attribute on stubs; it is consumed during the lowering pass
* Generate global storing CUDA/HIP stub function pointer
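A hedged sketch of what a generated stub conceptually does (the stub name and the internal runtime entry points shown are illustrative, following the current CUDA runtime scheme):

```cu
// Conceptual shape of the host-side stub generated for:
//   __global__ void saxpy(float *x, float *y, int n);
void __device_stub__saxpy(float *x, float *y, int n) {
  void *args[] = {&x, &y, &n};  // pack kernel arguments by address
  dim3 grid, block;
  size_t shmem;
  cudaStream_t stream;
  // Retrieve the launch configuration pushed at the <<<...>>> call site.
  __cudaPopCallConfiguration(&grid, &block, &shmem, &stream);
  // The stub's own address identifies the kernel to the runtime.
  cudaLaunchKernel((void *)__device_stub__saxpy, grid, block, args, shmem, stream);
}
```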
Incubator references:
* https://github.com/llvm/clangir/pull/1317
* https://github.com/llvm/clangir/pull/1332
* https://github.com/llvm/clangir/pull/1341
### Kernel Launch Calls
Status: **Not Started**
* Generate kernel launch configuration (\<\<\<...\>\>\> syntax)
* Emit cudaLaunchKernel / hipLaunchKernel calls
* Support for stream-per-thread API
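Conceptually, the \<\<\<...\>\>\> syntax expands on the host into a configuration push followed by a call to the device stub (names illustrative; HIP uses the analogous hipLaunchKernel path):

```cu
// Sketch: conceptual host-side expansion of
//   saxpy<<<grid, block, shmem, stream>>>(x, y, n);
__cudaPushCallConfiguration(grid, block, shmem, stream);
__device_stub__saxpy(x, y, n);  // pops the config and calls cudaLaunchKernel
```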
Incubator references:
* https://github.com/llvm/clangir/pull/1348
* https://github.com/llvm/clangir/pull/1952
* https://github.com/llvm/clangir/pull/1997
---
## Goal 3: Variable Registration and Device Variables
Status: **Not Started**
This covers how device-side variables are declared, initialized, and registered with the runtime.
### Device Variable Handling
Status: **Not Started**
* Handle \_\_device\_\_ variables
* Handle \_\_shared\_\_ variables (static local memory)
* Handle \_\_constant\_\_ variables
* Mark device variables as externally\_initialized
* Add CUDADeviceVarAttr for device variable identification
Incubator references:
* https://github.com/llvm/clangir/pull/1368
* https://github.com/llvm/clangir/pull/1394
* https://github.com/llvm/clangir/pull/1436
* https://github.com/llvm/clangir/pull/1438
* https://github.com/llvm/clangir/pull/1444
### Shadow Variables
Status: **Not Started**
On the host, each device variable corresponds to a “shadow” variable that must be registered with the runtime.
* Generate shadow variables for device globals
* Add CUDAShadowNameAttr for shadow variable tracking
* During CodeGen, we may need to generate attributes that are consumed in the lowering pass to perform proper registration
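A hedged sketch of the registration a shadow variable ultimately needs (the registration-function name is hypothetical, and the `__cudaRegisterVar` argument list is illustrative of the current runtime scheme):

```cu
// Conceptual host-side shadow and registration for:
//   __device__ float bias;
float bias;  // shadow: same name as the device global, no device storage itself

static void __cuda_register_globals(void **fatbin_handle) {
  // Links the host shadow to the device symbol by name.
  __cudaRegisterVar(fatbin_handle, (char *)&bias, "bias", "bias",
                    /*ext=*/0, sizeof(bias), /*constant=*/0, /*global=*/0);
}
```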
Incubator references:
* https://github.com/llvm/clangir/pull/1467
* https://github.com/llvm/clangir/pull/2111
### Fat Binary and Registration
Status: **Not Started**
* Add attribute for CUDA fat binary name
* Generate registration function for kernels
* Register \_\_global\_\_ functions with runtime
* Register global variables with runtime
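For orientation, a hedged sketch of the module constructor this registration machinery amounts to (wrapper, ctor, and stub names are illustrative; the exact `__cudaRegisterFunction` argument list varies by runtime version):

```cu
// Sketch: module constructor registering the fat binary and a kernel.
static void **__cuda_fatbin_handle;

static void __cuda_module_ctor() {
  __cuda_fatbin_handle = __cudaRegisterFatBinary(&__cuda_fatbin_wrapper);
  __cudaRegisterFunction(__cuda_fatbin_handle,
                         (const char *)__device_stub__saxpy,  // host stub address
                         "saxpy", "saxpy",                    // device-side name
                         -1, nullptr, nullptr, nullptr, nullptr, nullptr);
  __cudaRegisterFatBinaryEnd(__cuda_fatbin_handle);
}
```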
Incubator references:
* https://github.com/llvm/clangir/pull/1377
* https://github.com/llvm/clangir/pull/1415
* https://github.com/llvm/clangir/pull/1441
* https://github.com/llvm/clangir/pull/1978
* https://github.com/llvm/clangir/pull/1980
* https://github.com/llvm/clangir/pull/1977
---
## Goal 4: Built-in Types and Intrinsics
Status: **Not Started**
### Surface and Texture Types
* Support for built-in CUDA surface type
* Support for built-in CUDA texture type
* TBAA handling for surface/texture types
Incubator references:
* https://github.com/llvm/clangir/pull/2009
### AMDGPU Intrinsics
(TODO)
### PTX Intrinsics
(TODO)
---
## Target Test Coverage
The following OG (original, non-CIR) CodeGen tests serve as validation targets for the features to be supported:
### Core Functionality
* [address-spaces.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/address-spaces.cu)
* [const-var.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/const-var.cu)
* [device-var-init.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-var-init.cu)
* [global-initializers.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/global-initializers.cu)
### Kernel Infrastructure
* [device-stub.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-stub.cu) *(partial \- tests older CUDA versions and RDC linkage, which are not implemented in the incubator)*
* [device-init-fun.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/device-init-fun.cu)
* [kernel-args.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-args.cu)
* [kernel-call.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-call.cu) *(includes per-thread API \- partially implemented in incubator)*
* [kernel-stub-name.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/kernel-stub-name.cu)
* [ptx-kernels.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/ptx-kernels.cu)
### Static Variables
* [static-device-var-no-rdc.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/static-device-var-no-rdc.cu)
### Nice to Have
* [launch-bounds.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/launch-bounds.cu) *(not in incubator \- adds kernel attributes/metadata for performance)*
* [surface.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/surface.cu) *(basic infrastructure recently added in incubator)*
* [texture.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/texture.cu) *(same as surface)*
* [lambda.cu](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGenCUDA/lambda.cu) *(GPU dispatch through lambda capturing \- used by Kokkos/RAJA portability layers; untested in incubator)*
---