================
@@ -0,0 +1,602 @@
+# ClangIR ABI Lowering - Design Document
+
+## 1. Introduction
+
+This design describes calling convention lowering that **builds on the GSoC ABI
+Lowering Library** (PR #140112): we use its `abi::Type*` and target ABI logic
+and add an MLIR integration layer (MLIRTypeMapper, ABI lowering pass, and
+dialect rewriters). The framework relies on the LLVM ABI library in
+`llvm/lib/ABI/` as the single source of truth for ABI classification; MLIR
+dialects use it via an adapter layer. The design enables CIR to perform
+ABI-compliant calling convention lowering, be reusable by other MLIR dialects
+(particularly FIR), and achieve parity with the CIR incubator for x86_64 and
+AArch64. **What the design is, in concrete terms:** inputs are high-level
+function signatures in CIR, FIR, or other MLIR dialects; outputs are ABI-lowered
+signatures and call sites; lowering runs as an MLIR pass in the compilation
+pipeline, before dialect lowering to LLVM IR or other back ends.
+
+### 1.1 Problem Statement
+
+Calling convention lowering is currently implemented separately for each MLIR
+dialect that needs it. The CIR incubator has a partial implementation, but it's
+tightly coupled to CIR-specific types and operations, making it unsuitable for
+reuse by other dialects. This means that FIR (Fortran IR) and future MLIR
+dialects would need to duplicate this complex logic. While Classic Clang
+CodeGen contains mature ABI lowering code, it cannot be reused directly because
+it's tightly coupled to Clang's AST representation and LLVM IR generation.
+
+### 1.2 Design Goals
+
+Building on the GSoC library and adding an MLIR integration layer avoids
+duplicating complex ABI logic across MLIR dialects, reduces maintenance, and
+keeps a single source of ABI compliance in `llvm/lib/ABI/`.
+The separation
+between GSoC (classification) and dialect-specific ABIRewriteContext (rewriting)
+enables clearer testing and a straightforward migration path from the CIR
+incubator by porting useful algorithms into the GSoC library where appropriate.
+
+A central goal is that generated code be **call-compatible with Classic Clang
+CodeGen** (and other compilers). Parity is with Classic Clang CodeGen output,
+not only with the incubator. Success means CIR correctly lowers x86_64 and
+AArch64 calling conventions with full ABI compliance using the GSoC library
+and MLIR integration layer; FIR can adopt the same infrastructure with minimal
+dialect-specific adaptation (e.g. cdecl when calling C from Fortran). ABI
+compliance will be validated through differential testing against Classic Clang
+CodeGen, and performance overhead should remain under 5% compared to a direct,
+dialect-specific implementation. Initial scope focuses on fixed-argument
+functions; variadic support (varargs) is deferred.
+
+## 2. Background and Context
+
+### 2.1 What is Calling Convention Lowering?
+
+Calling convention lowering transforms high-level function signatures to match
+target ABI (Application Binary Interface) requirements. When a function is
+declared at the source level with convenient, language-level types, these types
+must be translated into the specific register assignments, memory layouts, and
+calling sequences that the target architecture expects.
+For example, on x86_64
+System V ABI, a struct containing two 64-bit integers might be "expanded" into
+two separate arguments passed in registers, rather than being passed as a single
+aggregate:
+
+```
+// High-level CIR
+func @foo(i32, struct<i64, i64>) -> i32
+
+// After ABI lowering
+func @foo(i32 %arg0, i64 %arg1, i64 %arg2) -> i32
+//        ^          ^          ^        ^
+//        |          |          +--------+ struct expanded into fields
+//        |          +---- first field passed in register
+//        +---- small integer passed in register
+```
+
+Calling convention lowering is complex for several reasons: it is highly
+target-specific (each architecture has different rules for registers vs.
+memory), type-dependent (rules differ for integers, floats, structs, unions,
+arrays), and context-sensitive (varargs, virtual calls, conventions like
+vectorcall or preserve_most). The same target may have multiple ABI variants
+(e.g. x86_64 System V vs. Windows x64), adding further complexity.
+
+### 2.2 Existing Implementations
+
+#### Classic Clang CodeGen
+
+Classic Clang CodeGen (located in `clang/lib/CodeGen/`) transforms calling
+conventions during the AST-to-LLVM-IR lowering process. This implementation is
+mature and well-tested, handling all supported targets with comprehensive ABI
+coverage. However, it's tightly coupled to both Clang's AST representation and
+LLVM IR, making it difficult to reuse for MLIR-based frontends.
+
+#### CIR Incubator
+
+The CIR incubator includes a calling convention lowering pass in
+`clang/lib/CIR/Dialect/Transforms/TargetLowering/` that transforms CIR
+operations into ABI-lowered CIR operations as an MLIR pass. This implementation
+successfully adapted logic from Classic Clang CodeGen to work within the MLIR
+framework. However, it relies on CIR-specific types and operations, preventing
+reuse by other MLIR dialects.
+
+#### GSoC ABI Lowering Library
+
+A 2025 Google Summer of Code project produced [PR
+#140112](https://github.com/llvm/llvm-project/pull/140112), which proposes
+extracting Clang's ABI logic into a reusable library in `llvm/lib/ABI/`. The
+design centers on a shadow type system (`abi::Type*`) separate from both Clang's
+AST types and LLVM IR types, enabling the ABI classification algorithms to work
+independently of any specific frontend representation. The library includes
+abstract `ABIInfo` base classes and target-specific implementations (e.g.
+x86_64, BPF) and provides QualTypeMapper for Clang to map `QualType` to
+`abi::Type*`.
+
+Our approach is to complete and extend this library and use it as the single
+source of truth for ABI classification. One implementation in one place reduces
+duplication, simplifies bug fixes, and creates a path for Classic Clang CodeGen
+to use the same logic in the future. MLIR dialects (CIR, FIR, and others) will
+use the library via an adapter layer rather than reimplementing ABI logic.
+
+**Current state.** The x86_64 implementation is largely complete and under
+review. AArch64 and some other targets are not yet implemented; there is no
+MLIR integration today. The work is being upstreamed in smaller parts (e.g.
+[PR 158329](https://github.com/llvm/llvm-project/pull/158329)); progress is
+limited by reviewer bandwidth. The overhead of the shadow type system
+(converting to and from `abi::Type*`) has been measured at under 0.1% for clang
+-O0, so it is negligible for CIR. Our approach therefore depends on the GSoC
+library being merged upstream or our contributions to it being accepted.
+
+**Our approach.** The approach is to complete and extend the GSoC library (e.g.
+AArch64, review feedback, tests) and add an **MLIR integration layer** so that
+MLIR dialects can use it:
+
+- **MLIRTypeMapper**: maps `mlir::Type` to `abi::Type*`, analogous to
+  QualTypeMapper for Clang.
+
+- **MLIR ABI lowering pass**: uses the library's `ABIInfo` for classification,
+  then performs dialect-specific rewriting via `ABIRewriteContext` for CIR, FIR,
+  and other dialects.
+
+The CIR incubator serves as a **reference only** (e.g. for AArch64 algorithms).
+We do not upstream the incubator's CIR-specific ABI implementation as the
+long-term solution; we port useful algorithms into the GSoC library where
+appropriate.
+
+### 2.3 Requirements for MLIR Dialects
+
+CIR needs to lower C/C++ calling conventions correctly, with initial support for
+x86_64 and AArch64 targets. It must handle structs, unions, and complex types,
+as well as support instance methods and virtual calls. FIR's initial need is
+**cdecl for calling C from Fortran** (C interop); that is in scope.
+Fortran-specific ABI semantics (e.g. CHARACTER hidden length parameters, array
+descriptors) are out of initial scope; full Fortran ABI lowering is a broader
+goal. Both dialects share common requirements: strict target ABI compliance,
+efficient lowering with minimal overhead, extensibility for adding new target
+architectures, and comprehensive testability and validation capabilities.
+
+## 3. Proposed Solution
+
+**Core.** The GSoC library in `llvm/lib/ABI/` performs ABI classification on
+`abi::Type*`. It provides `ABIInfo` and target-specific implementations
+(x86_64, BPF, and eventually AArch64 and others). This is the single place
+where ABI rules are implemented.
+
+**MLIR side.** To use this library from MLIR dialects we add an integration
+layer: (1) **MLIRTypeMapper** maps `mlir::Type` to `abi::Type*` (analogous to
+QualTypeMapper for Clang). (2) A **generic ABI lowering pass** invokes the
+library's `ABIInfo` for classification, then (3) performs **dialect-specific
+rewriting** via the `ABIRewriteContext` interface—each dialect (CIR, FIR, etc.)
+implements only the glue to create its own operations (e.g. `cir.call`,
+`fir.call`).
+Classification logic is shared; operation creation is
+dialect-specific.
+
+The following diagram shows the layering. At the top, the GSoC library holds
+the ABI logic. In the middle, adapters connect frontends to it: Classic Clang
+CodeGen uses QualTypeMapper; MLIR uses MLIRTypeMapper and the ABI lowering pass.
+At the bottom, each dialect implements `ABIRewriteContext` only; FIR is shown as
+a consumer for cdecl/C interop (e.g. calling C from Fortran).
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  GSoC ABI Library (llvm/lib/ABI/)                               │
+│  ABIInfo, abi::Type*, target implementations (X86, AArch64,…)   │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+            ┌─────────────────┴─────────────────┐
+            │                                   │
+            ▼                                   ▼
+┌───────────────────────┐         ┌───────────────────────────────┐
+│  Classic CodeGen      │         │  MLIR adapter                 │
+│  QualTypeMapper       │         │  MLIRTypeMapper + ABI pass    │
+└───────────────────────┘         └───────────────────────────────┘
+                                                │
+                               ┌────────────────┼────────────────┐
+                               │                │                │
+                               ▼                ▼                ▼
+                         ┌────────────┐   ┌────────────┐   ┌────────────┐
+                         │ CIR        │   │ FIR        │   │ Future     │
+                         │ ABIRewrite │   │ (cdecl/C   │   │ Dialects   │
+                         │ Context    │   │ interop)   │   │            │
+                         └────────────┘   └────────────┘   └────────────┘
+```
+
+## 4. Design Overview
+
+### 4.1 Architecture Diagram
+
+The following diagram shows how the design builds on the GSoC library (Section
+3). At the top, GSoC holds the ABI classification logic. The middle layer
+adapts MLIR to GSoC: MLIRTypeMapper converts `mlir::Type` to `abi::Type*`, and
+the MLIR ABI lowering pass invokes GSoC's `ABIInfo` and uses the classification
+to drive rewriting. At the bottom, each dialect implements only
+`ABIRewriteContext` for operation creation; there is no separate type
+abstraction layer in MLIR for classification—that lives in GSoC.
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│  GSoC ABI Library (llvm/lib/ABI/) — single source of truth              │
+│  abi::Type*, ABIInfo, target implementations (X86_64, AArch64, …)       │
+│  Input: abi::Type* → Output: classification (ABIArgInfo, etc.)          │
+└─────────────────────────────────────────────────────────────────────────┘
+                                     │
+                                     ▼
+┌─────────────────────────────────────────────────────────────────────────┐
+│  MLIR adapter                                                           │
+│  MLIRTypeMapper (mlir::Type → abi::Type*) + MLIR ABI lowering pass      │
+│  (1) Map types  (2) Call GSoC ABIInfo  (3) Drive rewriting from         │
+│                                            classification result        │
+└─────────────────────────────────────────────────────────────────────────┘
+                                     │
+                   ┌─────────────────┼─────────────────┐
+                   ▼                 ▼                 ▼
+             ┌────────────┐    ┌────────────┐    ┌────────────┐
+             │ CIR        │    │ FIR        │    │ Future     │
+             │ ABIRewrite │    │ ABIRewrite │    │ Dialects   │
+             │ Context    │    │ Context    │    │            │
+             └────────────┘    └────────────┘    └────────────┘
+             Dialect-specific operation creation only (no type
+             abstraction for classification in MLIR)
+```
+
+### 4.2 GSoC, Adapter, and Dialect Layers
+
+The architecture has three parts. **GSoC** (`llvm/lib/ABI/`) is the single
+source of truth for ABI classification: it operates on `abi::Type*` and produces
+classification results (e.g. ABIArgInfo, ABIFunctionInfo as defined in GSoC).
+Target-specific `ABIInfo` implementations (X86_64, AArch64, etc.) live there.
+The **adapter layer** is MLIR-specific: MLIRTypeMapper maps `mlir::Type` to
+`abi::Type*`, and the MLIR ABI lowering pass (1) maps types, (2) calls GSoC's
+ABIInfo, and (3) uses the classification to drive rewriting. The **dialect
+layer** is only ABIRewriteContext: each dialect (CIR, FIR) implements operation
+creation (createFunction, createCall, createExtractValue, etc.). There is no
+type abstraction layer in MLIR for classification; type queries for ABI are
+performed on `abi::Type*` inside GSoC.
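
The split between the adapter layer and the dialect layer described in 4.2 can be illustrated with a small standalone sketch. Everything in it is hypothetical: `RewriteContext`, `ToyCIRContext`, `ToyFIRContext`, and `emitLoweredCall` are simplified stand-ins invented for this illustration, operations are modelled as strings, and none of it reflects the actual `ABIRewriteContext` interface proposed in the PR. It only shows the idea that the driver code stays dialect-agnostic while each dialect supplies operation creation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; these are not the
// real ABIRewriteContext, cir.call, or fir.call definitions.
namespace layering_sketch {

// Joins operand names with ", " for printing.
inline std::string join(const std::vector<std::string> &parts) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += parts[i];
  }
  return out;
}

// Dialect layer: operation creation only, no ABI rules.
class RewriteContext {
public:
  virtual ~RewriteContext() = default;
  // Returns a textual stand-in for the created call operation.
  virtual std::string createCall(const std::string &callee,
                                 const std::vector<std::string> &args) = 0;
};

// Toy CIR-like dialect: emits a "cir.call" string.
class ToyCIRContext : public RewriteContext {
public:
  std::string createCall(const std::string &callee,
                         const std::vector<std::string> &args) override {
    return "cir.call @" + callee + "(" + join(args) + ")";
  }
};

// Toy FIR-like dialect: emits a "fir.call" string.
class ToyFIRContext : public RewriteContext {
public:
  std::string createCall(const std::string &callee,
                         const std::vector<std::string> &args) override {
    return "fir.call @" + callee + "(" + join(args) + ")";
  }
};

// Adapter layer: dialect-agnostic driver that only talks to the interface.
inline std::string emitLoweredCall(RewriteContext &ctx,
                                   const std::string &callee,
                                   const std::vector<std::string> &loweredArgs) {
  assert(!callee.empty() && "callee name must not be empty");
  return ctx.createCall(callee, loweredArgs);
}

} // namespace layering_sketch
```

Under these assumptions, the same `emitLoweredCall` driver produces a `cir.call` or a `fir.call` string depending only on which context object is passed in, which mirrors the claim that classification and driving logic are shared while operation creation is dialect-specific.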
+
+### 4.3 Key Components
+
+The framework is built from the following components. **GSoC**
+(`llvm/lib/ABI/`) provides the single source of truth for ABI classification:
+the `abi::Type*` type system, the `ABIInfo` base and target-specific
+implementations (e.g. X86_64, AArch64), and the classification result types
+(e.g. ABIArgInfo, ABIFunctionInfo). **MLIRTypeMapper** maps `mlir::Type` to
+`abi::Type*` so that MLIR dialect types can be classified by GSoC. The **MLIR
+ABI lowering pass** orchestrates the flow: it uses MLIRTypeMapper, calls GSoC's
+ABIInfo, and drives rewriting from the classification result.
+**ABIRewriteContext** is the dialect-specific interface for operation creation
+(each dialect implements it to produce e.g. cir.call, fir.call). A **target
+registry** (or equivalent) is used to select the appropriate GSoC ABIInfo for
+the compilation target. There is no ABITypeInterface or separate "ABIInfo in
+MLIR"; classification lives entirely in GSoC.
+
+### 4.4 ABI Lowering Flow: How the Pieces Fit Together
+
+This section describes the end-to-end flow of ABI lowering, showing how all
+interfaces and components work together.
+
+#### Step 1: Function Signature Analysis
+
+The ABI lowering pass begins by analyzing the function signature. When it
+encounters a function operation, it extracts the parameter types and return type
+to prepare them for classification. At this stage, the types are still in their
+high-level, dialect-specific form (e.g., `!cir.struct` for CIR, or `!fir.type`
+for FIR). The pass collects these types into a list that will be fed to the
+classification logic in the next step.
+
+```
+Input: func @foo(%arg0: !cir.int<u, 32>,
+                 %arg1: !cir.struct<{!cir.int<u, 64>,
+                                     !cir.int<u, 64>}>) -> !cir.int<u, 32>
+```
+
+#### Step 2: Type Mapping via MLIRTypeMapper
+
+For each argument and the return type, the pass maps `mlir::Type` to
+`abi::Type*` using MLIRTypeMapper.
+The mapper produces the representation that
+GSoC's ABIInfo expects; optionally, it can map back to MLIR types for coercion
+types when needed.
+
+```cpp
+// Map dialect types to GSoC's type system
+MLIRTypeMapper mlirTypeMapper(module.getDataLayout());
+abi::Type *arg0Abi = mlirTypeMapper.map(arg0Type); // i32 -> IntegerType
+abi::Type *arg1Abi = mlirTypeMapper.map(arg1Type); // struct -> RecordType
+abi::Type *retAbi = mlirTypeMapper.map(returnType);
+```
+
+**Key Point**: Classification runs in GSoC on `abi::Type*`; MLIRTypeMapper is
+the only bridge from dialect types to that representation.
+
+#### Step 3: ABI Classification (GSoC ABIInfo)
+
+GSoC's target-specific `ABIInfo` (e.g. X86_64) performs classification on
+`abi::Type*` and produces GSoC's classification result (e.g. ABIFunctionInfo
+and ABIArgInfo as defined in `llvm/lib/ABI/`):
+
+```cpp
+// Pass holds a GSoC ABIInfo (from target registry or module target)
+llvm::abi::ABIInfo *abiInfo = getABIInfo(); // e.g. X86_64
+llvm::abi::ABIFunctionInfo abiFI;
+abiInfo->computeInfo(abiFI, arg0Abi, arg1Abi, retAbi);
+// For struct<i64,i64> on x86_64: produces Expand (two i64 args)
+```
+
+Output: GSoC's classification (e.g. ABIFunctionInfo) for all arguments and
+return:
+- `%arg0 (i32)` → Direct (pass as-is)
+- `%arg1 (struct)` → Expand (split into two i64 fields)
+- Return type → Direct
+
+#### Step 4: Function Signature Rewriting
+
+After GSoC's classification is complete, the pass rewrites the function to match
+the ABI requirements using the dialect's `ABIRewriteContext`. The
+classification result (from GSoC) describes the lowered signature; the rewrite
+context creates the actual dialect operations. For example, if a struct is
+classified as "Expand", the new function signature will have multiple scalar
+parameters instead of the single struct parameter.
+
+```cpp
+ABIRewriteContext &ctx = getDialectRewriteContext();
+
+// Create new function with lowered signature
+FunctionType newType = ...; // (i32, i64, i64) -> i32
+Operation *newFunc = ctx.createFunction(loc, "foo", newType);
+```
+
+**Key Point**: The original function had signature `(i32, struct) -> i32`, but
+the ABI-lowered function has signature `(i32, i64, i64) -> i32` with the struct
+expanded into its constituent fields.
+
+#### Step 5: Argument Expansion
+
+With the function signature rewritten, the pass updates all call sites to match
+the new signature, using the classification from GSoC to drive rewriting via
+`ABIRewriteContext`. For arguments classified as "Expand", the pass breaks down
+the aggregate into its constituent parts (e.g. struct into two i64 values).
+The rewrite context provides operations to extract fields and construct the new
+call with the expanded argument list.
+
+```cpp
+// Original call: call @foo(%val0, %structVal)
+// Need to extract struct fields:
+
+Value field0 = ctx.createExtractValue(loc, structVal, {0}); // extract 1st i64
+Value field1 = ctx.createExtractValue(loc, structVal, {1}); // extract 2nd i64
+
+// New call with expanded arguments
+ctx.createCall(loc, newFunc, {resultType}, {val0, field0, field1});
+```
+
+**Key Point**: `ABIRewriteContext` abstracts the dialect-specific operation
+creation, so the lowering logic doesn't need to know about CIR operations.
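
As a rough illustration of how a per-argument classification could drive the call-site rewrite in Step 5, the following standalone sketch flattens an operand list. It is an assumption-laden toy: `ArgKind`, `ArgClass`, and `expandCallArgs` are hypothetical names, operands are plain strings, and an expanded field is written as `<operand>.<index>` in place of a real extract-value operation. It is not the classification type or rewriting code from the GSoC library or the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the GSoC ABIArgInfo type.
namespace expand_sketch {

enum class ArgKind { Direct, Expand };

// Per-argument classification: kind plus field count (used only for Expand).
struct ArgClass {
  ArgKind kind;
  unsigned numFields;
};

// Builds the lowered operand list for a call. Direct operands are forwarded
// unchanged; Expand operands are replaced by one entry per field, written as
// "<operand>.<index>" to stand in for an extract-value operation.
inline std::vector<std::string>
expandCallArgs(const std::vector<std::string> &operands,
               const std::vector<ArgClass> &classes) {
  assert(operands.size() == classes.size() &&
         "one classification per operand");
  std::vector<std::string> lowered;
  for (std::size_t i = 0; i < operands.size(); ++i) {
    if (classes[i].kind == ArgKind::Direct) {
      lowered.push_back(operands[i]);
      continue;
    }
    for (unsigned f = 0; f < classes[i].numFields; ++f)
      lowered.push_back(operands[i] + "." + std::to_string(f));
  }
  return lowered;
}

} // namespace expand_sketch
```

With the running example, operands `%val0` (Direct) and `%structVal` (Expand with two fields) would yield three lowered operands, matching the `(i32, i64, i64)` signature from Step 4.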
+
+#### Step 6: Return Value Handling
+
+For functions returning large structs (indirect return):
+
+```cpp
+// If return type is classified as Indirect:
+Value sretPtr = ctx.createAlloca(loc, retType, alignment);
+ctx.createCall(loc, func, {}, {sretPtr, ...otherArgs});
+Value result = ctx.createLoad(loc, sretPtr);
+```
+
+#### Complete Flow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│  Input: High-Level Function (CIR/FIR/other dialect)             │
+│  func @foo(%arg0: i32, %arg1: struct<i64,i64>) -> i32           │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 1: Extract Types                                          │
+│  For each parameter: mlir::Type                                 │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 2: Map Types (MLIRTypeMapper → abi::Type*)                │
+│  mlirTypeMapper.map(argType) → abi::Type*                       │
+│  └─> Dialect types converted for GSoC                           │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 3: Classify (GSoC ABIInfo)                                │
+│  abiInfo->computeInfo(abiFI, ...) on abi::Type*                 │
+│  Applies target rules (e.g. x86_64 System V)                    │
+│  └─> Produces: GSoC ABIFunctionInfo / ABIArgInfo                │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 4: Rewrite Function (ABIRewriteContext)                   │
+│  Use GSoC classification to build lowered signature             │
+│  └─> ctx.createFunction(loc, name, newType); (i32, i64, i64)    │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Step 5: Rewrite Call Sites (ABIRewriteContext)                 │
+│  ctx.createExtractValue() - expand struct; ctx.createCall()     │
+│  └─> Dialect-specific operation creation                        │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+                         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Output: ABI-Lowered Function                                   │
+│  func @foo(%arg0: i32, %arg1: i64, %arg2: i64) -> i32           │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+#### Key Interactions Between Components
+
+Classification lives in GSoC: `ABIInfo` operates on `abi::Type*` and produces
+classification results (e.g. ABIArgInfo, ABIFunctionInfo). MLIR types reach
+GSoC only via MLIRTypeMapper, which converts `mlir::Type` to `abi::Type*`. The
+lowering pass (1) maps types with MLIRTypeMapper, (2) calls GSoC's ABIInfo to
+get classification, and (3) uses that result to drive rewriting through the
+dialect's ABIRewriteContext.
+
+ABIRewriteContext consumes the classification (e.g. "Expand" for a struct) and
+performs the actual IR changes: createFunction with the lowered signature,
+createExtractValue and createCall at call sites. Each dialect implements
+ABIRewriteContext to produce its own operations (e.g. cir.call, fir.call).
+This keeps classification in one place (GSoC) and limits dialect code to
+operation creation.
+
+## 5. Detailed Component Design
----------------
andykaylor wrote:

Except for 5.6 this section doesn't seem to provide any new information.

https://github.com/llvm/llvm-project/pull/178326
