https://github.com/andykaylor updated https://github.com/llvm/llvm-project/pull/176236
>From 0a3deb68485d7c79aa19be60483e5518956a929b Mon Sep 17 00:00:00 2001 From: Andy Kaylor <[email protected]> Date: Thu, 15 Jan 2026 12:04:27 -0800 Subject: [PATCH 1/2] [clang][docs] Add documentation for EH codegen This adds a document describing the implementation of LLVM IR generation for exceptions and C++ cleanup handling. This will be used as a point of reference for future CIR exception handling design work. This document was generated using AI, with some modifications afterwards. --- clang/docs/LLVMExceptionHandlingCodeGen.rst | 266 ++++++++++++++++++++ clang/docs/index.rst | 1 + 2 files changed, 267 insertions(+) create mode 100644 clang/docs/LLVMExceptionHandlingCodeGen.rst diff --git a/clang/docs/LLVMExceptionHandlingCodeGen.rst b/clang/docs/LLVMExceptionHandlingCodeGen.rst new file mode 100644 index 0000000000000..3dbe2fc6dd618 --- /dev/null +++ b/clang/docs/LLVMExceptionHandlingCodeGen.rst @@ -0,0 +1,266 @@ +======================================== +LLVM IR Generation for EH and Cleanups +======================================== + +.. contents:: + :local: + +Overview +======== + +This document describes how Clang's LLVM IR generation represents exception +handling (EH) and C++ cleanups. It focuses on the data structures and control +flow patterns used to model normal and exceptional exits, and it outlines how +the generated IR differs across common ABI models. + +Core Model +========== + +EH and cleanup handling is centered around an ``EHScopeStack`` that records +nested scopes for: + +- **Cleanups**, which run on normal control flow, exceptional control flow, or + both. These are used for destructors, full-expression cleanups, and other + scope-exit actions. +- **Catch scopes**, which represent ``try``/``catch`` handlers. +- **Filter scopes**, used to model dynamic exception specifications and some + platform-specific filters. +- **Terminate scopes**, used for ``noexcept`` and similar termination paths. 
+ +Each cleanup is a small object with an ``Emit`` method. When a cleanup scope is +popped, the IR generator decides whether it must materialize a normal cleanup +block (for fallthrough, branch-through, or unresolved ``goto`` fixups) and/or an +EH cleanup entry (when exceptional control flow can reach the cleanup). This +results in a flattened CFG where cleanup lifetime is represented by the blocks +and edges that flow into those blocks. + +Key Components +============== + +The LLVM IR generation for EH and cleanups is spread across several core +components: + +- ``CodeGenModule`` owns module-wide state such as the LLVM module, target + information, and the selected EH personality function. It provides access to + ABI helpers via ``CGCXXABI`` and target-specific hooks. +- ``CodeGenFunction`` manages per-function state and IR building. It owns the + ``EHScopeStack``, tracks the current insertion point, and emits blocks, calls, + and branches. Most cleanup and EH control flow is built here. +- ``EHScopeStack`` is the central stack of scopes used to model EH and cleanup + semantics. It stores ``EHCleanupScope`` entries for cleanups, along with + ``EHCatchScope``, ``EHFilterScope``, and ``EHTerminateScope`` for handlers and + termination logic. +- ``EHCleanupScope`` stores the cleanup object plus state data (active flags, + fixup depth, and enclosing scope links). When a cleanup scope is popped, + ``CodeGenFunction`` decides whether to emit a normal cleanup block, an EH + cleanup entry, or both. +- Cleanup emission helpers implement the mechanics of branching through + cleanups, threading fixups, and emitting cleanup blocks. +- Exception emission helpers implement landing pads, dispatch blocks, + personality selection, and helper routines for try/catch, filters, and + terminate handling. 
+- ``CGCXXABI`` (and its ABI-specific implementations such as + ``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for + throws, catch handling, and destructor emission details. +- C++ expression, class, and statement emission logic drives construction and + destruction, and is responsible for pushing/popping cleanups in response to + AST constructs. + +These components interact along a consistent pattern: AST traversal in +``CodeGenFunction`` emits code and pushes cleanups or EH scopes; ``EHScopeStack`` +records scope nesting; cleanup and exception helpers materialize the CFG as +scopes are popped; and ``CGCXXABI`` supplies ABI-specific details for landing +pads or funclets. + +Normal Cleanups and Branch Fixups +================================= + +Normal control flow exits (``return``, ``break``, ``goto``, fallthrough, etc.) +are threaded through cleanups by creating explicit cleanup blocks. The +implementation supports unresolved branches to labels by emitting an optimistic +branch and recording a fixup. When a cleanup is popped, fixups are threaded +through the cleanup by turning that optimistic branch into a switch that +dispatches to the correct destination after the cleanup runs. + +Cleanups use a switch on an internal "cleanup destination" slot even for simple +source constructs. It is a general mechanism that allows multiple exits to share +the same cleanup code while still reaching the correct final destination. + +Exceptional Cleanups and EH Dispatch +==================================== + +Exceptional exits (``throw``, ``invoke`` unwinds) are routed through EH cleanup +entries, which are reached via a landing pad or a funclet dispatch block, +depending on the target ABI. + +For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke`` +to call potentially-throwing operations and a ``landingpad`` instruction to +capture the exception and selector values. 
The landing pad aggregates the +in-scope catch, filter, and cleanup clauses, then branches to a dispatch block +that compares the selector to type IDs and jumps to the appropriate handler. + +For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for +handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret`` +edges to resume normal flow. The personality function determines how these pads +are interpreted by the backend. + +Personality and ABI Selection +============================= + +The IR generation selects a personality function based on language options and +the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This decision affects: + +- Whether the IR uses landing pads or funclet pads. +- The shape of dispatch logic for catch and filter scopes. +- How termination or rethrow paths are modeled. + +Because the personality choice is made during IR generation, the CFG shape +directly reflects ABI-specific details. + +Example: Array of Objects with Throwing Constructor +=================================================== + +Consider: + +.. code-block:: c++ + + class MyClass { + public: + MyClass(); // may throw + ~MyClass(); + }; + void doSomething(); // may throw + void f() { + MyClass arr[4]; + doSomething(); + } + +High-level behavior +------------------- + +- Construction of ``arr`` proceeds element-by-element. If an element constructor + throws, destructors must run for any elements that were successfully + constructed before the throw in reverse order of construction. +- After full construction, the call to ``doSomething`` may throw, in which case + the destructors for all constructed elements must run, in reverse order. +- On normal exit, destructors for all elements run in reverse order. 
+ +Codegen flow and key components +------------------------------- + +- ``CodeGenFunction::EmitDecl`` routes the local variable to + ``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``, + which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and + ``EmitAutoVarCleanups``. +- ``CodeGenFunction::EmitCXXAggrConstructorCall`` emits the array constructor + loop. While emitting the loop body, it enters a ``RunCleanupsScope`` and uses + ``CodeGenFunction::pushRegularPartialArrayCleanup`` to register a + cleanup before calling ``CodeGenFunction::EmitCXXConstructorCall`` for one + element in the loop iteration. If this constructor were to throw an exception, + the cleanup handler would destroy the previously constructed elements in + reverse order. +- ``CodeGenFunction::EmitAutoVarCleanups`` calls ``emitAutoVarTypeCleanup``, + which ultimately registers a ``DestroyObject`` cleanup via + ``CodeGenFunction::pushDestroy`` / ``pushFullExprCleanup`` for the full-array + destructor path. +- ``DestroyObject`` uses ``CodeGenFunction::destroyCXXObject``, which emits the + actual destructor call via ``CodeGenFunction::EmitCXXDestructorCall``. +- Cleanup emission helpers (e.g., ``CodeGenFunction::PopCleanupBlock`` and + ``CodeGenFunction::EmitBranchThroughCleanup``) thread both normal and EH exits + through the cleanup blocks as scopes are popped. +- The cleanup is represented as an ``EHCleanupScope`` on ``EHScopeStack``, and + its ``Emit`` method generates a loop that calls the destructor on the + initialized range in reverse order. + +Call-Graph Summary +------------------ + +.. 
code-block:: text + + EmitDecl + -> EmitVarDecl + -> EmitAutoVarDecl + -> EmitAutoVarAlloca + -> EmitAutoVarInit + -> EmitCXXAggrConstructorCall + -> RunCleanupsScope + -> pushRegularPartialArrayCleanup + -> EmitCXXConstructorCall (per element) + -> EmitAutoVarCleanups + -> emitAutoVarTypeCleanup + -> pushDestroy / pushFullExprCleanup + -> DestroyObject cleanup + -> destroyCXXObject + -> EmitCXXDestructorCall + +Example: Temporary object materialization +========================================= + +Consider: + +.. code-block:: c++ + + class MyClass { + public: + MyClass(); + ~MyClass(); + }; + void useMyClass(const MyClass &); + void f() { + useMyClass(MyClass()); + } + +High-level behavior +------------------- + +- The temporary ``MyClass`` is materialized for the call argument. +- The temporary must be destroyed at the end of the full-expression, both on + the normal path and on the exceptional path if ``useMyClass`` throws. +- If the constructor throws, the temporary is not considered constructed and no + destructor runs. + +Codegen flow and key functions +------------------------------ + +- ``CodeGenFunction::EmitExprWithCleanups`` wraps the full-expression in a + ``RunCleanupsScope`` so that full-expression cleanups are run after the call. +- ``CodeGenFunction::EmitMaterializeTemporaryExpr`` creates storage for the + temporary via ``createReferenceTemporary`` and initializes it. For record + temporaries this flows through ``EmitAnyExprToMem`` and + ``CodeGenFunction::EmitCXXConstructExpr``, which calls + ``CodeGenFunction::EmitCXXConstructorCall``. +- ``pushTemporaryCleanup`` registers the destructor as a full-expression + cleanup by calling ``CodeGenFunction::pushDestroy`` for + ``SD_FullExpression`` temporaries. +- The cleanup ultimately uses ``DestroyObject`` and + ``CodeGenFunction::destroyCXXObject``, which emits + ``CodeGenFunction::EmitCXXDestructorCall``.
+- The call to ``useMyClass`` is emitted while the temporary is live, and the + cleanup scope ensures the destructor runs on both normal and EH exits. + +Call-Graph Summary +------------------ + +.. code-block:: text + + EmitExprWithCleanups + -> RunCleanupsScope + -> EmitMaterializeTemporaryExpr + -> createReferenceTemporary + -> EmitAnyExprToMem + -> EmitCXXConstructExpr + -> EmitCXXConstructorCall + -> pushTemporaryCleanup + -> pushDestroy + -> DestroyObject cleanup + -> destroyCXXObject + -> EmitCXXDestructorCall + +Notes on Variations +=================== + +The exact shape of generated LLVM IR depends on target ABI, language options, +and optimization level. For example, filters, ``noexcept`` termination scopes, +and async EH options can introduce additional dispatch blocks, personality +selection differences, or outlined helper functions. The patterns above capture +the essential structure used for EH and cleanup handling on the named targets. diff --git a/clang/docs/index.rst b/clang/docs/index.rst index a0d0401ed1c86..c4464c4dbf0a2 100644 --- a/clang/docs/index.rst +++ b/clang/docs/index.rst @@ -122,6 +122,7 @@ Design Documents ControlFlowIntegrityDesign HardwareAssistedAddressSanitizerDesign.rst ConstantInterpreter + LLVMExceptionHandlingCodeGen ClangIRCodeDuplication Indices and tables >From d924d5ead2934436caf19c80e31b0276be65f824 Mon Sep 17 00:00:00 2001 From: Andy Kaylor <[email protected]> Date: Fri, 16 Jan 2026 11:34:34 -0800 Subject: [PATCH 2/2] Address review feedback --- clang/docs/LLVMExceptionHandlingCodeGen.rst | 139 +++++++++----------- 1 file changed, 62 insertions(+), 77 deletions(-) diff --git a/clang/docs/LLVMExceptionHandlingCodeGen.rst b/clang/docs/LLVMExceptionHandlingCodeGen.rst index 3dbe2fc6dd618..b3c995f0d60b4 100644 --- a/clang/docs/LLVMExceptionHandlingCodeGen.rst +++ b/clang/docs/LLVMExceptionHandlingCodeGen.rst @@ -13,6 +13,9 @@ handling (EH) and C++ cleanups. 
It focuses on the data structures and control flow patterns used to model normal and exceptional exits, and it outlines how the generated IR differs across common ABI models. +For details on the LLVM IR representation of exception handling, see +`LLVM Exception Handling <https://llvm.org/docs/ExceptionHandling.html>`_. + Core Model ========== @@ -62,29 +65,44 @@ components: - ``CGCXXABI`` (and its ABI-specific implementations such as ``ItaniumCXXABI`` and ``MicrosoftCXXABI``) provide ABI-specific lowering for throws, catch handling, and destructor emission details. -- C++ expression, class, and statement emission logic drives construction and - destruction, and is responsible for pushing/popping cleanups in response to - AST constructs. - -These components interact along a consistent pattern: AST traversal in -``CodeGenFunction`` emits code and pushes cleanups or EH scopes; ``EHScopeStack`` -records scope nesting; cleanup and exception helpers materialize the CFG as -scopes are popped; and ``CGCXXABI`` supplies ABI-specific details for landing -pads or funclets. - -Normal Cleanups and Branch Fixups -================================= - -Normal control flow exits (``return``, ``break``, ``goto``, fallthrough, etc.) -are threaded through cleanups by creating explicit cleanup blocks. The -implementation supports unresolved branches to labels by emitting an optimistic -branch and recording a fixup. When a cleanup is popped, fixups are threaded -through the cleanup by turning that optimistic branch into a switch that -dispatches to the correct destination after the cleanup runs. - -Cleanups use a switch on an internal "cleanup destination" slot even for simple -source constructs. It is a general mechanism that allows multiple exits to share -the same cleanup code while still reaching the correct final destination.
+- The cleanup and exception handling code generation is driven by the flow of + ``CodeGenFunction`` and its helper classes traversing the AST to emit IR for + C++ expressions, classes, and statements. + +AST traversal in ``CodeGenFunction`` emits code and pushes cleanups or EH scopes, +``EHScopeStack`` records scope nesting, cleanup and exception helpers materialize +the CFG as scopes are popped, and ``CGCXXABI`` supplies ABI-specific details for +landing pads or funclets. + +Cleanup Destination Routing +=========================== + +When multiple control flow exits (``return``, ``break``, ``continue``, +fallthrough) pass through the same cleanup, the generated IR shares a single +cleanup block among them. Before entering the cleanup, each exit path stores a +unique index into a "cleanup destination" slot. After the cleanup code runs, the +index is loaded and a ``switch`` instruction dispatches to the appropriate final +destination. This avoids duplicating cleanup code for each exit while preserving +correct control flow. + +For example, if a function has both a ``return`` and a ``break`` that exit +through the same destructor cleanup, both paths branch to the shared cleanup +block after storing their respective destination indices. The cleanup epilogue +then switches on the stored index to reach either the return block or the +loop-exit block. + +When only a single exit passes through a cleanup (the common case), the switch +is unnecessary and the cleanup block branches directly to its sole destination. + +Branch Fixups for Forward Gotos +------------------------------- + +A ``goto`` statement that jumps forward to a label not yet seen poses a special +problem. The destination's enclosing cleanup scope is unknown at the point the +``goto`` is emitted. This is handled by emitting an optimistic branch and +recording a "fixup."
When the cleanup scope is later popped, any recorded fixups +are resolved by rewriting the branch to thread through the cleanup block and +adding the destination to the cleanup's switch. Exceptional Cleanups and EH Dispatch ==================================== @@ -95,9 +113,10 @@ depending on the target ABI. For Itanium-style EH (such as is used on x86-64 Linux), the IR uses ``invoke`` to call potentially-throwing operations and a ``landingpad`` instruction to -capture the exception and selector values. The landing pad aggregates the -in-scope catch, filter, and cleanup clauses, then branches to a dispatch block -that compares the selector to type IDs and jumps to the appropriate handler. +capture the exception and selector values. The landing pad aggregates any +catch and cleanup clauses from the enclosing EH scopes and branches to a +dispatch block that compares the selector to type IDs and jumps to the +appropriate handler. For Windows, LLVM IR uses funclet-style EH: ``catchswitch`` and ``catchpad`` for handlers, and ``cleanuppad`` for cleanups, with ``catchret`` and ``cleanupret`` @@ -107,12 +126,17 @@ are interpreted by the backend. Personality and ABI Selection ============================= -The IR generation selects a personality function based on language options and -the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This decision affects: +Each function with exception handling constructs is associated with a +personality function (e.g., ``__gxx_personality_v0`` for C++ on Linux). The +personality function determines the ABI-specific EH behavior of the +function. The IR generation selects a personality function based on language +options and the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This +decision affects: - Whether the IR uses landing pads or funclet pads. - The shape of dispatch logic for catch and filter scopes. - How termination or rethrow paths are modeled. +- Whether certain helper functions such as exception filters must be outlined.
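The cleanup destination routing described in the "Cleanup Destination Routing" section can be illustrated with a small C++ program. In ``sourceForm``, a ``return``, a ``break``, and a fallthrough all leave the scope of ``g``. ``loweredForm`` is a hand-written ``goto`` approximation of the described shape: one shared copy of the cleanup, a destination slot written by each exit, and a ``switch`` after the cleanup. The ``Guard`` type and the destination indices are hypothetical. Clang's actual IR may differ in block names, index values, and whether a switch is emitted at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // records cleanup and control-flow events

// Hypothetical type with a non-trivial destructor, standing in for any
// object whose destructor is registered as a cleanup.
struct Guard {
  ~Guard() { g_log.push_back("~Guard"); }
};

// Source form: mode 0 returns, mode 1 breaks, anything else falls through
// to the next iteration. Every one of these exits must run ~Guard first.
int sourceForm(int mode) {
  for (int i = 0; i < 3; ++i) {
    Guard g;
    if (mode == 0)
      return 10;
    if (mode == 1)
      break;
  }
  g_log.push_back("after-loop");
  return 20;
}

// Hand-lowered sketch: one shared copy of the cleanup. Each exit stores a
// destination index in a slot; after the cleanup, a switch dispatches on it.
int loweredForm(int mode) {
  enum Dest { kReturn = 0, kLoopExit = 1, kNextIteration = 2 };
  int cleanupDestSlot = kNextIteration;  // the "cleanup destination" slot
  int retval = 0;                        // stands in for a return-value slot
  int i = 0;
loop_cond:
  if (!(i < 3))
    goto loop_exit;
  // Guard g; (construction is trivial here, so nothing to emit)
  if (mode == 0) {
    retval = 10;
    cleanupDestSlot = kReturn;
    goto cleanup;
  }
  if (mode == 1) {
    cleanupDestSlot = kLoopExit;
    goto cleanup;
  }
  cleanupDestSlot = kNextIteration;  // fallthrough out of the loop body
cleanup:
  g_log.push_back("~Guard");  // the shared cleanup code, emitted once
  switch (cleanupDestSlot) {
  case kReturn:
    return retval;
  case kLoopExit:
    goto loop_exit;
  case kNextIteration:
    ++i;
    goto loop_cond;
  }
loop_exit:
  g_log.push_back("after-loop");
  return 20;
}
```

Both functions produce the same return value and the same event log for every mode, while ``loweredForm`` contains only one copy of the cleanup code.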
Because the personality choice is made during IR generation, the CFG shape directly reflects ABI-specific details. @@ -148,6 +172,9 @@ High-level behavior Codegen flow and key components ------------------------------- +- The surrounding compound statement enters a ``CodeGenFunction::LexicalScope``, + which is a ``RunCleanupsScope`` and is responsible for popping local cleanups + at the end of the block. - ``CodeGenFunction::EmitDecl`` routes the local variable to ``CodeGenFunction::EmitVarDecl`` and then ``CodeGenFunction::EmitAutoVarDecl``, which in turn calls ``EmitAutoVarAlloca``, ``EmitAutoVarInit``, and @@ -172,26 +199,9 @@ Codegen flow and key components its ``Emit`` method generates a loop that calls the destructor on the initialized range in reverse order. -Call-Graph Summary ------------------- - -.. code-block:: text - - EmitDecl - -> EmitVarDecl - -> EmitAutoVarDecl - -> EmitAutoVarAlloca - -> EmitAutoVarInit - -> EmitCXXAggrConstructorCall - -> RunCleanupsScope - -> pushRegularPartialArrayCleanup - -> EmitCXXConstructorCall (per element) - -> EmitAutoVarCleanups - -> emitAutoVarTypeCleanup - -> pushDestroy / pushFullExprCleanup - -> DestroyObject cleanup - -> destroyCXXObject - -> EmitCXXDestructorCall +The above function names and flow are accurate as of LLVM 22.0, but this is +subject to change as the code evolves, and this document might not be updated to +reflect the exact functions used. Example: Temporary object materialization ========================================= @@ -235,32 +245,7 @@ Codegen flow and key functions - The cleanup ultimately uses ``DestroyObject`` and ``CodeGenFunction::destroyCXXObject``, which emits ``CodeGenFunction::EmitCXXDestructorCall``. -- The call to ``useMyClass`` is emitted while the temporary is live, and the - cleanup scope ensures the destructor runs on both normal and EH exits. - -Call-Graph Summary ------------------- - -.. 
code-block:: text - - EmitExprWithCleanups - -> RunCleanupsScope - -> EmitMaterializeTemporaryExpr - -> createReferenceTemporary - -> EmitAnyExprToMem - -> EmitCXXConstructExpr - -> EmitCXXConstructorCall - -> pushTemporaryCleanup - -> pushDestroy - -> DestroyObject cleanup - -> destroyCXXObject - -> EmitCXXDestructorCall - -Notes on Variations -=================== - -The exact shape of generated LLVM IR depends on target ABI, language options, -and optimization level. For example, filters, ``noexcept`` termination scopes, -and async EH options can introduce additional dispatch blocks, personality -selection differences, or outlined helper functions. The patterns above capture -the essential structure used for EH and cleanup handling on the named targets. + +The above function names and flow are accurate as of LLVM 22.0, but this is +subject to change as the code evolves, and this document might not be updated to +reflect the exact functions used. _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
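The guarantees in the temporary-materialization example can likewise be checked in ordinary C++. This sketch declares ``useMyClass`` with a ``const MyClass &`` parameter because a prvalue such as ``MyClass()`` cannot bind to a non-const lvalue reference in standard C++. ``fLowered`` is a hypothetical hand-written approximation of the generated control flow. The destructor cleanup becomes active only after the constructor returns, and it runs on both the normal and the exceptional exit from the call. The knobs ``g_ctorThrows`` and ``g_calleeThrows`` are test scaffolding.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> g_events;
bool g_ctorThrows = false;    // hypothetical knob: make MyClass() throw
bool g_calleeThrows = false;  // hypothetical knob: make useMyClass throw

struct MyClass {
  MyClass() {
    if (g_ctorThrows)
      throw std::runtime_error("ctor");
    g_events.push_back("ctor");
  }
  ~MyClass() { g_events.push_back("dtor"); }
};

void useMyClass(const MyClass &) {
  g_events.push_back("use");
  if (g_calleeThrows)
    throw std::runtime_error("use");
}

// The document's example, as the compiler handles it.
void f() { useMyClass(MyClass()); }

// Hand-written approximation of the generated control flow.
void fLowered() {
  alignas(MyClass) unsigned char storage[sizeof(MyClass)];
  // If the constructor throws, no cleanup is active yet, so nothing runs.
  MyClass *tmp = ::new (static_cast<void *>(storage)) MyClass();
  // From here until the end of the full-expression, the destructor cleanup
  // is active on both the normal and the exceptional path.
  try {
    useMyClass(*tmp);
  } catch (...) {
    tmp->~MyClass();  // EH path
    throw;
  }
  tmp->~MyClass();  // normal path
}

std::vector<std::string> run(void (*fn)(), bool ctorThrows, bool calleeThrows) {
  g_events.clear();
  g_ctorThrows = ctorThrows;
  g_calleeThrows = calleeThrows;
  try {
    fn();
    g_events.push_back("returned");
  } catch (const std::runtime_error &) {
    g_events.push_back("caught");
  }
  return g_events;
}
```

When the constructor throws, both versions record only ``caught``. No destructor runs because the cleanup was never activated.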
