[clang] [llvm] [KeyInstr] Add docs (PR #137991)

Orlando Cazalet-Hyams via cfe-commits Tue, 15 Jul 2025 05:27:41 -0700

https://github.com/OCHyams updated 
https://github.com/llvm/llvm-project/pull/137991


>From cb89d1f1bb60db07743f1973f9b263424fab9f6d Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Wed, 30 Apr 2025 15:19:03 +0100
Subject: [PATCH 1/9] [KeyInstr] Add docs

---
 clang/docs/KeyInstructionsClang.md    |  25 ++++++
 llvm/docs/KeyInstructionsDebugInfo.md | 114 ++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 clang/docs/KeyInstructionsClang.md
 create mode 100644 llvm/docs/KeyInstructionsDebugInfo.md

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
new file mode 100644
index 0000000000000..fa9dd11033d2d
--- /dev/null
+++ b/clang/docs/KeyInstructionsClang.md
@@ -0,0 +1,25 @@
+# Key Instructions in Clang
+
+Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the Clang docs.
+
+## Status
+
+In development - some details may change with little notice.
+
+Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. This also sets the LLVM flag 
`-dwarf-use-key-instructions`, so it interprets Key Instructions metadata when 
producing the DWARF line table.
+
+The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than a fundamental limitation, covered in more detail later).
+
+There is currently no plan to support CodeView.
+
+## Implementation
+
+See the [LLVM docs](../../llvm/docs/KeyInstructionsDebugInfo.md) for general 
info about the feature (and LLVM implementation details).
+
+Clang needs to annotate key instructions with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked `is_stmt`. 
This is achieved with a few simple constructs:
+
+Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation` that creates a new source atom group which instructions 
can be added to. It's used during CodeGen to declare that a new source atom has 
started, e.g. in `CodeGenFunction::EmitBinaryOperatorLValue`.
+
+`CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). FIXME?: Currently it's called at the CGBuilderTy 
callsites; it could instead make sense to always call the function inside the 
CGBuilderTy calls, with some calls opting out.
+
+`CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
new file mode 100644
index 0000000000000..1b2acfb2bfb29
--- /dev/null
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -0,0 +1,114 @@
+# Key Instructions debug info in LLVM
+
+Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the Clang docs.
+
+## Status
+
+In development - some details may change with little notice.
+
+Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. See the Clang docs for implementation info.
+
+Use LLVM flag `-dwarf-use-key-instructions` to interpret Key Instructions 
metadata when producing the DWARF line table (Clang passes the flag to LLVM). 
The behaviour of this flag may change.
+
+The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than a fundamental limitation, covered in more detail later).
+
+There is currently no plan to support CodeView.
+
+## Problem statement
+
+A lot of the noise in stepping comes from code motion and instruction 
scheduling. Consider a long expression on a single line. It may involve 
multiple operations that optimisations move, re-order, and interleave with 
other instructions that have different line numbers.
+
+DWARF provides a helpful tool the compiler can employ to mitigate this 
jumpiness, the is_stmt flag, which indicates that an instruction is a 
recommended breakpoint location. However, LLVM's current approach to deciding 
is_stmt placement essentially reduces down to "is the associated line number 
different to the previous instruction's?".
+
+(Note: It's up to the debugger if it wants to interpret is_stmt or not, and at 
time of writing LLDB doesn't; possibly because LLVM's is_stmts convey no 
information that can't already be deduced from the rest of the line table.)
+
+## Solution overview
+
+Taking ideas from two papers [1][2] that explore the issue, especially C. 
Tice's:
+
+From the perspective of a source-level debugger user:
+
+* Source code is made up of interesting constructs; the level of granularity 
for “interesting” while stepping is typically assignments, calls, control flow. 
We’ll call these interesting constructs Atoms.
+
+* Atoms usually have one instruction that implements the functionality that a 
user can observe; once they step “off” that instruction, the atom is finalised. 
We’ll call that a Key Instruction.
+
+* Communicating where the key instructions are to the debugger (using DWARF’s 
is_stmt) avoids jumpiness introduced by scheduling non-key instructions without 
losing source attribution (because non-key instructions retain an associated 
source location, they’re just ignored for stepping).
+
+## Solution implementation
+
+1. `DILocation` has 2 new fields, `atomGroup` and `atomRank`.
+2. Clang creates `DILocations` using the new fields to communicate which 
instructions are "interesting".
+3. There’s some bookkeeping required by optimisations that duplicate control 
flow.
+4. During DWARF emission, the new metadata is collected (linear scan over 
instructions) to decide is_stmt placements.
+
+1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank`. Both are unsigned integers. Instructions in the same function with 
the same `(atomGroup, inlinedAt)` pair are part of the same source atom. 
`atomRank` determines is_stmt preference within that group, where a lower 
number is higher precedence. Higher rank instructions act as "backup" is_stmt 
locations, providing good fallback locations if/when the primary candidate gets 
optimized away. The default values of 0 indicate the instruction isn’t 
interesting - it's not an is_stmt candidate.
+
+2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked is_stmt.
+
+3. *Throughout optimisation*, the DILocation is propagated normally. Cloned 
instructions get the original’s DILocation, the new fields get merged in 
getMergedLocation, etc. However, pass writers need to intercede in cases where 
a code path is duplicated, e.g. unrolling, jump-threading. In these cases we 
want to emit key instructions in both the original and duplicated code, so the 
duplicated instructions must be assigned new `atomGroup` numbers, in a similar way to how 
instruction operands must get remapped. There are facilities to help with this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional effort is actually needed.
+
+`mapAtomInstance` ensures `LLVMContextImpl::NextAtomGroup` is kept up to date, 
which is the global “next available atom number”.
+
+The `DILocations` carry over from IR to MIR as normal, without any changes.
+
+4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get is_stmt applied to their 
source locations. That `is_stmt` then "floats" to the top of a contiguous 
sequence of instructions with the same line number in the same block. That has 
two benefits when optimisations are enabled. First, this floats `is_stmt` to 
the top of epilogue instructions (rather than applying it to the `ret` 
instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.
+
+We’ve used contiguous line numbers rather than atom membership as the test 
there because of our choice to represent source atoms with a single integer ID. 
We can’t have instructions belonging to multiple atom groups or represent any 
kind of grouping hierarchy. That means we can’t rely on all the call setup 
instructions being in the same group currently (e.g., if one of the argument 
expressions contains key functionality such as a store, it will be in its own 
group).
+
+## Adding the feature to a front end
+
+Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values. Currently they also need to tell 
LLVM to interpret the metadata by passing the `-dwarf-use-key-instructions` 
flag.
+
+The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at the kind of instruction). This doesn't exist anywhere 
upstream, but could be shared if there's interest (e.g., so another front end 
can try it out before committing to a full implementation); feel free to reach 
out on Discourse (@OCHyams).
+
+## Limitations
+
+### Lack of multiple atom membership
+
+Using a number to represent atom membership is limiting; currently an 
instruction cannot belong to multiple atoms. Does this come up in practice? 
Yes. Both in the front end and during optimisations. Consider this C code:
+```c
+a = b = c;
+```
+Clang generates this IR:
+```llvm
+  %0 = load i32, ptr %c.addr, align 4
+  store i32 %0, ptr %b.addr, align 4
+  store i32 %0, ptr %a.addr, align 4
+```
+The load of `c` is used by both stores (which are the Key Instructions for 
each assignment respectively). We can only use it as a backup location for one 
of the two atoms.
+
+Certain optimisations merge source locations, which presents another case 
where it might make sense to be able to represent an instruction belonging to 
multiple atoms. Currently we deterministically pick one (choosing to keep the 
lower rank one if there is one).
+
+### Disabled at O0
+
+Consider the following code without optimisations:
+```c
+int c =
+    a + b;
+```
+In the current implementation an `is_stmt` won't be generated for the `a + b` 
instruction, meaning debuggers will likely step over the `add` and stop at the 
`store` of the result into `c` (which does get `is_stmt`). A user might have 
hoped to edit `a` or `b` on the previous line in order to alter the result 
stored to `c`, which they now won't have the chance to do (they'd need to edit 
the variables on a previous line instead). If the expression was all on one 
line then they would be able to edit the values before the `add`. For these 
reasons we're choosing to recommend that the feature should not be enabled at 
O0.
+
+It should be possible to fix this case if we make a few changes: add all the 
instructions in the statement (i.e., including the loads) to the atom, and 
tweak the DwarfEmission code to understand this situation (same atom, different 
line). So there is room to pursue this in the future. Though that gets tricky 
in some cases due to the [other limitation mentioned 
above](#lack-of-multiple-atom-membership), e.g.:
+```c
+int e =        // atom 1
+    (a + b)    // atom 1
+  * (c = d);   // - atom 2
+```
+```llvm
+  %0 = load i32, ptr %a.addr, align 4     ; atom 1
+  %1 = load i32, ptr %b.addr, align 4     ; atom 1
+  %add = add nsw i32 %0, %1               ; atom 1
+  %2 = load i32, ptr %d.addr, align 4     ; - atom 2
+  store i32 %2, ptr %c.addr, align 4      ; - atom 2
+  %mul = mul nsw i32 %add, %2             ; atom 1
+  store i32 %mul, ptr %e, align 4         ; atom 1
+```
+Without multiple-atom-membership or some kind of atom hierarchy it's not 
apparent how to get the `is_stmt` to stick to `a + b`, given the other rules 
the `is_stmt` placement follows.
+
+O0 isn't a key use-case so solving this is not a priority for the initial 
implementation. The trade off, smoother stepping at the cost of not being able 
to edit variables to affect an expression in some cases (and at particular stop 
points), becomes more attractive when optimisations are enabled (we find that 
editing variables in the debugger in optimized code often produces unexpected 
effects, so it's not a big concern that Key Instructions makes it harder 
sometimes).
+
+---
+
+**References**
+* [1] Key Instructions: Solving the Code Location Problem for Optimized Code 
(C. Tice, S. L. Graham, 2000)
+* [2] Debugging Optimized Code: Concepts and Implementation on DIGITAL Alpha 
Systems (R. F. Brender et al)
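
Not part of the patch: a minimal, self-contained Python model of the `is_stmt` selection that the "DWARF emission" step in KeyInstructionsDebugInfo.md describes. The `Inst` record and the `choose_is_stmt` function are hypothetical names made up for this sketch, not LLVM APIs. The model assumes only the three rules stated in the doc (lowest rank per `(atomGroup, inlinedAt)` pair, last such instruction per basic block, then float to the top of the contiguous same-line run in that block); the real implementation may apply further conditions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Inst:
    """Hypothetical stand-in for an instruction plus its DILocation fields."""
    name: str
    block: str
    line: int                      # 0 models "no debug location"
    atom_group: int = 0            # 0 means "not an is_stmt candidate"
    atom_rank: int = 0
    inlined_at: Optional[int] = None


def choose_is_stmt(insts: List[Inst]) -> Set[int]:
    """Return the indices in `insts` that would receive is_stmt."""
    # Rule 1: find the lowest (highest-precedence) rank present in each atom.
    best_rank = {}
    for inst in insts:
        if inst.atom_group == 0 or inst.atom_rank == 0:
            continue
        key = (inst.atom_group, inst.inlined_at)
        best_rank[key] = min(best_rank.get(key, inst.atom_rank), inst.atom_rank)

    # Rule 2: keep only the last instruction with that rank in each block.
    last_in_block = {}
    for idx, inst in enumerate(insts):
        key = (inst.atom_group, inst.inlined_at)
        if inst.atom_group != 0 and best_rank.get(key) == inst.atom_rank:
            last_in_block[(key, inst.block)] = idx

    # Rule 3: float is_stmt up to the first instruction of the contiguous
    # run that shares the same line number within the same block.
    chosen = set()
    for idx in last_in_block.values():
        while (idx > 0
               and insts[idx - 1].block == insts[idx].block
               and insts[idx - 1].line == insts[idx].line):
            idx -= 1
        chosen.add(idx)
    return chosen


# Modelled on the `int b = a;` walkthrough from the Clang doc in this patch.
example = [
    Inst("store %a -> %a.addr", "entry", line=0),
    Inst("load %a.addr", "entry", line=2, atom_group=1, atom_rank=2),
    Inst("store %0 -> %b", "entry", line=2, atom_group=1, atom_rank=1),
    Inst("ret", "entry", line=3, atom_group=2, atom_rank=1),
]
```

With `example`, the store is the rank-1 instruction of atom 1, but the `is_stmt` floats up to the load because both sit on line 2; the `ret` keeps its own `is_stmt`. If the store is removed from the list, the load becomes the lowest-ranked member of atom 1 and is chosen directly, which is the "backup" behaviour the doc describes.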

>From 411ebbcd66fc450618a8f9e22c1168b277598f0b Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Wed, 30 Apr 2025 17:39:21 +0100
Subject: [PATCH 2/9] add worked example to Clang docs

---
 clang/docs/KeyInstructionsClang.md | 115 ++++++++++++++++++++++++++---
 1 file changed, 104 insertions(+), 11 deletions(-)

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
index fa9dd11033d2d..a37df23610e0b 100644
--- a/clang/docs/KeyInstructionsClang.md
+++ b/clang/docs/KeyInstructionsClang.md
@@ -1,16 +1,6 @@
 # Key Instructions in Clang
 
-Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the Clang docs.
-
-## Status
-
-In development - some details may change with little notice.
-
-Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. This also sets the LLVM flag 
`-dwarf-use-key-instructions`, so it interprets Key Instructions metadata when 
producing the DWARF line table.
-
-The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than a fundamental limitation, covered in more detail later).
-
-There is currently no plan to support CodeView.
+Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping. This document explains how Clang applies the necessary 
metadata.
 
 ## Implementation
 
@@ -23,3 +13,106 @@ Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation`
 `CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). FIXME?: Currently it's called at the CGBuilderTy 
callsites; it could instead make sense to always call the function inside the 
CGBuilderTy calls, with some calls opting out.
 
 `CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
+
+## Examples
+
+A simple example walk through:
+```
+void fun(int a) {
+  int b = a;
+}
+```
+
+There are two key instructions here, the assignment and the implicit return. 
We want to emit metadata that looks like this:
+
+```
+define hidden void @_Z3funi(i32 noundef %a) #0 !dbg !11 {
+entry:
+  %a.addr = alloca i32, align 4
+  %b = alloca i32, align 4
+  store i32 %a, ptr %a.addr, align 4
+  %0 = load i32, ptr %a.addr, align 4, !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 2)
+  store i32 %0, ptr %b, align 4,       !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 1)
+  ret void,                            !dbg !DILocation(line: 3, scope: !11, 
atomGroup: 2, atomRank: 1)
+}
+```
+
+The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for is_stmt if the store gets 
optimized away. It's part of the same source atom, but has lower is_stmt 
precedence, so it gets a higher `atomRank`.
+
+The atom group is set here:
+```
+>  clang::CodeGen::ApplyAtomGroup::ApplyAtomGroup(clang::CodeGen::CGDebugInfo 
* DI) Line 187
+   clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 1961
+   clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) 
Line 1361
+   clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 
219
+   clang::CodeGen::CodeGenFunction::EmitDecl(const clang::Decl & D) Line 164
+   clang::CodeGen::CodeGenFunction::EmitDeclStmt(const clang::DeclStmt & S) 
Line 1611
+   clang::CodeGen::CodeGenFunction::EmitSimpleStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 466
+   clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 72
+   clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(const 
clang::CompoundStmt & S, bool GetLast, clang::CodeGen::AggValueSlot AggSlot) 
Line 556+
+   clang::CodeGen::CodeGenFunction::EmitFunctionBody(const clang::Stmt * Body) 
Line 1307
+```
+
+And the DILocations are updated here:
+```
+>  
clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * 
KeyInstruction, llvm::Value * Backup, unsigned char KeyInstRank) Line 2551
+   clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * Value, 
clang::CodeGen::Address Addr, bool Volatile, clang::QualType Ty, 
clang::CodeGen::LValueBaseInfo BaseInfo, clang::CodeGen::TBAAAccessInfo 
TBAAInfo, bool isInit, bool isNontemporal) Line 2133
+   clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * value, 
clang::CodeGen::LValue lvalue, bool isInit) Line 2152
+   
clang::CodeGen::CodeGenFunction::EmitStoreThroughLValue(clang::CodeGen::RValue 
Src, clang::CodeGen::LValue Dst, bool isInit) Line 2478
+   clang::CodeGen::CodeGenFunction::EmitScalarInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) 
Line 805
+   clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) 
Line 2088
+   clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 2050
+   clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) 
Line 1361
+   clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 
219
+   clang::CodeGen::CodeGenFunction::EmitDecl(const clang::Decl & D) Line 164
+   clang::CodeGen::CodeGenFunction::EmitDeclStmt(const clang::DeclStmt & S) 
Line 1611
+   clang::CodeGen::CodeGenFunction::EmitSimpleStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 466
+   clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 72
+   clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(const 
clang::CompoundStmt & S, bool GetLast, clang::CodeGen::AggValueSlot AggSlot) 
Line 556
+   clang::CodeGen::CodeGenFunction::EmitFunctionBody(const clang::Stmt * Body) 
Line 1307
+
+```
+
+The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour.
+
+```
+>  
clang::CodeGen::CodeGenFunction::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst
 * Ret, llvm::Value * Backup, unsigned char KeyInstRank) Line 2567
+   clang::CodeGen::CodeGenFunction::EmitFunctionEpilog(const 
clang::CodeGen::CGFunctionInfo & FI, bool EmitRetDbgLoc, clang::SourceLocation 
EndLoc) Line 3839
+   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 433
+```
+
+`addRetToOverrideOrNewSourceAtom` is a special function used for handling 
`ret`s. In this case it simply replaces the DILocation with the `atomGroup` and 
`atomRank` set, adding it to its own atom.
+
+To demonstrate why `ret`s need special handling, we need to look at a more 
"complex" example, below.
+
+```
+int fun(int a) {
+  return a;
+}
+```
+
+Rather than emit a `ret` for each `return`, Clang, in all but the simplest 
cases (as in the first example), emits a branch to a dedicated block with a 
single `ret`. That branch is the key instruction for the return statement. If 
there's only one branch to that block, because there's only one `return` (as in 
this example), Clang folds the block into its only predecessor. We need to do 
some accounting to transfer the `atomGroup` number to the `ret` when that 
happens:
+
+When we hit the special-casing code that knows we've only got one block (the 
IR looks like this):
+```
+entry:
+  %a.addr = alloca i32, align 4
+  %allocapt = bitcast i32 undef to i32
+  store i32 %a, ptr %a.addr, align 4
+  br label %return, !dbg !6
+```
+
+...remember the branch-to-return-block's `atomGroup`:
+
+```
+>  clang::CodeGen::CGDebugInfo::setRetInstSourceAtomOverride(unsigned __int64 
Group) Line 168
+   clang::CodeGen::CodeGenFunction::EmitReturnBlock() Line 332
+   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 415
+```
+
+And apply it to the `ret` when it's added:
+```
+>  
clang::CodeGen::CodeGenFunction::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst
 * Ret, llvm::Value * Backup, unsigned char KeyInstRank) Line 2567
+   clang::CodeGen::CodeGenFunction::EmitFunctionEpilog(const 
clang::CodeGen::CGFunctionInfo & FI, bool EmitRetDbgLoc, clang::SourceLocation 
EndLoc) Line 3839
+   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 433
+```
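
Not part of the patch: a small Python model of the rank assignment that the Clang doc attributes to `addInstToCurrentSourceAtom`. Everything here (`Inst`, `CAST_OPCODES`, `add_inst_to_atom`) is a hypothetical stand-in rather than Clang code. It assumes the key instruction gets rank 1, the backup gets rank 2, and each instruction reached by looking through a cast gets the next rank, which matches the ranks shown in the worked example above but is otherwise a reading of the prose.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed set of cast-like opcodes for this sketch only.
CAST_OPCODES = {"trunc", "zext", "sext", "bitcast"}


@dataclass
class Inst:
    """Hypothetical instruction: an opcode, operands, and the two new fields."""
    opcode: str
    operands: List["Inst"] = field(default_factory=list)
    atom_group: int = 0
    atom_rank: int = 0


def add_inst_to_atom(key_inst: Inst, backup: Optional[Inst], group: int) -> None:
    """Tag `key_inst` with rank 1 and walk the backup chain with rising ranks."""
    key_inst.atom_group, key_inst.atom_rank = group, 1
    rank = 2
    while backup is not None:
        backup.atom_group, backup.atom_rank = group, rank
        rank += 1
        # Only keep walking when the backup is a cast with an operand to inspect.
        if backup.opcode not in CAST_OPCODES or not backup.operands:
            break
        backup = backup.operands[0]


# A store whose stored value is a sign-extended load.
load = Inst("load")
ext = Inst("sext", [load])
store = Inst("store", [ext])
add_inst_to_atom(store, ext, group=1)
```

After the call, `store`, `ext`, and `load` all share atom group 1 with ranks 1, 2, and 3 respectively, so the cast and then the load serve as successive fallbacks if the store is optimized away.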

>From f6b47ab20f7d7e9a49529a8738f7cb03bc004004 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Wed, 30 Apr 2025 17:44:33 +0100
Subject: [PATCH 3/9] self nits

---
 clang/docs/KeyInstructionsClang.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
index a37df23610e0b..441fab41a60ee 100644
--- a/clang/docs/KeyInstructionsClang.md
+++ b/clang/docs/KeyInstructionsClang.md
@@ -14,6 +14,8 @@ Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation`
 
 `CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
 
+There are a couple of other helpers, including 
`addRetToOverrideOrNewSourceAtom` used for `rets` which is covered in the 
examples below.
+
 ## Examples
 
 A simple example walk through:
@@ -37,9 +39,9 @@ entry:
 }
 ```
 
-The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for is_stmt if the store gets 
optimized away. It's part of the same source atom, but has lower is_stmt 
precedence, so it gets a higher `atomRank`.
+The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`.
 
-The atom group is set here:
+This is all handled during CodeGen. The atom group is set here:
 ```
 >  clang::CodeGen::ApplyAtomGroup::ApplyAtomGroup(clang::CodeGen::CGDebugInfo 
* DI) Line 187
    clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 1961

>From 057680ad861171dd4cf6a778b27971098f33dd55 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Mon, 16 Jun 2025 09:02:07 +0100
Subject: [PATCH 4/9] update docs

---
 llvm/docs/KeyInstructionsDebugInfo.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
index 1b2acfb2bfb29..c6abcac6b7572 100644
--- a/llvm/docs/KeyInstructionsDebugInfo.md
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -8,12 +8,12 @@ In development - some details may change with little notice.
 
 Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. See the Clang docs for implementation info.
 
-Use LLVM flag `-dwarf-use-key-instructions` to interpret Key Instructions 
metadata when producing the DWARF line table (Clang passes the flag to LLVM). 
The behaviour of this flag may change.
-
 The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than a fundamental limitation, covered in more detail later).
 
 There is currently no plan to support CodeView.
 
+Set LLVM flag `-dwarf-use-key-instructions` to `false` to ignore Key 
Instructions metadata when emitting DWARF.
+
 ## Problem statement
 
 A lot of the noise in stepping comes from code motion and instruction 
scheduling. Consider a long expression on a single line. It may involve 
multiple operations that optimisations move, re-order, and interleave with 
other instructions that have different line numbers.
@@ -36,12 +36,12 @@ From the perspective of a source-level debugger user:
 
 ## Solution implementation
 
-1. `DILocation` has 2 new fields, `atomGroup` and `atomRank`.
-2. Clang creates `DILocations` using the new fields to communicate which 
instructions are "interesting".
+1. `DILocation` has 2 new fields, `atomGroup` and `atomRank`. `DISubprogram` 
has a new field `keyInstructions`.
+2. Clang creates `DILocations` using the new fields to communicate which 
instructions are "interesting", and sets `keyInstructions` true in 
`DISubprogram`s to tell LLVM to interpret the new metadata in those functions.
 3. There’s some bookkeeping required by optimisations that duplicate control 
flow.
 4. During DWARF emission, the new metadata is collected (linear scan over 
instructions) to decide is_stmt placements.
 
-1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank`. Both are unsigned integers. Instructions in the same function with 
the same `(atomGroup, inlinedAt)` pair are part of the same source atom. 
`atomRank` determines is_stmt preference within that group, where a lower 
number is higher precedence. Higher rank instructions act as "backup" is_stmt 
locations, providing good fallback locations if/when the primary candidate gets 
optimized away. The default values of 0 indicate the instruction isn’t 
interesting - it's not an is_stmt candidate.
+1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank`. Both are unsigned integers. Instructions in the same function with 
the same `(atomGroup, inlinedAt)` pair are part of the same source atom. 
`atomRank` determines is_stmt preference within that group, where a lower 
number is higher precedence. Higher rank instructions act as "backup" is_stmt 
locations, providing good fallback locations if/when the primary candidate gets 
optimized away. The default values of 0 indicate the instruction isn’t 
interesting - it's not an is_stmt candidate. If `keyInstructions` in 
`DISubprogram` is false (default) then the new `DILocation` metadata is ignored 
for the function (including inlined instances) when emitting DWARF.
 
 2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked is_stmt.
 
@@ -57,7 +57,7 @@ We’ve used contiguous line numbers rather than atom 
membership as the test the
 
 ## Adding the feature to a front end
 
-Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values. Currently they also need to tell 
LLVM to interpret the metadata by passing the `-dwarf-use-key-instructions` 
flag.
+Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values. They also need to set 
`keyInstructions` true in `DISubprogram`s to tell LLVM to interpret the new 
metadata in those functions.
 
 The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at the kind of instruction). This doesn't exist anywhere 
upstream, but could be shared if there's interest (e.g., so another front end 
can try it out before committing to a full implementation); feel free to reach 
out on Discourse (@OCHyams).
 

>From 7ce6640ce27d0aa154782385f767300791809d3b Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Fri, 11 Jul 2025 17:35:24 +0100
Subject: [PATCH 5/9] tidy up, address review comments

---
 clang/docs/KeyInstructionsClang.md    | 80 +--------------------------
 llvm/docs/KeyInstructionsDebugInfo.md | 22 ++++----
 llvm/docs/UserGuides.rst              |  9 +++
 3 files changed, 23 insertions(+), 88 deletions(-)

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
index 441fab41a60ee..9e7ef04328238 100644
--- a/clang/docs/KeyInstructionsClang.md
+++ b/clang/docs/KeyInstructionsClang.md
@@ -14,7 +14,7 @@ Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation`
 
 `CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
 
-There are a couple of other helpers, including 
`addRetToOverrideOrNewSourceAtom` used for `rets` which is covered in the 
examples below.
+There are a couple of other helpers, including `addInstToSpecificSourceAtom` 
used for `rets` which is covered in the examples below.
 
 ## Examples
 
@@ -41,80 +41,6 @@ entry:
 
 The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`.
 
-This is all handled during CodeGen. The atom group is set here:
-```
->  clang::CodeGen::ApplyAtomGroup::ApplyAtomGroup(clang::CodeGen::CGDebugInfo 
* DI) Line 187
-   clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 1961
-   clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) 
Line 1361
-   clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 
219
-   clang::CodeGen::CodeGenFunction::EmitDecl(const clang::Decl & D) Line 164
-   clang::CodeGen::CodeGenFunction::EmitDeclStmt(const clang::DeclStmt & S) 
Line 1611
-   clang::CodeGen::CodeGenFunction::EmitSimpleStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 466
-   clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 72
-   clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(const 
clang::CompoundStmt & S, bool GetLast, clang::CodeGen::AggValueSlot AggSlot) 
Line 556+
-   clang::CodeGen::CodeGenFunction::EmitFunctionBody(const clang::Stmt * Body) 
Line 1307
-```
-
-And the DILocations are updated here:
-```
->  
clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * 
KeyInstruction, llvm::Value * Backup, unsigned char KeyInstRank) Line 2551
-   clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * Value, 
clang::CodeGen::Address Addr, bool Volatile, clang::QualType Ty, 
clang::CodeGen::LValueBaseInfo BaseInfo, clang::CodeGen::TBAAAccessInfo 
TBAAInfo, bool isInit, bool isNontemporal) Line 2133
-   clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * value, 
clang::CodeGen::LValue lvalue, bool isInit) Line 2152
-   
clang::CodeGen::CodeGenFunction::EmitStoreThroughLValue(clang::CodeGen::RValue 
Src, clang::CodeGen::LValue Dst, bool isInit) Line 2478
-   clang::CodeGen::CodeGenFunction::EmitScalarInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) 
Line 805
-   clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) 
Line 2088
-   clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission & emission) Line 2050
-   clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl & D) 
Line 1361
-   clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl & D) Line 
219
-   clang::CodeGen::CodeGenFunction::EmitDecl(const clang::Decl & D) Line 164
-   clang::CodeGen::CodeGenFunction::EmitDeclStmt(const clang::DeclStmt & S) 
Line 1611
-   clang::CodeGen::CodeGenFunction::EmitSimpleStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 466
-   clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt * S, 
llvm::ArrayRef<clang::Attr const *> Attrs) Line 72
-   clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(const 
clang::CompoundStmt & S, bool GetLast, clang::CodeGen::AggValueSlot AggSlot) 
Line 556
-   clang::CodeGen::CodeGenFunction::EmitFunctionBody(const clang::Stmt * Body) 
Line 1307
+The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour. This is achieved by calling  
`addInstToNewSourceAtom` from within `EmitFunctionEpilog`.
 
-```
-
-The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour.
-
-```
->  
clang::CodeGen::CodeGenFunction::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst
 * Ret, llvm::Value * Backup, unsigned char KeyInstRank) Line 2567
-   clang::CodeGen::CodeGenFunction::EmitFunctionEpilog(const 
clang::CodeGen::CGFunctionInfo & FI, bool EmitRetDbgLoc, clang::SourceLocation 
EndLoc) Line 3839
-   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 433
-```
-
-`addRetToOverrideOrNewSourceAtom` is a special function used for handling 
`ret`s. In this case it simply replaces the DILocation with the `atomGroup` and 
`atomRank` set, adding it to its own atom.
-
-To demonstrate why `ret`s need special handling, we need to look at a more 
"complex" example, below.
-
-```
-int fun(int a) {
-  return a;
-}
-```
-
-Rather than emit a `ret` for each `return` Clang, in all but the simplest 
cases (as in the first example) emits a branch to a dedicated block with a 
single `ret`. That branch is the key instruction for the return statement. If 
there's only one branch to that block, because there's only one `return` (as in 
this example), Clang folds the block into its only predecessor. We need to do 
some accounting to transfer the `atomGroup` number to the `ret` when that 
happens:
-
-When we hit the special-casing code that knows we've only got one block (the 
IR looks like this):
-```
-entry:
-  %a.addr = alloca i32, align 4
-  %allocapt = bitcast i32 undef to i32
-  store i32 %a, ptr %a.addr, align 4
-  br label %return, !dbg !6
-```
-
-...remember the branch-to-return-block's `atomGroup`:
-
-```
->  clang::CodeGen::CGDebugInfo::setRetInstSourceAtomOverride(unsigned __int64 
Group) Line 168
-   clang::CodeGen::CodeGenFunction::EmitReturnBlock() Line 332
-   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 415
-```
-
-And apply it to the `ret` when it's added:
-```
->  
clang::CodeGen::CodeGenFunction::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst
 * Ret, llvm::Value * Backup, unsigned char KeyInstRank) Line 2567
-   clang::CodeGen::CodeGenFunction::EmitFunctionEpilog(const 
clang::CodeGen::CGFunctionInfo & FI, bool EmitRetDbgLoc, clang::SourceLocation 
EndLoc) Line 3839
-   clang::CodeGen::CodeGenFunction::FinishFunction(clang::SourceLocation 
EndLoc) Line 433
-```
+Explicit return statements are handled uniquely. Rather than emit a `ret` for 
each `return` Clang, in all but the simplest cases (as in the first example) 
emits a branch to a dedicated block with a single `ret`. That branch is the key 
instruction for the return statement. If there's only one branch to that block, 
because there's only one `return` (as in this example), Clang folds the block 
into its only predecessor. Handily `EmitReturnBlock` returns the `DebugLoc` 
associated with the single branch in that case, which is fed into 
`addInstToSpecificSourceAtom` to ensure the `ret` gets the right group.
diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
index c6abcac6b7572..7d33f02174c6c 100644
--- a/llvm/docs/KeyInstructionsDebugInfo.md
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -1,16 +1,16 @@
 # Key Instructions debug info in LLVM
 
-Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the Clang docs.
+Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the [Clang docs](../../clang/docs/KeyInstructionsClang.md)
 
 ## Status
 
-In development - some details may change with little notice.
+In development, but mostly complete. The feature is currently disabled for 
coroutines.
 
 Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. See the Clang docs for implementation info.
 
 The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than fundemental limitation, covered in more detail later).
 
-There is currently no plan to support CodeView.
+This is a DWARF-based feature. There is currently no plan to support CodeView.
 
 Set LLVM flag `-dwarf-use-key-instructions` to `false` to ignore Key 
Instructions metadata when emitting DWARF.
 
@@ -18,9 +18,9 @@ Set LLVM flag `-dwarf-use-key-instructions` to `false` to 
ignore Key Instruction
 
 A lot of the noise in stepping comes from code motion and instruction 
scheduling. Consider a long expression on a single line. It may involve 
multiple operations that optimisations move, re-order, and interleave with 
other instructions that have different line numbers.
 
-DWARF provides a helpful tool the compiler can employ to mitigate this 
jumpiness, the is_stmt flag, which indicates that an instruction is a 
recommended breakpoint location. However, LLVM's current approach to deciding 
is_stmt placement essentially reduces down to "is the associated line number 
different to the previous instruction's?".
+DWARF provides a helpful tool the compiler can employ to mitigate this 
jumpiness, the `is_stmt` flag, which indicates that an instruction is a 
recommended breakpoint location. However, LLVM's current approach to deciding 
`is_stmt` placement essentially reduces down to "is the associated line number 
different to the previous instruction's?".
 
-(Note: It's up to the debugger if it wants to interpret is_stmt or not, and at 
time of writing LLDB doesn't; possibly because LLVM's is_stmts convey no 
information that can't already be deduced from the rest of the line table.)
+(Note: It's up to the debugger if it wants to interpret `is_stmt` or not, and 
at time of writing LLDB doesn't; possibly because until now LLVM's is_stmts 
convey no information that can't already be deduced from the rest of the line 
table.)
 
 ## Solution overview
 
@@ -39,27 +39,27 @@ From the perspective of a source-level debugger user:
 1. `DILocation` has 2 new fields, `atomGroup` and `atomRank`. `DISubprogram` 
has a new field `keyInstructions`.
 2. Clang creates `DILocations` using the new fields to communicate which 
instructions are "interesting", and sets `keyInstructions` true in 
`DISubprogram`s to tell LLVM to interpret the new metadata in those functions.
 3. There’s some bookkeeping required by optimisations that duplicate control 
flow.
-4. During DWARF emission, the new metadata is collected (linear scan over 
instructions) to decide is_stmt placements.
+4. During DWARF emission, the new metadata is collected (linear scan over 
instructions) to decide `is_stmt` placements.
 
-1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank`. Both are unsigned integers. Instructions in the same function with 
the same `(atomGroup, inlinedAt)` pair are part of the same source atom. 
`atomRank` determines is_stmt preference within that group, where a lower 
number is higher precedence. Higher rank instructions act as "backup" is_stmt 
locations, providing good fallback locations if/when the primary candidate gets 
optimized away. The default values of 0 indicate the instruction isn’t 
interesting - it's not an is_stmt candidate. If `keyInstructions` in 
`DISubprogram` is false (default) then the new `DILocation` metadata is ignored 
for the function (including inlined instances) when emitting DWARF.
+1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank` and are both are unsigned integers. `atomGroup` is 61 bits and 
`atomRank` 3 bits. Instructions in the same function with the same `(atomGroup, 
inlinedAt)` pair are part of the same source atom. `atomRank` determines 
`is_stmt` preference within that group, where a lower number is higher 
precedence. Higher rank instructions act as "backup" `is_stmt` locations, 
providing good fallback locations if/when the primary candidate gets optimized 
away. The default values of 0 indicate the instruction isn’t interesting - it's 
not an `is_stmt` candidate. If `keyInstructions` in `DISubprogram` is false 
(default) then the new `DILocation` metadata is ignored for the function 
(including inlined instances) when emitting DWARF.
 
 2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked is_stmt.
 
-3. *Throughout optimisation*, the DILocation is propagated normally. Cloned 
instructions get the original’s DILocation, the new fields get merged in 
getMergedLocation, etc. However, pass writers need to intercede in cases where 
a code path is duplicated, e.g. unrolling, jump-threading. In these cases we 
want to emit key instructions in both the original and duplicated code, so the 
duplicated must be assigned new `atomGroup` numbers, in a similar way that 
instruction operands must get remapped. There’s facilities to help this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional effort is actually needed.
+3. *Throughout optimisation*, the `DILocation` is propagated normally. Cloned 
instructions get the original’s `DILocation`, the new fields get merged in 
`getMergedLocation`, etc. However, pass writers need to intercede in cases 
where a code path is duplicated, e.g. unrolling, jump-threading. In these cases 
we want to emit key instructions in both the original and duplicated code, so 
the duplicated must be assigned new `atomGroup` numbers, in a similar way that 
instruction operands must get remapped. There are facilities to help this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional work is actually needed.
 
 `mapAtomInstance` ensures `LLVMContextImpl::NextAtomGroup` is kept up to date, 
which is the global “next available atom number”.
 
 The `DILocations` carry over from IR to MIR as normal, without any changes.
 
-4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get is_stmt applied to their 
source locations. That `is_stmt` then "floats" to the top of contiguous 
sequence of instructions with the same line number in the same block. That has 
two benefits when optimisations are enabled. First, this floats `is_stmt` to 
the top of epilogue instructions (rather than applying it to the `ret` 
instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.
+4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get `is_stmt` applied to 
their source locations. That `is_stmt` then "floats" to the top of contiguous 
sequence of instructions with the same line number in the same basic block. 
That has two benefits when optimisations are enabled. First, this floats 
`is_stmt` to the top of epilogue instructions (rather than applying it to the 
`ret` instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.
 
 We’ve used contiguous line numbers rather than atom membership as the test 
there because of our choice to represent source atoms with a single integer ID. 
We can’t have instructions belonging to multiple atom groups or represent any 
kind of grouping hierarchy. That means we can’t rely on all the call setup 
instructions being in the same group currently (e.g., if one of the argument 
expressions contains key functionality such as a store, it will be in its own 
group).
 
 ## Adding the feature to a front end
 
-Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values.  It also needs to set 
`keyInstructions` true in `DISubprogram`s to tell LLVM to interpret the new 
metadata in those functions.
+Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values. They also need to set 
`keyInstructions` true in `DISubprogram`s to tell LLVM to interpret the new 
metadata in those functions.
 
-The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at kind of instructions). This doesn't exist anywhere 
upstream, but could be shared if there's interest (e.g., so another front end 
can try it out before committing to a full implementation ), feel fre to reach 
out on Discourse (@OCHyams).
+The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at kind of instructions). This doesn't exist anywhere 
upstream, but could be shared if there's interest (e.g., so another front end 
can try it out before committing to a full implementation), feel fre to reach 
out on Discourse (@OCHyams, @jmorse).
 
 ## Limitations
 
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 6eee564713d6d..9d69ca08d8f9a 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -63,6 +63,7 @@ intermediate LLVM representation.
    ResponseGuide
    Remarks
    RemoveDIsDebugInfo
+   KeyInstructions
    RISCVUsage
    RISCV/RISCVVectorExtension
    SourceLevelDebugging
@@ -102,6 +103,10 @@ Clang
 :doc:`CFIVerify`
   A description of the verification tool for Control Flow Integrity.
 
+:doc: `KeyInstructionsClang`
+   This document explains how the debug info feature Key Instructions is
+   implemented in Clang.
+
 LLVM Builds and Distributions
 -----------------------------
 
@@ -187,6 +192,10 @@ Optimizations
    This is a migration guide describing how to move from debug info using
    intrinsics such as dbg.value to using the non-instruction DbgRecord object.
 
+:doc: `KeyInstructionsDebugInfo`
+   This document explains how the debug info feature Key Instructions is
+   implemented in LLVM.
+
 :doc:`InstrProfileFormat`
    This document explains two binary formats of instrumentation-based profiles.
 

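Reading aid for step 4 (*DWARF emission*) in the KeyInstructionsDebugInfo.md hunk above: the selection rule can be sketched as a small standalone C++ function. This is only an illustration under stated assumptions and is not LLVM's implementation. `Inst` and `pickKeyInstructions` are hypothetical names, `InlinedAt` is reduced to a plain integer, instructions are assumed to be listed in program order, and the later step where `is_stmt` "floats" to the top of a run of same-line instructions is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an instruction and its DILocation.
struct Inst {
  unsigned Block;     // Index of the containing basic block.
  uint64_t AtomGroup; // 0 means "not an is_stmt candidate".
  uint8_t AtomRank;   // 0 means "not a candidate"; lower non-zero wins.
  unsigned InlinedAt; // Stand-in for the inlinedAt metadata (0 = not inlined).
};

// Returns the indices of the instructions that would be given is_stmt:
// for each (AtomGroup, InlinedAt) pair, the instructions sharing the lowest
// rank, keeping only the last such instruction in each basic block.
std::set<std::size_t> pickKeyInstructions(const std::vector<Inst> &Insts) {
  using AtomKey = std::pair<uint64_t, unsigned>;
  struct Candidates {
    uint8_t Rank;                                // Lowest rank seen so far.
    std::map<unsigned, std::size_t> LastInBlock; // Block -> last inst index.
  };

  std::map<AtomKey, Candidates> Best;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    const Inst &In = Insts[I];
    if (In.AtomGroup == 0 || In.AtomRank == 0)
      continue; // Default values: the instruction is not interesting.

    AtomKey Key{In.AtomGroup, In.InlinedAt};
    auto It = Best.find(Key);
    if (It == Best.end()) {
      Best.emplace(Key, Candidates{In.AtomRank, {{In.Block, I}}});
    } else if (In.AtomRank < It->second.Rank) {
      // A better (lower) rank replaces every candidate found so far.
      It->second = Candidates{In.AtomRank, {{In.Block, I}}};
    } else if (In.AtomRank == It->second.Rank) {
      // Same rank: a later instruction in the same block supersedes the
      // earlier one; other blocks keep their own candidate.
      It->second.LastInBlock[In.Block] = I;
    }
  }

  std::set<std::size_t> Result;
  for (const auto &Entry : Best)
    for (const auto &BlockAndIndex : Entry.second.LastInBlock)
      Result.insert(BlockAndIndex.second);
  return Result;
}
```

With this sketch, a rank 2 backup is only selected when its atom group has no surviving rank 1 instruction, which mirrors the "backup" behaviour the docs describe for when the primary candidate gets optimized away.
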
>From 54be192a31c91ca85ccd05ab058fa9aac8156261 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Tue, 15 Jul 2025 11:06:13 +0100
Subject: [PATCH 6/9] style and spelling nits etc

---
 clang/docs/KeyInstructionsClang.md    |  8 ++++----
 llvm/docs/KeyInstructionsDebugInfo.md | 20 ++++++++++----------
 llvm/docs/UserGuides.rst              |  7 ++++---
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
index 9e7ef04328238..23600b23eed32 100644
--- a/clang/docs/KeyInstructionsClang.md
+++ b/clang/docs/KeyInstructionsClang.md
@@ -1,6 +1,6 @@
 # Key Instructions in Clang
 
-Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping. This document explains how Clang applies the necessary 
metadata.
+Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instructions that make 
up source language statements. This document explains how Clang applies the 
necessary metadata.
 
 ## Implementation
 
@@ -10,11 +10,11 @@ Clang needs to annotate key instructions with the new 
metadata. Variable assignm
 
 Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation` that creates a new source atom group which instructions 
can be added to. It's used during CodeGen to declare that a new source atom has 
started, e.g. in `CodeGenFunction::EmitBinaryOperatorLValue`.
 
-`CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). FIXME?: Currently it's called at the CGBuilderTy 
callsites; it could instead make sense to always call the function inside the 
CGBuilderTy calls, with some calls opting out.
+`CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). Most stores created through `CGBuilderTy` are 
annotated, but some that don't need to be key are not. It's important to 
remember that if there's no active atom group, i.e. no active `ApplyAtomGroup` 
instance, then `addInstToCurrentSourceAtom` does not annotate the instructions.
 
 `CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
 
-There are a couple of other helpers, including `addInstToSpecificSourceAtom` 
used for `rets` which is covered in the examples below.
+`CodeGenFunction::addInstToSpecificSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup, uint64_t Atom)` adds the instruction (and 
backup instruction if non-null) to the specific group `Atom`. This is currently 
only used for `rets` which is explored in the examples below. Special handling 
is needed due to the fact that an existing atom group needs to be reused in 
some circumstances, so neither of the other helper functions are appropriate.
 
 ## Examples
 
@@ -39,7 +39,7 @@ entry:
 }
 ```
 
-The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`.
+The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`. This is achieved by starting an 
atom group with `ApplyAtomGroup` for the source atom (in this case a variable 
init) in `EmitAutoVarInit`. The instructions (both key and backup) are then 
annotated by a call to `addInstToCurrentSourceAtom` made from 
`EmitStoreOfScalar`.
 
 The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour. This is achieved by calling  
`addInstToNewSourceAtom` from within `EmitFunctionEpilog`.
 
diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
index 7d33f02174c6c..018d63e788dd6 100644
--- a/llvm/docs/KeyInstructionsDebugInfo.md
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -1,14 +1,14 @@
 # Key Instructions debug info in LLVM
 
-Key Instructions reduces the jumpiness of optimized code debug stepping. This 
document explains the feature and how it is implemented in LLVM. For Clang 
support please see the [Clang docs](../../clang/docs/KeyInstructionsClang.md)
+Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instructions that make 
up source language statements. This document explains the feature and how it is 
implemented in LLVM. For Clang support please see the [Clang 
docs](../../clang/docs/KeyInstructionsClang.md)
 
 ## Status
 
-In development, but mostly complete. The feature is currently disabled for 
coroutines.
+Feature complete except for coroutines, which fall back to 
not-key-instructions handling for now but will get support soon (there is no 
fundamental reason why they cannot be supported, we've just not got to it at 
time of writing).
 
 Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. See the Clang docs for implementation info.
 
-The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. (This is a quirk of the current implementation, 
rather than fundemental limitation, covered in more detail later).
+The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. In some cases, debuggers may place a breakpoint 
after parts of an expression have been evaluated, which limits the ability to 
have variable edits affect expressions. (This is a quirk of the current 
implementation, rather than a fundamental limitation, covered in more detail 
[later](#disabled-at-o0).)
 
 This is a DWARF-based feature. There is currently no plan to support CodeView.
 
@@ -20,7 +20,7 @@ A lot of the noise in stepping comes from code motion and 
instruction scheduling
 
 DWARF provides a helpful tool the compiler can employ to mitigate this 
jumpiness, the `is_stmt` flag, which indicates that an instruction is a 
recommended breakpoint location. However, LLVM's current approach to deciding 
`is_stmt` placement essentially reduces down to "is the associated line number 
different to the previous instruction's?".
 
-(Note: It's up to the debugger if it wants to interpret `is_stmt` or not, and 
at time of writing LLDB doesn't; possibly because until now LLVM's is_stmts 
convey no information that can't already be deduced from the rest of the line 
table.)
+(Note: It's up to the debugger if it wants to interpret `is_stmt` or not, and 
at time of writing LLDB doesn't; possibly because until now LLVM's `is_stmt`s 
convey no information that can't already be deduced from the rest of the line 
table.)
 
 ## Solution overview
 
@@ -43,7 +43,7 @@ From the perspective of a source-level debugger user:
 
 1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank` and are both are unsigned integers. `atomGroup` is 61 bits and 
`atomRank` 3 bits. Instructions in the same function with the same `(atomGroup, 
inlinedAt)` pair are part of the same source atom. `atomRank` determines 
`is_stmt` preference within that group, where a lower number is higher 
precedence. Higher rank instructions act as "backup" `is_stmt` locations, 
providing good fallback locations if/when the primary candidate gets optimized 
away. The default values of 0 indicate the instruction isn’t interesting - it's 
not an `is_stmt` candidate. If `keyInstructions` in `DISubprogram` is false 
(default) then the new `DILocation` metadata is ignored for the function 
(including inlined instances) when emitting DWARF.
 
-2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked is_stmt.
+2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked `is_stmt`.
 
 3. *Throughout optimisation*, the `DILocation` is propagated normally. Cloned 
instructions get the original’s `DILocation`, the new fields get merged in 
`getMergedLocation`, etc. However, pass writers need to intercede in cases 
where a code path is duplicated, e.g. unrolling, jump-threading. In these cases 
we want to emit key instructions in both the original and duplicated code, so 
the duplicated must be assigned new `atomGroup` numbers, in a similar way that 
instruction operands must get remapped. There are facilities to help this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional work is actually needed.
 
@@ -57,15 +57,15 @@ We’ve used contiguous line numbers rather than atom 
membership as the test the
 
 ## Adding the feature to a front end
 
-Front ends that want to use the feature need to do some heavy lifting; they 
need to annotate Key Instructions and their backups with `DILocations` with the 
necessary `atomGroup` and `atomRank` values. They also need to set 
`keyInstructions` true in `DISubprogram`s to tell LLVM to interpret the new 
metadata in those functions.
+Front ends that want to use the feature need to group and rank instructions 
according to their source atoms and interestingness by attaching `DILocations` 
with the necessary `atomGroup` and `atomRank` values. They also need to set the 
`keyInstructions` field to `true` in `DISubprogram`s to tell LLVM to interpret 
the new metadata in those functions.
 
-The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at kind of instructions). This doesn't exist anywhere 
upstream, but could be shared if there's interest (e.g., so another front end 
can try it out before committing to a full implementation), feel fre to reach 
out on Discourse (@OCHyams, @jmorse).
+The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at kind of instructions, e.g., annotating all stores, 
conditional branches, etc). This doesn't exist anywhere upstream, but could be 
shared if there's interest (e.g., so another front end can try it out before 
committing to a full implementation), feel free to reach out on Discourse 
(@OCHyams, @jmorse).
 
 ## Limitations
 
 ### Lack of multiple atom membership
 
-Using a number to represent atom membership is limiting; currently an 
instruction cannot belong to multiple atoms. Does this come up in practice? 
Yes. Both in the front end and during optimisations. Consider this C code:
+Using a number to represent atom membership is limiting; currently an 
instruction that belongs to multiple source atoms cannot belong to multiple 
atom groups. This does occur in practice, both in the front end and during 
optimisations. Consider this C code:
 ```c
 a = b = c;
 ```
@@ -86,7 +86,7 @@ Consider the following code without optimisations:
 int c =
     a + b;
 ```
-In the current implementation an `is_stmt` won't be generated for the `a + b` 
instruction, meaning debuggers will likely step over the `add` and stop at the 
`store` of the result into `c` (which does get `is_stmt`). A user might have 
hoped to edit `a` or `b` on the previous line in order to alter the result 
stored to `c`, which they now won't have the chance to do (they'd need to edit 
the variables on a previous line instead). If the expression was all on one 
line then they would be able to edit the values before the `add`. For these 
reasons we're choosing to recommend that the feature should not be enabled at 
O0.
+In the current implementation an `is_stmt` won't be generated for the `a + b` 
instruction, meaning debuggers will likely step over the `add` and stop at the 
`store` of the result into `c` (which does get `is_stmt`). A user might have 
wished to edit `a` or `b` on the previous line in order to alter the result 
stored to `c`, which they now won't have the chance to do (they'd need to edit 
the variables on a previous line instead). If the expression was all on one 
line then they would be able to edit the values before the `add`. For these 
reasons we're choosing to recommend that the feature should not be enabled at 
O0.
 
 It should be possible to fix this case if we make a few changes: add all the 
instructions in the statement (i.e., including the loads) to the atom, and 
tweak the DwarfEmission code to understand this situation (same atom, different 
line). So there is room to persue this in the future. Though that gets tricky 
in some cases due to the [other limitation mentioned 
above](#lack-of-multiple-atom-membership), e.g.:
 ```c
@@ -110,5 +110,5 @@ O0 isn't a key use-case so solving this is not a priority 
for the initial implem
 ---
 
 **References**
-* [1] Key Instructions: Solving the Code Location Problem for Optimized Code 
(C. Tice, . S. L. Graham, 2000)
+* [1] Key Instructions: Solving the Code Location Problem for Optimized Code 
(C. Tice, S. L. Graham, 2000)
 * [2] Debugging Optimized Code: Concepts and Implementation on DIGITAL Alpha 
Systems (R. F. Brender et al)
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 9d69ca08d8f9a..68b5aa2ccd176 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -46,6 +46,8 @@ intermediate LLVM representation.
    InstCombineContributorGuide
    InstrProfileFormat
    InstrRefDebugInfo
+   KeyInstructionsClang
+   KeyInstructionsDebugInfo
    LinkTimeOptimization
    LoopTerminology
    MarkdownQuickstartTemplate
@@ -63,7 +65,6 @@ intermediate LLVM representation.
    ResponseGuide
    Remarks
    RemoveDIsDebugInfo
-   KeyInstructions
    RISCVUsage
    RISCV/RISCVVectorExtension
    SourceLevelDebugging
@@ -103,7 +104,7 @@ Clang
 :doc:`CFIVerify`
   A description of the verification tool for Control Flow Integrity.
 
-:doc: `KeyInstructionsClang`
+:doc:`KeyInstructionsClang`
    This document explains how the debug info feature Key Instructions is
    implemented in Clang.
 
@@ -192,7 +193,7 @@ Optimizations
    This is a migration guide describing how to move from debug info using
    intrinsics such as dbg.value to using the non-instruction DbgRecord object.
 
-:doc: `KeyInstructionsDebugInfo`
+:doc:`KeyInstructionsDebugInfo`
    This document explains how the debug info feature Key Instructions is
    implemented in LLVM.
 

>From a1688df50653f76f9928bb83a479b86344b0e7a8 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Tue, 15 Jul 2025 11:43:43 +0100
Subject: [PATCH 7/9] fix layout, remove broken link to clang docs

---
 llvm/docs/KeyInstructionsDebugInfo.md | 12 +++++-------
 llvm/docs/UserGuides.rst              |  5 -----
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
index 018d63e788dd6..19e1ff3aacab7 100644
--- a/llvm/docs/KeyInstructionsDebugInfo.md
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -1,6 +1,6 @@
 # Key Instructions debug info in LLVM
 
-Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instrucions that make 
up source language statements. This document explains the feature and how it is 
implemented in LLVM. For Clang support please see the [Clang 
docs](../../clang/docs/KeyInstructionsClang.md)
+Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instrucions that make 
up source language statements. This document explains the feature and how it is 
implemented in LLVM. For Clang support please see the Clang docs.
 
 ## Status
 
@@ -41,18 +41,16 @@ From the perspective of a source-level debugger user:
 3. There’s some bookkeeping required by optimisations that duplicate control 
flow.
 4. During DWARF emission, the new metadata is collected (linear scan over 
instructions) to decide `is_stmt` placements.
 
+Details:
+
 1. *The metadata* - The two new `DILocation` fields are `atomGroup` and 
`atomRank` and are both are unsigned integers. `atomGroup` is 61 bits and 
`atomRank` 3 bits. Instructions in the same function with the same `(atomGroup, 
inlinedAt)` pair are part of the same source atom. `atomRank` determines 
`is_stmt` preference within that group, where a lower number is higher 
precedence. Higher rank instructions act as "backup" `is_stmt` locations, 
providing good fallback locations if/when the primary candidate gets optimized 
away. The default values of 0 indicate the instruction isn’t interesting - it's 
not an `is_stmt` candidate. If `keyInstructions` in `DISubprogram` is false 
(default) then the new `DILocation` metadata is ignored for the function 
(including inlined instances) when emitting DWARF.
 
 2. *Clang annotates key instructions* with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked `is_stmt`.
 
-3. *Throughout optimisation*, the `DILocation` is propagated normally. Cloned 
instructions get the original’s `DILocation`, the new fields get merged in 
`getMergedLocation`, etc. However, pass writers need to intercede in cases 
where a code path is duplicated, e.g. unrolling, jump-threading. In these cases 
we want to emit key instructions in both the original and duplicated code, so 
the duplicated must be assigned new `atomGroup` numbers, in a similar way that 
instruction operands must get remapped. There are facilities to help this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional work is actually needed.
-
-`mapAtomInstance` ensures `LLVMContextImpl::NextAtomGroup` is kept up to date, 
which is the global “next available atom number”.
-
+3. *Throughout optimisation*, the `DILocation` is propagated normally. Cloned 
instructions get the original’s `DILocation`, the new fields get merged in 
`getMergedLocation`, etc. However, pass writers need to intercede in cases 
where a code path is duplicated, e.g. unrolling, jump-threading. In these cases 
we want to emit key instructions in both the original and duplicated code, so 
the duplicated must be assigned new `atomGroup` numbers, in a similar way that 
instruction operands must get remapped. There are facilities to help this: 
`mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to 
`VMap` which can later be used for remapping using 
`llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. 
`mapAtomInstance` is called from `llvm::CloneBasicBlock` and 
`llvm::RemapSourceAtom` is called from `llvm::RemapInstruction` so in many 
cases no additional work is actually needed. `mapAtomInstance` ensures 
`LLVMContextImpl::NextAtomGroup` is kept up to date, which is the global “next 
available atom number”.
 The `DILocations` carry over from IR to MIR as normal, without any changes.
 
-4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get `is_stmt` applied to 
their source locations. That `is_stmt` then "floats" to the top of contiguous 
sequence of instructions with the same line number in the same basic block. 
That has two benefits when optimisations are enabled. First, this floats 
`is_stmt` to the top of epilogue instructions (rather than applying it to the 
`ret` instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.
-
+4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get `is_stmt` applied to 
their source locations. That `is_stmt` then "floats" to the top of contiguous 
sequence of instructions with the same line number in the same basic block. 
That has two benefits when optimisations are enabled. First, this floats 
`is_stmt` to the top of epilogue instructions (rather than applying it to the 
`ret` instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.\
 We’ve used contiguous line numbers rather than atom membership as the test 
there because of our choice to represent source atoms with a single integer ID. 
We can’t have instructions belonging to multiple atom groups or represent any 
kind of grouping hierarchy. That means we can’t rely on all the call setup 
instructions being in the same group currently (e.g., if one of the argument 
expressions contains key functionality such as a store, it will be in its own 
group).
 
 ## Adding the feature to a front end
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 68b5aa2ccd176..021118ae12769 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -46,7 +46,6 @@ intermediate LLVM representation.
    InstCombineContributorGuide
    InstrProfileFormat
    InstrRefDebugInfo
-   KeyInstructionsClang
    KeyInstructionsDebugInfo
    LinkTimeOptimization
    LoopTerminology
@@ -104,10 +103,6 @@ Clang
 :doc:`CFIVerify`
   A description of the verification tool for Control Flow Integrity.
 
-:doc:`KeyInstructionsClang`
-   This document explains how the debug info feature Key Instructions is
-   implemented in Clang.
-
 LLVM Builds and Distributions
 -----------------------------
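
The DWARF emission step described in item 4 of the patch above can be modelled in isolation. The sketch below is a toy under stated assumptions, not LLVM's implementation: `ToyInst` and `pickIsStmt` are hypothetical names, instructions are assumed to be stored in layout order with each basic block contiguous, and `inlinedAt` is reduced to an integer ID. Calls (which the doc says are always `is_stmt`) are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for one instruction and its debug location.
struct ToyInst {
  unsigned Block;          // Index of the containing basic block.
  unsigned Line;           // Source line number.
  std::uint64_t AtomGroup; // 0 means "not an is_stmt candidate".
  unsigned AtomRank;       // Lower number means higher is_stmt precedence.
  std::uint64_t InlinedAt; // 0 when not inlined; stands in for inlinedAt.
};

// Returns the sorted indices of the instructions that would carry is_stmt.
// Insts must be in layout order, with each block's instructions contiguous.
std::vector<std::size_t> pickIsStmt(const std::vector<ToyInst> &Insts) {
  using AtomKey = std::pair<std::uint64_t, std::uint64_t>; // (group, inlinedAt)

  // 1. Find the lowest rank present for each (atomGroup, inlinedAt) pair.
  std::map<AtomKey, unsigned> BestRank;
  for (const ToyInst &I : Insts) {
    if (I.AtomGroup == 0 || I.AtomRank == 0)
      continue; // Default values: not interesting.
    AtomKey Key(I.AtomGroup, I.InlinedAt);
    auto It = BestRank.find(Key);
    if (It == BestRank.end() || I.AtomRank < It->second)
      BestRank[Key] = I.AtomRank;
  }

  // 2. Per atom and per block, keep only the last lowest-rank instruction.
  std::map<std::pair<AtomKey, unsigned>, std::size_t> LastInBlock;
  for (std::size_t Idx = 0; Idx < Insts.size(); ++Idx) {
    const ToyInst &I = Insts[Idx];
    if (I.AtomGroup == 0 || I.AtomRank == 0)
      continue;
    AtomKey Key(I.AtomGroup, I.InlinedAt);
    auto It = BestRank.find(Key);
    assert(It != BestRank.end() && "every candidate was seen in step 1");
    if (I.AtomRank == It->second)
      LastInBlock[std::make_pair(Key, I.Block)] = Idx; // Later wins.
  }

  // 3. Float each is_stmt to the top of the contiguous run of instructions
  //    with the same line number in the same basic block.
  std::set<std::size_t> Result;
  for (const auto &Entry : LastInBlock) {
    std::size_t Top = Entry.second;
    const ToyInst &Chosen = Insts[Top];
    while (Top > 0 && Insts[Top - 1].Block == Chosen.Block &&
           Insts[Top - 1].Line == Chosen.Line)
      --Top;
    Result.insert(Top);
  }
  return std::vector<std::size_t>(Result.begin(), Result.end());
}
```

Keying on line number rather than atom membership in step 3 mirrors the reasoning given in the doc: a single integer ID cannot express that an instruction belongs to several atoms, so the contiguous-line test is used instead.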
 

>From 3690be700fb276cfcc14a813f49c8b0ce4060f47 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Tue, 15 Jul 2025 11:59:06 +0100
Subject: [PATCH 8/9] merge clang into llvm doc

---
 clang/docs/KeyInstructionsClang.md    | 46 ------------------
 llvm/docs/KeyInstructionsDebugInfo.md | 68 +++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 56 deletions(-)
 delete mode 100644 clang/docs/KeyInstructionsClang.md

diff --git a/clang/docs/KeyInstructionsClang.md 
b/clang/docs/KeyInstructionsClang.md
deleted file mode 100644
index 23600b23eed32..0000000000000
--- a/clang/docs/KeyInstructionsClang.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# Key Instructions in Clang
-
-Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instrucions that make 
up source language statements. This document explains how Clang applies the 
necessary metadata.
-
-## Implementation
-
-See the [LLVM docs](../../llvm/docs/KeyInstructionsDebugInfo.md) for general 
info about the feature (and LLVM implementation details).
-
-Clang needs to annotate key instructions with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked `is_stmt`. 
This is achieved with a few simple constructs:
-
-Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation` that creates a new source atom group which instructions 
can be added to. It's used during CodeGen to declare that a new source atom has 
started, e.g. in `CodeGenFunction::EmitBinaryOperatorLValue`.
-
-`CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). Most stores created through `CGBuilderTy` are 
annotated, but some that don't need to be key are not. It's important to 
remember that if there's no active atom group, i.e. no active `ApplyAtomGroup` 
instance, then `addInstToCurrentSourceAtom` does not annotate the instructions.
-
-`CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
-
-`CodeGenFunction::addInstToSpecificSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup, uint64_t Atom)` adds the instruction (and 
backup instruction if non-null) to the specific group `Atom`. This is currently 
only used for `rets` which is explored in the examples below. Special handling 
is needed due to the fact that an existing atom group needs to be reused in 
some circumstances, so neither of the other helper functions are appropriate.
-
-## Examples
-
-A simple example walk through:
-```
-void fun(int a) {
-  int b = a;
-}
-```
-
-There are two key instructions here, the assignment and the implicit return. 
We want to emit metadata that looks like this:
-
-```
-define hidden void @_Z3funi(i32 noundef %a) #0 !dbg !11 {
-entry:
-  %a.addr = alloca i32, align 4
-  %b = alloca i32, align 4
-  store i32 %a, ptr %a.addr, align 4
-  %0 = load i32, ptr %a.addr, align 4, !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 2)
-  store i32 %0, ptr %b, align 4,       !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 1)
-  ret void,                            !dbg !DILocation(line: 3, scope: !11, 
atomGroup: 2, atomRank: 1)
-}
-```
-
-The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`. This is achieved by starting an 
atom group with `ApplyAtomGroup` for the source atom (in this case a variable 
init) in `EmitAutoVarInit`. The instructions (both key and backup) are then 
annotated by call to `addInstToCurrentSourceAtom` called from 
`EmitStoreOfScalar`.
-
-The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour. This is achieved by calling  
`addInstToNewSourceAtom` from within `EmitFunctionEpilog`.
-
-Explicit return statements are handled uniquely. Rather than emit a `ret` for 
each `return` Clang, in all but the simplest cases (as in the first example) 
emits a branch to a dedicated block with a single `ret`. That branch is the key 
instruction for the return statement. If there's only one branch to that block, 
because there's only one `return` (as in this example), Clang folds the block 
into its only predecessor. Handily `EmitReturnBlock` returns the `DebugLoc` 
associated with the single branch in that case, which is fed into 
`addInstToSpecificSourceAtom` to ensure the `ret` gets the right group.
diff --git a/llvm/docs/KeyInstructionsDebugInfo.md 
b/llvm/docs/KeyInstructionsDebugInfo.md
index 19e1ff3aacab7..ad157e4b90181 100644
--- a/llvm/docs/KeyInstructionsDebugInfo.md
+++ b/llvm/docs/KeyInstructionsDebugInfo.md
@@ -1,12 +1,12 @@
-# Key Instructions debug info in LLVM
+# Key Instructions debug info in LLVM and Clang
 
-Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instrucions that make 
up source language statements. This document explains the feature and how it is 
implemented in LLVM. For Clang support please see the Clang docs.
+Key Instructions is an LLVM feature that reduces the jumpiness of optimized 
code debug stepping by discriminating the significance of instructions that make 
up source language statements. This document explains the feature and how it is 
implemented in [LLVM](#llvm) and [Clang](#clang-and-other-front-ends).
 
 ## Status
 
 Feature complete except for coroutines, which fall back to 
not-key-instructions handling for now but will get support soon (there is no 
fundemental reason why they cannot be supported, we've just not got to it at 
time of writing).
 
-Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`. See the Clang docs for implementation info.
+Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`.
 
 The feature improves optimized code stepping; it's intended for the feature to 
be used with optimisations enabled. Although the feature works at O0 it is not 
recommended because in some cases the effect of editing variables may not 
always be immediately realised. In some cases, debuggers may place a breakpoint 
after parts of an expression have been evaluated, which limits the ability to 
have variable edits affect expressions. (This is a quirk of the current 
implementation, rather than fundemental limitation, covered in more detail 
[later](#disabled-at-o0).)
 
@@ -14,6 +14,8 @@ This is a DWARF-based feature. There is currently no plan to 
support CodeView.
 
 Set LLVM flag `-dwarf-use-key-instructions` to `false` to ignore Key 
Instructions metadata when emitting DWARF.
 
+# LLVM
+
 ## Problem statement
 
 A lot of the noise in stepping comes from code motion and instruction 
scheduling. Consider a long expression on a single line. It may involve 
multiple operations that optimisations move, re-order, and interleave with 
other instructions that have different line numbers.
@@ -53,12 +55,6 @@ The `DILocations` carry over from IR to MIR as normal, 
without any changes.
 4. *DWARF emission* - Iterate over all instructions in a function. For each 
`(atomGroup, inlinedAt)` pair we find the set of instructions sharing the 
lowest rank. Only the last of these instructions in each basic block is 
included in the set. The instructions in this set get `is_stmt` applied to 
their source locations. That `is_stmt` then "floats" to the top of contiguous 
sequence of instructions with the same line number in the same basic block. 
That has two benefits when optimisations are enabled. First, this floats 
`is_stmt` to the top of epilogue instructions (rather than applying it to the 
`ret` instruction itself) which is important to avoid losing variable location 
coverage at return statements. Second, it reduces the difference in optimized 
code stepping behaviour between when Key Instructions is enabled and disabled 
in “uninteresting” cases. I.e., it appears to generally reduce unnecessary 
changes in stepping.\
 We’ve used contiguous line numbers rather than atom membership as the test 
there because of our choice to represent source atoms with a single integer ID. 
We can’t have instructions belonging to multiple atom groups or represent any 
kind of grouping hierarchy. That means we can’t rely on all the call setup 
instructions being in the same group currently (e.g., if one of the argument 
expressions contains key functionality such as a store, it will be in its own 
group).
 
-## Adding the feature to a front end
-
-Front ends that want to use the feature need to group and rank instructions 
according to their source atoms and interingness by attaching `DILocations` 
with the necessary `atomGroup` and `atomRank` values. They also need to set the 
`keyInstructions` field to `true` in `DISubprogram`s to tell LLVM to interpret 
the new metadata in those functions.
-
-The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at kind of instructions, e.g., annotating all stores, 
conditional branches, etc). This doesn't exist anywhere upstream, but could be 
shared if there's interest (e.g., so another front end can try it out before 
committing to a full implementation), feel free to reach out on Discourse 
(@OCHyams, @jmorse).
-
 ## Limitations
 
 ### Lack of multiple atom membership
@@ -80,7 +76,7 @@ Certain optimisations merge source locations, which presents 
another case where
 ### Disabled at O0
 
 Consider the following code without optimisations:
-```
+```c
 int c =
     a + b;
 ```
@@ -105,6 +101,58 @@ Without multiple-atom-membership or some kind of atom 
hierarchy it's not apparen
 
 O0 isn't a key use-case so solving this is not a priority for the initial 
implementation. The trade off, smoother stepping at the cost of not being able 
to edit variables to affect an expression in some cases (and at particular stop 
points), becomes more attractive when optimisations are enabled (we find that 
editing variables in the debugger in optimized code often produces unexpected 
effects, so it's not a big concern that Key Instructions makes it harder 
sometimes).
 
+# Clang and other front ends
+
+Tell Clang [not] to produce Key Instructions metadata with 
`-g[no-]key-instructions`.
+
+## Implementation
+
+Clang needs to annotate key instructions with the new metadata. Variable 
assignments (stores, memory intrinsics), control flow (branches and their 
conditions, some unconditional branches), and exception handling instructions 
are annotated. Calls are ignored as they're unconditionally marked `is_stmt`. 
This is achieved with a few simple constructs:
+
+Class `ApplyAtomGroup` - This is a scoped helper similar to 
`ApplyDebugLocation` that creates a new source atom group which instructions 
can be added to. It's used during CodeGen to declare that a new source atom has 
started, e.g. in `CodeGenFunction::EmitBinaryOperatorLValue`.
+
+`CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup)` adds an instruction (and a backup 
instruction if non-null) to the current "atom group" defined with 
`ApplyAtomGroup`. The Key Instruction gets rank 1, and backup instructions get 
higher ranks (the function looks through casts, applying increasing rank as it 
goes). There are a lot of sites in Clang that need to call this (mostly stores 
and store-like instructions). Most stores created through `CGBuilderTy` are 
annotated, but some that don't need to be key are not. It's important to 
remember that if there's no active atom group, i.e. no active `ApplyAtomGroup` 
instance, then `addInstToCurrentSourceAtom` does not annotate the instructions.
+
+`CodeGenFunction::addInstToNewSourceAtom(llvm::Instruction *KeyInstruction, 
llvm::Value *Backup)` adds an instruction (and a backup instruction if 
non-null) to a new "atom group". Currently mostly used in loop handling code.
+
+`CodeGenFunction::addInstToSpecificSourceAtom(llvm::Instruction 
*KeyInstruction, llvm::Value *Backup, uint64_t Atom)` adds the instruction (and 
backup instruction if non-null) to the specific group `Atom`. This is currently 
only used for `ret`s, which is explored in the examples below. Special handling 
is needed because an existing atom group needs to be reused in some 
circumstances, so neither of the other helper functions is appropriate.
+
+## Examples
+
+A simple example walkthrough:
+```c
+void fun(int a) {
+  int b = a;
+}
+```
+
+There are two key instructions here, the assignment and the implicit return. 
We want to emit metadata that looks like this:
+
+```llvm
+define hidden void @_Z3funi(i32 noundef %a) #0 !dbg !11 {
+entry:
+  %a.addr = alloca i32, align 4
+  %b = alloca i32, align 4
+  store i32 %a, ptr %a.addr, align 4
+  %0 = load i32, ptr %a.addr, align 4, !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 2)
+  store i32 %0, ptr %b, align 4,       !dbg !DILocation(line: 2, scope: !11, 
atomGroup: 1, atomRank: 1)
+  ret void,                            !dbg !DILocation(line: 3, scope: !11, 
atomGroup: 2, atomRank: 1)
+}
+```
+
+The store is the key instruction for the assignment (`atomGroup` 1). The 
instruction corresponding to the final (and in this case only) RHS value, the 
load from `%a.addr`, is a good backup location for `is_stmt` if the store gets 
optimized away. It's part of the same source atom, but has lower `is_stmt` 
precedence, so it gets a higher `atomRank`. This is achieved by starting an 
atom group with `ApplyAtomGroup` for the source atom (in this case a variable 
init) in `EmitAutoVarInit`. The instructions (both key and backup) are then 
annotated by a call to `addInstToCurrentSourceAtom` from `EmitStoreOfScalar`.
+
+The implicit return is also key (`atomGroup` 2) so that it's stepped on, to 
match existing non-key-instructions behaviour. This is achieved by calling 
`addInstToNewSourceAtom` from within `EmitFunctionEpilog`.
+
+Explicit return statements are handled uniquely. Rather than emitting a `ret` 
for each `return`, Clang, in all but the simplest cases (as in the first 
example), emits a branch to a dedicated block with a single `ret`. That branch 
is the key instruction for the return statement. If there's only one branch to 
that block, because there's only one `return` (as in this example), Clang folds 
the block into its only predecessor. Handily, `EmitReturnBlock` returns the 
`DebugLoc` associated with the single branch in that case, which is fed into 
`addInstToSpecificSourceAtom` to ensure the `ret` gets the right group.
+
+
+## Supporting Key Instructions from another front end
+
+Front ends that want to use the feature need to group and rank instructions 
according to their source atoms and interestingness by attaching `DILocation`s 
with the necessary `atomGroup` and `atomRank` values. They also need to set the 
`keyInstructions` field to `true` in `DISubprogram`s to tell LLVM to interpret 
the new metadata in those functions.
+
+The prototype had LLVM annotate instructions (instead of Clang) using simple 
heuristics (just looking at the kind of instruction, e.g., annotating all 
stores, conditional branches, etc.). This doesn't exist anywhere upstream, but 
could be shared if there's interest (e.g., so another front end can try it out 
before committing to a full implementation). Feel free to reach out on 
Discourse (@OCHyams, @jmorse).
+
 ---
 
 **References**
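
The ranking rule that the patch above describes for `addInstToCurrentSourceAtom` (key instruction gets rank 1, backups get increasing ranks while looking through casts, nothing is annotated without an active group) can be sketched as a toy model. Everything here is hypothetical: `ToyValue`, `addToAtom`, and `MaxRank` are illustration-only names, not Clang's API, and clamping at 7 is an assumption derived from `atomRank` being described as 3 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IR value/instruction.
struct ToyValue {
  bool IsCast = false;         // True if this instruction is a cast.
  ToyValue *Operand = nullptr; // The value being cast, when IsCast is true.
  std::uint64_t AtomGroup = 0; // 0 means "not annotated".
  unsigned AtomRank = 0;       // 0 means "not an is_stmt candidate".
};

// Assumption: the largest rank representable in a 3-bit field.
const unsigned MaxRank = 7;

// Adds Key (rank 1) and an optional chain of backups (ranks 2, 3, ...) to
// Group. A Group of 0 models "no active atom group", so nothing is annotated.
void addToAtom(std::uint64_t Group, ToyValue *Key, ToyValue *Backup) {
  assert(Key && "a key instruction is required");
  if (Group == 0)
    return;

  Key->AtomGroup = Group;
  Key->AtomRank = 1;

  // Walk from the backup through any casts, ranking each step lower in
  // precedence (higher number) than the one before it.
  unsigned Rank = 2;
  for (ToyValue *V = Backup; V; V = V->IsCast ? V->Operand : nullptr) {
    V->AtomGroup = Group;
    V->AtomRank = Rank;
    if (Rank < MaxRank)
      ++Rank;
  }
}
```

For a store whose stored value is a cast of a load, this gives the store rank 1, the cast rank 2, and the load rank 3, so `is_stmt` falls back along that chain if earlier candidates are optimized away.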

>From bc0fd13bfb091a19dc4e97cc59794a1e9734dec7 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hy...@sony.com>
Date: Tue, 15 Jul 2025 13:27:11 +0100
Subject: [PATCH 9/9] add ref from HowToUpdateDebugInfo

---
 llvm/docs/HowToUpdateDebugInfo.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/llvm/docs/HowToUpdateDebugInfo.rst 
b/llvm/docs/HowToUpdateDebugInfo.rst
index 3088f59c1066a..1d17d8960ff99 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -169,6 +169,14 @@ See the discussion in the section about
 :ref:`merging locations<WhenToMergeLocation>` for examples of when the rule for
 dropping locations applies.
 
+When to remap a debug location
+------------------------------
+
+When code paths are duplicated, during passes such as loop unrolling or jump
+threading, ``DILocation`` attachments need to be remapped using
+``mapAtomInstance`` and ``RemapSourceAtom``. This supports the Key Instructions
+debug info feature. See :doc:`KeyInstructionsDebugInfo` for more information.
+
 Rules for updating debug values
 ===============================
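
The remapping rule added in the patch above (duplicated code paths need fresh `atomGroup` numbers) can be illustrated with a toy model. This is a sketch under stated assumptions, not LLVM's code: `ToyLoc`, `ToyContext`, `mapToyAtom`, `remapToyAtom`, and `cloneToyBlock` are hypothetical names that only mimic the roles the doc gives to `mapAtomInstance`, `RemapSourceAtom`, and `LLVMContextImpl::NextAtomGroup`, and `inlinedAt` is ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the Key Instructions fields of a DILocation.
struct ToyLoc {
  std::uint64_t AtomGroup; // 0 means "not part of any atom".
  unsigned AtomRank;
};

// Hypothetical stand-in for the global "next available atom number".
struct ToyContext {
  std::uint64_t NextAtomGroup = 1;
};

using AtomMap = std::map<std::uint64_t, std::uint64_t>; // old group -> new

// Records a fresh group number for Loc's atom, once per original group.
void mapToyAtom(const ToyLoc &Loc, AtomMap &Map, ToyContext &Ctx) {
  if (Loc.AtomGroup == 0 || Map.count(Loc.AtomGroup))
    return;
  Map[Loc.AtomGroup] = Ctx.NextAtomGroup++;
}

// Rewrites Loc's group using a previously recorded mapping, if any.
void remapToyAtom(ToyLoc &Loc, const AtomMap &Map) {
  auto It = Map.find(Loc.AtomGroup);
  if (It != Map.end())
    Loc.AtomGroup = It->second;
}

// Duplicates a block's locations, giving the copy its own atom groups so
// that both the original and the duplicate can get key instructions.
std::vector<ToyLoc> cloneToyBlock(const std::vector<ToyLoc> &Orig,
                                  ToyContext &Ctx) {
  assert(Ctx.NextAtomGroup != 0 && "group 0 is reserved for 'no atom'");
  AtomMap Map;
  std::vector<ToyLoc> Copy = Orig;
  for (const ToyLoc &Loc : Orig)
    mapToyAtom(Loc, Map, Ctx);
  for (ToyLoc &Loc : Copy)
    remapToyAtom(Loc, Map);
  return Copy;
}
```

Ranks are left untouched by the remap. Only group identity changes, so instructions that shared an atom in the original still share one in the copy.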
 

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
