After today's call, hearing some viewpoints and hopefully learning a few things, I thought I'd take a stab at reframing 240108.1. (Without once mentioning CFI!) It ended up becoming an alternative proposal, but I'm fine with Zoran taking it over if he wants to.
# Describe prologue and epilogue ranges

## Background

### Stopping Points

Ordinarily, a source-level debugger will prefer to pause execution of a program at instructions identified by the compiler as good places to do so. These include instructions flagged as `is_stmt`, `prologue_end`, or `epilogue_begin`. A user expects debug info such as source coordinates and variable locations to be sensible and useful at those points.

It is entirely possible for execution to pause at other instructions. There are a number of possible reasons for this.

- The user has chosen to single-step instructions rather than statements.
- The user has requested a breakpoint at a specific instruction that happens not to have any of the above flags.
- An asynchronous exception has occurred and the debugger intercepted it.
- The program has crashed and the user is looking at a core dump.

This list is not exhaustive.

Let's call the instruction where a debugger has paused execution (or the instruction where a crash was triggered) a "stopping point."

### Prologue/Epilogue Ranges

In DWARF v3 thru v5, a subprogram's prologue(s) and epilogue(s) are described indirectly by the line table. A prologue generally consists of all instructions from an entry point up to the first executed instruction that is flagged as `prologue_end`. An epilogue generally consists of all instructions from an instruction flagged as `epilogue_begin` to where the subprogram returns to its caller. These groups of instructions implicitly form ranges. (These ranges might be empty.)

A subprogram might have multiple prologues if it has multiple entry points; more often, it might have multiple epilogues if it has multiple exit or return points. In particular, when there are multiple epilogues it is not necessarily clear when an epilogue ends and the next basic block (which might not be part of any epilogue) begins. (Even in the case of a single epilogue, a cold but functional basic block might be placed after the epilogue.)
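As a rough illustration of how a consumer might infer these implicit ranges today, here is a minimal Python sketch. The `Row` type and both helper functions are hypothetical names made up for this example, not part of any real DWARF library. It also assumes that "first executed instruction" can be approximated by the first flagged row in address order, which is only true for straight-line prologues.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    """One line-table row, reduced to the fields this sketch needs (hypothetical type)."""
    address: int
    prologue_end: bool = False
    epilogue_begin: bool = False


def prologue_range(rows: List[Row], entry: int) -> Optional[Tuple[int, int]]:
    """Return the half-open range [entry, first prologue_end address), or None.

    Assumption: address order stands in for execution order.
    """
    for row in sorted(rows, key=lambda r: r.address):
        if row.address >= entry and row.prologue_end:
            return (entry, row.address)  # empty when the entry row itself is flagged
    return None


def epilogue_starts(rows: List[Row]) -> List[int]:
    """Return the addresses flagged epilogue_begin; where each epilogue ends is not recorded."""
    return [r.address for r in sorted(rows, key=lambda r: r.address) if r.epilogue_begin]


rows = [
    Row(0x1000),
    Row(0x1004, prologue_end=True),
    Row(0x1010),
    Row(0x1020, epilogue_begin=True),
    Row(0x1024),
]
```

With these rows, `prologue_range(rows, 0x1000)` gives `(0x1000, 0x1004)`, while `epilogue_starts(rows)` can only report where an epilogue begins (`[0x1020]`). The line table offers nothing from which to compute the end of the epilogue.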
Due to optimization, prologue or epilogue instructions might be mixed with other instructions, so in practice prologue and epilogue ranges might not be contiguous. DWARF does not have a way to describe these non-contiguous prologue and epilogue ranges. Compilers typically have various heuristics to pick stopping points for optimized prologue and epilogue ranges.

### Single Location Descriptions

A single location description (which can be either a simple or a composite location description) has the lifetime of its closest containing scope. The case we care about here is when that scope is a subprogram, and therefore the lifetime spans the entire subprogram. Pedantically, that lifetime includes prologue and epilogue ranges.

It is common practice for unoptimized code to allocate local variables to a stack frame, and use that stack location in the single location description. Because the stack frame is not necessarily in a valid state during prologue or epilogue code, in practice, debuggers typically assume that a single location description is not valid during a prologue or epilogue, although the DWARF spec does not explicitly say so (AFAIK).

## Overview

A stopping point might occur during a prologue or epilogue range, which means single location descriptions for subprogram-scope objects might not be valid.

- It would be good if the DWARF spec actually said single location descriptions were not necessarily valid in those ranges. This is simply codifying existing practice.
- It would be good if debuggers could reliably identify prologue and epilogue ranges.

The proposal adds text that excludes prologues and epilogues from the implicit range of a subprogram-scope object, and adds a register to the line-table state machine to identify prologues and epilogues.

Unlike `prologue_end` and `epilogue_begin`, the new `prologue_epilogue` register is "sticky" in that it is not automatically reset on every row of the line table.
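To make the "sticky" behaviour concrete, here is a minimal Python sketch of a toy line-number state machine. It follows the set/reset rules this proposal gives for `prologue_epilogue`: set explicitly at an entry point, cleared by DW_LNS_set_prologue_end, set by DW_LNS_set_epilogue_begin, and reset at DW_LNE_end_sequence. The string opcode names and the `run` function are hypothetical stand-ins for the real encoded opcodes. Using `set_address` for every row is a simplification, and having the end-of-sequence row carry the register's current value before the reset is my assumption, based on how existing registers behave.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineState:
    """A few line-table registers; prologue_epilogue is the proposed one."""
    address: int = 0
    prologue_end: bool = False
    epilogue_begin: bool = False
    prologue_epilogue: bool = False  # proposed initial state: "false"


def run(program: List[Tuple[str, int]]) -> List[Tuple[int, bool]]:
    """Run (opcode, operand) pairs; return emitted rows as (address, prologue_epilogue)."""
    state = LineState()
    rows: List[Tuple[int, bool]] = []
    for op, arg in program:
        if op == "set_address":
            state.address = arg
        elif op == "set_prologue_epilogue":   # stand-in for DW_LNE_set_prologue_epilogue
            state.prologue_epilogue = arg != 0
        elif op == "set_prologue_end":        # also clears the sticky register
            state.prologue_end = True
            state.prologue_epilogue = False
        elif op == "set_epilogue_begin":      # also sets the sticky register
            state.epilogue_begin = True
            state.prologue_epilogue = True
        elif op == "copy":
            rows.append((state.address, state.prologue_epilogue))
            # Per-row registers reset here; prologue_epilogue deliberately does not.
            state.prologue_end = False
            state.epilogue_begin = False
        elif op == "end_sequence":
            rows.append((state.address, state.prologue_epilogue))
            state = LineState()               # every register returns to its initial value
        else:
            raise ValueError(f"unknown opcode: {op}")
    return rows


program = [
    ("set_address", 0x1000), ("set_prologue_epilogue", 1), ("copy", 0),
    ("set_address", 0x1002), ("copy", 0),
    ("set_address", 0x1004), ("set_prologue_end", 0), ("copy", 0),
    ("set_address", 0x1010), ("copy", 0),
    ("set_address", 0x1020), ("set_epilogue_begin", 0), ("copy", 0),
    ("set_address", 0x1024), ("copy", 0),
    ("set_address", 0x1028), ("end_sequence", 0),
]
rows = run(program)
```

In this example the rows at 0x1000 and 0x1002 are both marked as prologue even though the new opcode appears only once, and the rows from 0x1020 onward are marked as epilogue without using the new opcode at all.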
At an entry point, the `prologue_epilogue` register must be set explicitly to indicate the beginning of a prologue; it is automatically reset by DW_LNS_set_prologue_end. In an epilogue, it is automatically set by DW_LNS_set_epilogue_begin, and reset by DW_LNE_end_sequence. This means that for a function with one contiguous prologue and one contiguous epilogue, terminated by `end_sequence`, the line-number program needs only one new opcode to support `prologue_epilogue`.

Note: I have not tried to determine whether this minimizes size in practice. It might be that prologues and/or epilogues typically occupy only one row of the line table, in which case having the flag reset on every row might take up less space.

## Proposed Changes

In Section 2.6 "Location Descriptions" modify the last sentence of item 1 to read as follows (adding the parenthetical exclusion).

> They are sufficient for describing the location of any object as long as its lifetime is either static or the same as the lexical block that owns it (excluding any prologue or epilogue ranges), and it does not move during its lifetime.

In Section 6.2.2 "State Machine Registers" add the `prologue_epilogue` register to Table 6.3.

| Register Name | Meaning |
| ------------- | ------- |
| `prologue_epilogue` | A boolean indicating that the current address is within a prologue or epilogue range. |

(Keep the `prologue_end` and `epilogue_begin` registers.)

In Section 6.2.3 "Line Number Program Instructions" add an entry to Table 6.4 "Line number program initial state."

| `prologue_epilogue` | "false" |

In Section 6.2.5.2 "Standard Opcodes" modify several descriptions as follows (exact text changes not specified for simplicity).

- DW_LNS_set_prologue_end: sets the `prologue_epilogue` register to "false."
- DW_LNS_set_epilogue_begin: sets the `prologue_epilogue` register to "true."

In Section 6.2.5.3 "Extended Opcodes" add a new opcode at the end.

> 4. DW_LNE_set_prologue_epilogue
>
> The DW_LNE_set_prologue_epilogue opcode takes a single parameter, an unsigned LEB128 integer. If the parameter is 0, it sets the `prologue_epilogue` register of the state machine to "false"; for any other value, it sets the register to "true."

In Section 7.22 "Line Number Information" add a new entry for DW_LNE_set_prologue_epilogue in Table 7.26 (probably 0x05).

## Dependencies

Not really a dependency, but an implication: Assemblers will need to add syntax to the `.loc` directive to support setting/resetting the `prologue_epilogue` flag.

## References

[Issue 240108.1](https://dwarfstd.org/issues/240108.1.html): Add prologue_begin and epilogue_end state machine registers to allow identifying multiple prologue and epilogue regions

--
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss