I've been working on generating more precise entry points for inlined subroutines in GCC.
One of the goals was to provide the debug info consumer with a view number in addition to the address (more on that later). While working on this, I found that in several cases a single inlining of a function ends up with some of its basic blocks replicated, including those holding the entry point, and we have no way to represent the multiple addresses.

This often occurs when a loop containing an inlined function is unrolled, but it doesn't actually require unrolling: other, simpler CFG transformations that duplicate blocks in whole or in part are enough to get some inlined entry points (but not necessarily entire inlined functions) duplicated.

When an inlined function is duplicated as a whole, I guess it is reasonable to represent it as multiple inlinings of the same function. But whole-function duplication is not always what happens: I have observed cases in which little more than the entry point got duplicated into two separate branches.

However, even in the unrolling case, all instances often end up sharing the same automatic variables, except when the intent is to have separate per-iteration variables for e.g. swing modulo scheduling or other forms of iteration pipelining. Such variable replication comes with its own set of challenges, which I'm not focusing on for the time being. For now, I'm thinking of duplication of code within the same subprogram without the introduction of additional copies of variables. In this scenario, each inlined function comes with its own set of lexical blocks and local variables, even if unrolling and whatnot ends up creating multiple copies of each use of such variables.

Ideally, I'd like to inform debug information consumers about all inlined entry points, even when multiple such entry points are associated with the same inlined instance of the function. But AFAICT DW_AT_entry_pc can only hold one address or offset.

I've considered several representations:

- multiple DW_AT_entry_pc attributes in the same DIE
  -> not permitted (section 2.2)

- multiple lexical blocks with abstract origin pointing to the abstract function, each with a separate entry point
  -> no good, the lexical block tree will be misrepresented or replicated

- using an exprloc form, composing an array of entry addresses with multiple DW_OP_piece operations
  -> not a very compact representation, and hardly a natural extension

- abuse range lists, having DW_AT_entry_pc reference one such list, with two entry points per range entry
  -> not an unreasonable extension, but readers might be confused if ranges are reversed, or if we have an odd number of entry points

- likewise, but having a single entry point per range entry, wasting the other address
  -> no problem with odd counts, but wasteful

- allow an address list form (addrptr class?) to be used in DW_AT_entry_pc, with some convention to terminate the list
  -> this would work AFAICT. Is there any reason not to propose this for consideration in DWARF6?

Any other thoughts on the issue of representing multiple entry points for an inlined subroutine, or even for lexical blocks or regular subprograms?

On to adding view numbers to addresses. I don't think it's just location list addresses and inline entry points that could gain in precision by being augmented with view numbers. So far, I have proposed an extension to record view numbers in location lists, and I'm now introducing a GNU extended attribute to hold the view number associated with a DW_AT_entry_pc in the same DIE.

One issue I'm running into is that view numbers are often computed by the assembler, and encoded as uleb128 numbers. GCC, however, wants to compute the sizes and offsets of DIEs itself. All existing sdata- and udata-encoded attributes ever emitted by GCC are ones that it can compute as constants itself; not so when it comes to view numbers.
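
To make the sizing concern concrete, here is a minimal Python sketch of unsigned LEB128 encoding. The helper names `uleb128` and `die_size`, and the 12-byte fixed portion used in the usage note, are made up for illustration; they do not correspond to anything in GCC or the assembler. The point is only that the encoded length depends on the value, so a producer that does not know the view number cannot know how many bytes the attribute will occupy.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def die_size(fixed_bytes: int, view: int) -> int:
    """Size of a hypothetical DIE: a fixed part plus one uleb128 view number."""
    return fixed_bytes + len(uleb128(view))
```

With a hypothetical 12-byte fixed part, `die_size(12, 3)` and `die_size(12, 200)` differ by one byte, so every later DIE offset shifts depending on a value that, in the situation described above, only the assembler knows.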

Anyway, keeping DIE sizes compiler-computed constants is an issue I've run into, and it becomes a larger concern once I start considering proposing an attribute class that holds an address and a view.

Aside from the issue of compiler-computed DIE sizes and offsets, some other lists that hold addresses want to be aligned, to simplify address relocations and whatnot. Once we add view numbers encoded as uleb128, any hope for alignment is gone. We might represent view numbers at the same alignment, to keep them in the same sections, but that would likely be wasteful.

Using a single indirect representation for both the address and the view number, or a separate indirect reference to each, has its downsides either way. Even for long lists of such pairs, referencing a list of addresses (good) and a list of views (erhm), so that corresponding entries of each list are combined into tuples, leaves something to be desired.

I'd appreciate any thoughts on how to introduce compactly-represented view numbers in most places where code addresses or address ranges could be used. Have these difficulties been run into before? Any advice to share?

Thanks in advance,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
