Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

Pavel Labath via lldb-dev Tue, 20 Oct 2020 10:51:26 -0700

There are a lot of things that are unclear to me about this proposal. The mechanics of representing a segmented address are one thing, but I think that the really interesting part will be the interaction with the rest of lldb. Like:
- What's going to be the source of this address space information? Is it going to be statically baked into lldb (a function of the target architecture?), or dynamically retrieved from the target or platform we're debugging? How would that work?
- How is this going to interact with Object/SymbolFile classes? Are you expecting to use existing object and symbol formats for address space information, or some custom ones? AFAIK, none of the existing formats actually support encoding address space information (though that hasn't stopped people from trying).

Without understanding the bigger picture it's hard for me to say whether the proposed large scale refactoring is a good idea. Nonetheless, I am doubtful of the viability of that approach. Some of my reasons for that are:
- not all addr_ts represent an actual address -- sometimes that is a difference between two addresses, which still uses addr_t, as that's guaranteed to fit.
- related to that, there is a difference (I'd expect) between the operations supported by the two types. addr_t supports all integral operations (though I hope we don't use all of them), but I wouldn't expect to be able to do the same with a SegmentedAddress. For one, I'd expect it wouldn't be possible to add two SegmentedAddresses together (which is possible for addr_t). OTOH, adding a SegmentedAddress and an addr_t would probably be fine? Should subtracting two SegmentedAddresses result in an addr_t? But only if they have matching address spaces (and assert otherwise)?
- I'd also be worried about over-generalizing specialized code which can afford to work with plain addresses, and where the added address space would be a nuisance (or a source of bugs). E.g. ELF has no notion of address space, so I don't think I'd find it helpful to replace all plain integer calculations in elf parsing code with something more complex. (I'm aware that some people are using elf to encode address space information, but this is a pretty nonstandard extension, and it'd take more than type substitution to support anything like that.)
- large scale refactorings are very much not the norm in llvm
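
To make the question about supported operations concrete, here is a minimal sketch of a type that allows only the arithmetic discussed above. Everything in it is an assumption for illustration: `SegmentedAddress` is not an existing LLDB class, the width chosen for `aspace_t` is arbitrary, and the method names are made up.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;   // as in LLDB: a plain 64-bit integer
using aspace_t = uint32_t; // assumption: the discriminator width is not specified

// Hypothetical type, for illustration only.
class SegmentedAddress {
public:
  SegmentedAddress(aspace_t space, addr_t offset)
      : m_space(space), m_offset(offset) {}

  aspace_t GetSpace() const { return m_space; }
  addr_t GetOffset() const { return m_offset; }

  // address + plain offset: the result stays in the same address space.
  friend SegmentedAddress operator+(SegmentedAddress a, addr_t delta) {
    return SegmentedAddress(a.m_space, a.m_offset + delta);
  }

  // address - address: a plain difference, meaningful only within one space.
  friend addr_t operator-(SegmentedAddress a, SegmentedAddress b) {
    assert(a.m_space == b.m_space && "mismatched address spaces");
    return a.m_offset - b.m_offset;
  }

  friend bool operator==(SegmentedAddress a, SegmentedAddress b) {
    return a.m_space == b.m_space && a.m_offset == b.m_offset;
  }

  // Note: no operator+ taking two SegmentedAddress values is declared, and
  // there is no implicit conversion to addr_t, so adding two addresses
  // does not compile.

private:
  aspace_t m_space;
  addr_t m_offset;
};
```

With a type like this, a difference between two addresses remains a plain `addr_t`, which matches the first point in the list: not every `addr_t` is an address.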




On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:

We want to support segmented address spaces in LLDB. Currently, all of LLDB’s external API, command line interface, and internals assume that an address in memory can be addressed unambiguously as an addr_t (aka uint64_t). To support a segmented address space we’d need to extend addr_t with a discriminator (an aspace_t) to uniquely identify a location in memory. This RFC outlines what would need to change and how we propose to do that.
### Addresses in LLDB

Currently, LLDB has two ways of representing an address:
- Address object. Mostly represents addresses as Section+offset for a binary image loaded in the Target. An Address in this form can persist across executions, e.g. an address breakpoint in a binary image that loads at a different address every execution. An Address object can also represent memory not mapped to a binary image: heap, stack, and jitted items will all be represented as the uint64_t load address of the object, and cannot persist across multiple executions. You must have the Target object available to get the current load address of an Address object in the current process run. Some parts of lldb do not have a Target available to them, so they require that the Address can be devolved to an addr_t (aka uint64_t) and passed in.
- The addr_t (aka uint64_t) type. Primarily used when receiving input (e.g. from a user on the command line) or when interacting with the inferior (reading/writing memory) for addresses that need not persist across runs. Also used when reading DWARF and in our symbol tables to represent file offset addresses, where the size of an Address object would be objectionable.
## Proposal

### Address + ProcessAddress
- The Address object gains a segment discriminator member variable. Everything that creates an Address will need to provide this segment discriminator.
- A ProcessAddress object, which is a uint64_t and a segment discriminator, as a replacement for addr_t. ProcessAddress objects would not persist across multiple executions. Similar to how you can create an addr_t from an Address+Target today, you can create a ProcessAddress given an Address+Target. When we pass around addr_ts today, they would be replaced with ProcessAddress, with the exception of symbol tables, where the added space would be significant and where we do not believe we need segment discriminators today.
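
The two bullets above could look roughly like the following sketch. It uses toy stand-ins rather than the real lldb_private classes: `ToyAddress`, `ToyTarget`, `Resolve`, the field names, and the width of `aspace_t` are all assumptions made for illustration, and only the shape (a uint64_t plus a discriminator, created from an Address and a Target) comes from the proposal.

```cpp
#include <cstdint>
#include <map>
#include <string>

using addr_t = uint64_t;
using aspace_t = uint32_t; // assumption: the discriminator width is not specified

// A uint64_t plus a segment discriminator; only valid for one process run.
struct ProcessAddress {
  aspace_t space;
  addr_t value;
};

// Toy stand-in for an Address: section + offset, plus the new discriminator.
struct ToyAddress {
  std::string section;
  addr_t offset;
  aspace_t space;
};

// Toy stand-in for a Target: knows where sections are loaded in this run.
struct ToyTarget {
  std::map<std::string, addr_t> section_load_addrs;

  // Returns false if the section is not loaded; otherwise fills in `out`
  // with the load address, carrying the discriminator through unchanged.
  bool Resolve(const ToyAddress &addr, ProcessAddress &out) const {
    auto it = section_load_addrs.find(addr.section);
    if (it == section_load_addrs.end())
      return false;
    out = ProcessAddress{addr.space, it->second + addr.offset};
    return true;
  }
};
```

The point of the sketch is that the section-relative form can outlive a run, while the `ProcessAddress` it resolves to is tied to the load addresses the target knows about right now.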

I'm strongly in favor of the first approach. The reason for that is that we have a lot of code that can only reasonably deal with one kind of address, and I'd like to be able to express that in the type system. In fact, I think we could have more distinct types even now, but adding address spaces makes that even more important.

### Address Only
Extend the lldb_private::Address class to be the one representation of locations, including file based ones valid before running, file addresses resolved in a process, and process specific addresses (heap/stack/JIT code) that are only valid during a run. That is attractive because it would provide a uniform interface to any “where is something” question you would ask, either about symbols in files, variables in stack frames, etc.
At present, when we resolve a Section+Offset Address to a “load address” we provide a Target to the resolution API. Providing the Target externally makes sense because a Target knows whether the Section is present or not and can unambiguously return a load address. We could continue that approach since the Target always holds only one process, or extend it to allow passing in a Process when resolving non-file backed addresses. But this would make the conversion from addr_t uses to Address uses more difficult, since we will have to push the Target or Process into all the APIs that make use of just an addr_t. Using a single Address class seems less attractive when you have to provide an external entity to make sense of it at all the use sites.
We could improve this situation by including a Process (as a weak pointer) and filling that in on the boundaries where in the current code we go from an Address to a process specific addr_t. That would make the conversion easier, but add complexity. Since Addresses are ubiquitous, you won’t know what any given Address you’ve been handed actually contains. It could even have been resolved for another process than the current one. Making Address usage-dependent in this way reduces the attractiveness of the solution.
## Approach
Replacing all the instances of addr_t by hand would be a lot of work. Therefore we propose writing a clang-based tool to automate this menial task. The tool would update function signatures and replace uses of addr_t inside those functions to get the addr_t from the ProcessAddress or Address and return the appropriate object for functions that currently return an addr_t. The goal of this tool is to generate one big NFC patch. This tool need not be perfect; at some point it will be more work to improve the tool than to fix up the remaining code by hand. After this patch LLDB would still not really understand address spaces but it will have everything in place to support them.
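
As a rough picture of what such a mechanical, behavior-preserving rewrite might produce, consider the before/after pair below. This is only a sketch under stated assumptions: the helper functions are invented for illustration (they are not LLDB functions), `ProcessAddress` is the hypothetical struct from the proposal with made-up field names, and the actual output of the tool is not specified in the RFC.

```cpp
#include <cstdint>

using addr_t = uint64_t;
using aspace_t = uint32_t; // assumption: the discriminator width is not specified

// Hypothetical shape of the proposed type.
struct ProcessAddress {
  aspace_t space;
  addr_t value;
};

// Before: an invented helper written against plain addr_t.
inline addr_t AlignDownBefore(addr_t addr, addr_t alignment) {
  return addr - (addr % alignment);
}

// After: the signature takes and returns ProcessAddress. The body unwraps
// the addr_t, performs the same computation, and rewraps the result while
// carrying the discriminator through. `alignment` is a size rather than a
// location, so it stays a plain addr_t.
inline ProcessAddress AlignDownAfter(ProcessAddress addr, addr_t alignment) {
  addr_t raw = addr.value;
  addr_t result = raw - (raw % alignment);
  return ProcessAddress{addr.space, result};
}
```

The numeric result is identical in both versions, which is what makes the change NFC; the only difference is that the discriminator is no longer dropped at the function boundary.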
Once all the APIs are updated, we can start working on the functional changes. This means actually interpreting the aspace_t values and making sure they don’t get dropped.
Finally, when all this work is done and we’re happy with the approach, we extend the SB API with overloads for the functions that currently take or return addr_t. I want to do this last so we have time to iterate before committing to a stable interface.
## Testing
By splitting off the intrusive non-functional changes we are able to rely on the existing tests for coverage. Smaller functional changes can be tested in isolation, either through a unit test or a small GDB remote test. For end-to-end testing we can run the test suite with a modified debugserver that spoofs address spaces.
Thanks,
Jonas


_______________________________________________
lldb-dev mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

