Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

Greg Clayton via lldb-dev Mon, 19 Oct 2020 16:12:14 -0700


> On Oct 19, 2020, at 2:56 PM, Jonas Devlieghere via lldb-dev 
> <[email protected]> wrote:
> 
> We want to support segmented address spaces in LLDB. Currently, all of LLDB’s 
> external API, command line interface, and internals assume that an address in 
> memory can be addressed unambiguously as an addr_t (aka uint64_t). To support 
> a segmented address space we’d need to extend addr_t with a discriminator (an 
> aspace_t) to uniquely identify a location in memory. This RFC outlines what 
> would need to change and how we propose to do that.
> 
> ### Addresses in LLDB
> 
> Currently, LLDB has two ways of representing an address:
> 
>  - Address object. Mostly represents addresses as Section+offset for a binary 
> image loaded in the Target. An Address in this form can persist across 
> executions, e.g. an address breakpoint in a binary image that loads at a 
> different address every execution. An Address object can represent memory not 
> mapped to a binary image. Heap, stack, jitted items, will all be represented 
> as the uint64_t load address of the object, and cannot persist across 
> multiple executions. You must have the Target object available to get the 
> current load address of an Address object in the current process run. Some 
> parts of lldb do not have a Target available to them, so they require that 
> the Address can be devolved to an addr_t (aka uint64_t) and passed in.
>  - The addr_t (aka uint64_t) type. Primarily used when receiving input (e.g. 
> from a user on the command line) or when interacting with the inferior 
> (reading/writing memory) for addresses that need not persist across runs. 
> Also used when reading DWARF and in our symbol tables to represent file 
> offset addresses, where the size of an Address object would be objectionable.
>


Correction: LLDB has 3 kinds of uint64_t addresses:
- "file address" which is always mapped to a section + offset if put into an 
Address object. This value only makes sense to the lldb_private::Module that 
contains it. The only way to pass this around is as an lldb_private::Address. 
You can make queries on a file address using "image lookup --address" before 
you are debugging, but a single file address can result in multiple matches in 
multiple modules because each module might contain something at this virtual 
address. This object might be able to be converted to a "load address" if the 
section is loaded in your debug target. Since the target contains the section 
load list, the target is needed when converting between Address and addr_t 
objects.
- "load address" which is guaranteed to be unique in a process with no 
segments. It can always be put into a lldb_private::Address object, but that 
object won't always have a section. If there is no section, it means the memory 
location maps to stack, heap, or other memory that doesn't reside in an object 
file section. This object might be able to be converted to a section + offset 
address if the address matches one of the loaded sections in a target. If this 
can be converted to an Address object that has a section, then it can persist 
across debug sessions; otherwise, it cannot.
- "host address" which is a pointer to memory in the LLDB process itself. Used 
for storing expression results and other things. You cannot convert this 
to/from a "file" or "load" address.
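
To make the file address / load address distinction concrete, here is a rough, self-contained sketch. Every type and function in it (ToySection, ToyTarget, ToyAddress, ResolveFileAddress, GetLoadAddress) is a hypothetical stand-in invented for illustration, not the real lldb_private API. It only shows why the target's section load list is needed to go from section + offset to a load address.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

using addr_t = uint64_t;

// Hypothetical stand-in for one section of an object file.
struct ToySection {
  std::string name;
  addr_t file_addr; // where the section starts in the module's file address space
  addr_t size;
};

// Hypothetical stand-in for the target's section load list: it records where
// each section was loaded in the current process run.
struct ToyTarget {
  std::map<const ToySection *, addr_t> section_load_list;
};

// Section + offset form. With no section, `offset` holds a raw load address
// (stack, heap, JIT memory) and cannot persist across runs.
struct ToyAddress {
  const ToySection *section;
  addr_t offset;
};

// file address -> section + offset. Only meaningful relative to one module.
std::optional<ToyAddress> ResolveFileAddress(const ToySection &sect,
                                             addr_t file_addr) {
  if (file_addr < sect.file_addr || file_addr >= sect.file_addr + sect.size)
    return std::nullopt;
  return ToyAddress{&sect, file_addr - sect.file_addr};
}

// section + offset -> load address. Needs the target, because only the target
// knows whether (and where) the section is loaded.
std::optional<addr_t> GetLoadAddress(const ToyAddress &addr,
                                     const ToyTarget &target) {
  if (!addr.section)
    return addr.offset; // already a load address
  auto it = target.section_load_list.find(addr.section);
  if (it == target.section_load_list.end())
    return std::nullopt; // section not loaded in this run
  return it->second + addr.offset;
}
```

In this sketch the same ToyAddress resolves to a different load address whenever the section load list changes, which is the property that lets a section + offset address persist across runs.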

> ## Proposal
> 
> ### Address + ProcessAddress
> 
>  - The Address object gains a segment discriminator member variable. 
> Everything that creates an Address will need to provide this segment 
> discriminator.

One interesting thing to think about is whether the lldb_private::Section 
object should contain a segment identifier. If this is the case, then an Address 
object can have a Section that has a segment _and_ the Address object itself 
might have one that was set from the section as well. It would be good to 
figure out what the rules are for this case and it might lead to the need for 
an intelligent accessor that always prefers the section's segment if a section 
is available. The Address object must have one in case we have a pointer to 
memory in data and there is no section for this (like any heap addresses).
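
A minimal sketch of such an accessor, under assumptions: SketchSection, SketchAddress, segment_t and kNoSegment are hypothetical names made up here, and the section is held by shared_ptr only to keep the example short. The rule shown (the section's segment wins whenever a section is present) is one possible answer to the question above, not a settled design.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

using addr_t = uint64_t;
using segment_t = uint32_t;                  // hypothetical discriminator type
constexpr segment_t kNoSegment = UINT32_MAX; // hypothetical "no segment" value

// Hypothetical section that carries its own segment identifier.
struct SketchSection {
  segment_t segment = kNoSegment;
};

class SketchAddress {
public:
  SketchAddress(std::shared_ptr<SketchSection> section, addr_t offset,
                segment_t segment = kNoSegment)
      : m_section(std::move(section)), m_offset(offset), m_segment(segment) {}

  // Always prefer the section's segment when a section is available; fall
  // back to the address's own segment for section-less memory such as heap.
  segment_t GetSegment() const {
    if (m_section)
      return m_section->segment;
    return m_segment;
  }

  addr_t GetOffset() const { return m_offset; }

private:
  std::shared_ptr<SketchSection> m_section; // may be null
  addr_t m_offset;     // section offset, or raw address when there is no section
  segment_t m_segment; // only consulted when there is no section
};
```

With this rule the two copies of the segment can never disagree from a caller's point of view, at the cost of silently ignoring the address's own value when a section exists.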

>  - A ProcessAddress object which is a uint64_t and a segment discriminator as 
> a replacement for addr_t. ProcessAddress objects would not persist across 
> multiple executions. Similar to how you can create an addr_t from an 
> Address+Target today, you can create a ProcessAddress given an 
> Address+Target. When we pass around addr_ts today, they would be replaced 
> with ProcessAddress, with the exception of symbol tables where the added 
> space would be significant, and we do not believe we need segment 
> discriminators today.

Would SegmentedAddress be a more descriptive name here?

A few things I would like to see on ProcessAddress or SegmentedAddress:
- Have a segment definition that says "no segment" like LLDB_INVALID_SEGMENT or 
LLDB_NO_SEGMENT and allow these objects to be constructed with just an 
lldb::addr_t, with the segment automatically set to LLDB_NO_SEGMENT
- Any code that uses these should test if there is no segment and continue to 
do what they used to do before
  - like read/write memory in ProcessGDBRemote
  - Anything that dumps one of these objects should dump just like they used to 
(just a uint64_t hex representation and no other notation)
- Add code that can convert a "load address" into a ProcessAddress or 
SegmentedAddress, invent a notation for the segment, and keep behavior unchanged 
for targets that don't support segmented address spaces
  - 0x1000 should convert to ProcessAddress where the address is 0x1000 and 
segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT if the process doesn't 
support segmented addresses
  - 0x1000 would return an error on conversion for processes that do support 
segmented addresses as the segment must be specified? Or should there be a 
default segment if we run into this case?
  - Come up with some quick way to represent segmented addresses for an address 
of 0x1000 in segment 2: ideas:
    - [2]0x1000
    - {2}0x1000
    - 0x1000[2]
    - 0x1000{2}
    - {0x1000, 2}
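
The wish list above could look roughly like this. It is a sketch under stated assumptions: the segment_t type and the numeric value chosen for LLDB_NO_SEGMENT are invented for the example, and Dump() arbitrarily picks the "[2]0x1000" form from the candidates listed.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

using addr_t = uint64_t;
using segment_t = uint32_t;                       // assumed discriminator type
constexpr segment_t LLDB_NO_SEGMENT = UINT32_MAX; // assumed sentinel value

class ProcessAddress {
public:
  // Constructible from a plain address; the segment then defaults to
  // LLDB_NO_SEGMENT so existing callers keep working unchanged.
  ProcessAddress(addr_t addr, segment_t segment = LLDB_NO_SEGMENT)
      : m_addr(addr), m_segment(segment) {}

  addr_t GetAddress() const { return m_addr; }
  segment_t GetSegment() const { return m_segment; }
  bool HasSegment() const { return m_segment != LLDB_NO_SEGMENT; }

  // Without a segment, dump exactly like a bare address used to be dumped.
  // With a segment, use the "[segment]0xaddr" notation.
  std::string Dump() const {
    char buf[64];
    if (HasSegment())
      std::snprintf(buf, sizeof(buf), "[%" PRIu32 "]0x%" PRIx64, m_segment,
                    m_addr);
    else
      std::snprintf(buf, sizeof(buf), "0x%" PRIx64, m_addr);
    return buf;
  }

private:
  addr_t m_addr;
  segment_t m_segment;
};
```

Here ProcessAddress(0x1000) dumps as a plain hex address and ProcessAddress(0x1000, 2) dumps with the segment prefix. What to do with a bare 0x1000 on a process that does require segments (error or default segment) is left open, as in the list above.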

> 
> ### Address Only
> 
> Extend the lldb_private::Address class to be the one representation of 
> locations; including file based ones valid before running, file addresses 
> resolved in a process, and process specific addresses (heap/stack/JIT code) 
> that are only valid during a run. That is attractive because it would provide 
> a uniform interface to any “where is something” question you would ask, 
> either about symbols in files, variables in stack frames, etc.
> 
> At present, when we resolve a Section+Offset Address to a “load address” we 
> provide a Target to the resolution API.  Providing the Target externally 
> makes sense because a Target knows whether the Section is present or not and 
> can unambiguously return a load address.    We could continue that approach 
> since the Target always holds only one process, or extend it to allow passing 
> in a Process when resolving non-file backed addresses.  But this would make 
> the conversion from addr_t uses to Address uses more difficult, since we will 
> have to push the Target or Process into all the API’s that make use of just 
> an addr_t.  Using a single Address class seems less attractive when you have 
> to provide an external entity to make sense of it at all the use sites.
> 
> We could improve this situation by including a Process (as a weak pointer) 
> and fill that in on the boundaries where in the current code we go from an 
> Address to a process specific addr_t.  That would make the conversion easier, 
> but add complexity.  Since Addresses are ubiquitous, you won’t know what any 
> given Address you’ve been handed actually contains.  It could even have been 
> resolved for another process than the current one.  Making Address 
> usage-dependent in this way reduces the attractiveness of the solution.
> 
> ## Approach
> 
> Replacing all the instances of addr_t by hand would be a lot of work. 
> Therefore we propose writing a clang-based tool to automate this menial task. 
> The tool would update function signatures and replace uses of addr_t inside 
> those functions to get the addr_t from the ProcessAddress or Address and 
> return the appropriate object for functions that currently return an addr_t. 
> The goal of this tool is to generate one big NFC patch. This tool needs not 
> be perfect, at some point it will be more work to improve the tool than 
> fixing up the remaining code by hand. After this patch LLDB would still not 
> really understand address spaces but it will have everything in place to 
> support them.

This won't be NFC really as each location that plays with what used to be 
addr_t now must check if the segment is invalid before doing what it did before 
_and_ return an error if the segment is something valid.

It might be better to look at all of the APIs that could end up using a plain 
"addr_t" and adding new APIs that take a ProcessAddress and call the old API if 
the segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT, and return an error if 
the segment is valid. For example in the Process class we have:

virtual size_t Process::DoReadMemory(lldb::addr_t vm_addr, void *buf, size_t 
size, Status &error) = 0;

We could add a new overload:

virtual size_t Process::DoReadMemory(ProcessAddress proc_addr, void *buf, 
size_t size, Status &error) {
  // No segment: behave exactly like the existing addr_t-based API.
  if (proc_addr.GetSegment() == LLDB_NO_SEGMENT)
    return DoReadMemory(proc_addr.GetAddress(), buf, size, error);
  // A real segment was given, but this process class doesn't handle segments.
  error.SetErrorString("segmented addresses are not supported on this process");
  return 0;
}

Then we can start modifying the locations that need to support segmented 
addresses as needed. For instance, if we were to add segmented address support 
to ProcessGDBRemote, then we would override this function in that class.
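
A compilable toy version of this pattern, to show how the pieces would fit together. All the Sketch* types, FlatProcess and SegmentedProcess are hypothetical stand-ins (the real Process, Status and ProcessGDBRemote are far larger), and the value of LLDB_NO_SEGMENT is assumed. One C++ detail worth noting: a subclass that overrides only one DoReadMemory overload hides the other unless it adds a using-declaration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using addr_t = uint64_t;
using segment_t = uint32_t;                       // assumed discriminator type
constexpr segment_t LLDB_NO_SEGMENT = UINT32_MAX; // assumed sentinel value

// Hypothetical stand-ins for Status and ProcessAddress.
struct SketchStatus {
  std::string message;
  bool Fail() const { return !message.empty(); }
};
struct SketchProcessAddress {
  addr_t addr;
  segment_t segment = LLDB_NO_SEGMENT;
};

class SketchProcess {
public:
  virtual ~SketchProcess() = default;

  // Existing flat-address API.
  virtual size_t DoReadMemory(addr_t vm_addr, void *buf, size_t size,
                              SketchStatus &error) = 0;

  // New overload: forward when there is no segment, fail otherwise.
  virtual size_t DoReadMemory(SketchProcessAddress proc_addr, void *buf,
                              size_t size, SketchStatus &error) {
    if (proc_addr.segment == LLDB_NO_SEGMENT)
      return DoReadMemory(proc_addr.addr, buf, size, error);
    error.message = "segmented addresses are not supported on this process";
    return 0;
  }
};

// A process plugin that knows nothing about segments.
class FlatProcess : public SketchProcess {
public:
  using SketchProcess::DoReadMemory; // keep the ProcessAddress overload visible
  size_t DoReadMemory(addr_t /*vm_addr*/, void *buf, size_t size,
                      SketchStatus & /*error*/) override {
    std::memset(buf, 0xAB, size); // pretend every byte of memory reads as 0xAB
    return size;
  }
};

// A plugin that opts in to segments by overriding the new overload.
class SegmentedProcess : public FlatProcess {
public:
  using FlatProcess::DoReadMemory;
  size_t DoReadMemory(SketchProcessAddress proc_addr, void *buf, size_t size,
                      SketchStatus &error) override {
    if (proc_addr.segment == LLDB_NO_SEGMENT)
      return DoReadMemory(proc_addr.addr, buf, size, error);
    // Pretend each segment's memory is filled with its own segment number.
    std::memset(buf, static_cast<int>(proc_addr.segment), size);
    return size;
  }
};
```

In this toy, reading {0x1000, segment 2} through FlatProcess returns 0 bytes and sets the error, while the same read through SegmentedProcess succeeds; reads without a segment behave identically in both.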

I am not sure if slowly adding this functionality is better than replacing this 
all right away, but we can't just do a global replace without adding 
functionality or error checking IMHO.


> Once all the APIs are updated, we can start working on the functional 
> changes. This means actually interpreting the aspace_t values and making sure 
> they don’t get dropped.
> 
> Finally, when all this work is done and we’re happy with the approach, we 
> extend the SB API with overloads for the functions that currently take or 
> return addr_t . I want to do this last so we have time to iterate before 
> committing to a stable interface.

This might be one reason for doing the approach suggested above where we add 
new internal APIs that take a ProcessAddress and cut over to using them. As it 
would mean all of the current APIs in the lldb::SB layer would remain in place 
(they can't be removed) and would still make sense.

> 
> ## Testing
> 
> By splitting off the intrusive non-functional changes we are able to rely on 
> the existing tests for coverage. Smaller functional changes can be tested in 
> isolation, either through a unit test or a small GDB remote test. For 
> end-to-end testing we can run the test suite with a modified debugserver that 
> spoofs address spaces.

That makes sense. ProcessGDBRemote will need to dynamically respond with whether 
it supports segmented addresses by overriding the DoReadMemory that takes a 
ProcessAddress and doing the right thing.

Thanks for taking this on. I hope some of the comments above help move this 
forward.

Greg

> 
> Thanks,
> Jonas
> 
> _______________________________________________
> lldb-dev mailing list
> [email protected]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

