Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

Zdenek Prikryl via lldb-dev Tue, 10 Nov 2020 12:59:42 -0800

Hi all,

Just for the record, we have successfully implemented the wrapping of addr_t into a class to support multiple address spaces. The info about the address space is stored in the ELF file, so we get the info from the ELF parser and then pass it to the rest of the system. The CLI/MI interface has been extended as well, so the user can select which address space he wants for memory printing. Similarly, we patched expression evaluation, the disassembler, etc.

If the address wrapping becomes part of the upstream version, that would be awesome :-)


Best regards.

On 10/20/20 9:30 PM, Ted Woodward via lldb-dev wrote:

I agree with Pavel about the larger picture - we need to know the driver behind 
address spaces before we can discuss a workable solution.

I've dealt with 2 use cases - Harvard architecture cores, and low level 
hardware debugging.

A Harvard architecture core has separate instruction and data memories. These often use 
the same addresses, so to distinguish between them you need address spaces. The Motorola 
DSP56300 had 1 program and 2 data memories, called p, x and y. p:100, x:100 and y:100 
were all separate memories, so "address 100" isn't enough to get what the user 
needed to see.
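
As a rough illustration of why a bare number is ambiguous here, the toy model below keys memory cells by a (space, address) pair. This is only a sketch: MemSpace and ToyMemory are made-up names for this example, not LLDB or DSP56300 APIs, and the cell width is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical names for illustration only; not LLDB API.
enum class MemSpace : uint8_t { P, X, Y };

// A toy Harvard-style memory: every (space, address) pair is its own cell,
// so p:100, x:100 and y:100 never alias each other.
class ToyMemory {
public:
  void Write(MemSpace space, uint64_t addr, uint32_t value) {
    m_cells[std::make_pair(space, addr)] = value;
  }

  // Returns 0 for cells that were never written.
  uint32_t Read(MemSpace space, uint64_t addr) const {
    auto it = m_cells.find(std::make_pair(space, addr));
    return it == m_cells.end() ? 0 : it->second;
  }

private:
  std::map<std::pair<MemSpace, uint64_t>, uint32_t> m_cells;
};
```

Writing to p:100 and x:100 stores two independent values, and y:100 stays untouched, which is exactly the information a plain "address 100" loses.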

For low level hardware debugging (often using JTAG), many devices let you access memories in ways like 
"virtual using the TLB", or "virtual == physical, through the core", or "physical, 
through the SoC, not cached". Memory spaces, done right, can give the user the flexibility to pick how 
to view memory.


Are these the use cases you were envisioning, Jonas?

-----Original Message-----
From: lldb-dev <lldb-dev-boun...@lists.llvm.org> On Behalf Of Pavel Labath
via lldb-dev
Sent: Tuesday, October 20, 2020 12:51 PM
To: Jonas Devlieghere <jo...@devlieghere.com>; LLDB <lldb-
d...@lists.llvm.org>
Subject: [EXT] Re: [lldb-dev] [RFC] Segmented Address Space Support in
LLDB

There's a lot of things that are unclear to me about this proposal. The
mechanics of representing a segmented address are one thing, but I think
that the really interesting part will be the interaction with the rest of
lldb. Like:
- What's going to be the source of this address space information? Is it going
to be statically baked into lldb (a function of the target architecture?), or
dynamically retrieved from the target or platform we're debugging? How
would that work?
- How is this going to interact with Object/SymbolFile classes? Are you
expecting to use existing object and symbol formats for address space
information, or some custom ones? AFAIK, none of the existing formats
actually support encoding address space information (though that hasn't
stopped people from trying).

Without understanding the bigger picture it's hard for me to say whether the
proposed large scale refactoring is a good idea. Nonetheless, I am doubtful of
the viability of that approach. Some of my reasons for that are:
- not all addr_ts represent an actual address -- sometimes that is a difference
between two addresses, which still uses addr_t, as that's guaranteed to fit.
- related to that, there is a difference (I'd expect) between the operations
supported by the two types. addr_t supports all integral operations (though I
hope we don't use all of them), but I wouldn't expect to be able to do the
same with a SegmentedAddress. For one, I'd expect it wouldn't be possible
to add two SegmentedAddresses together (which is possible for addr_t).
OTOH, adding a SegmentedAddress and an addr_t would probably be fine?
Should subtracting two SegmentedAddresses result in an addr_t? But
only if they have matching address spaces (and assert otherwise)?
- I'd also be worried about over-generalizing specialized code which can
afford to work with plain addresses, and where the added address space
would be a nuisance (or a source of bugs). E.g. ELF has no notion of address
space, so I don't think I'd find it helpful to replace all plain integer
calculations in elf parsing code with something more complex.
(I'm aware that some people are using elf to encode address space
information, but this is a pretty nonstandard extension, and it'd take more
than type substitution to support anything like that.)
- large scale refactorings are very much not the norm in llvm
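
To make the question about supported operations concrete, here is one possible shape for such a type. It is a sketch under assumptions, not a proposed implementation: the RFC only names addr_t and aspace_t, so the width of aspace_t, the class layout and the accessor names are all invented here.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definitions; the RFC does not specify the width of aspace_t.
using addr_t = uint64_t;
using aspace_t = uint32_t;

class SegmentedAddress {
public:
  SegmentedAddress(aspace_t space, addr_t offset)
      : m_space(space), m_offset(offset) {}

  aspace_t GetSpace() const { return m_space; }
  addr_t GetOffset() const { return m_offset; }

  // address + plain offset: the result stays in the same address space.
  SegmentedAddress operator+(addr_t delta) const {
    return SegmentedAddress(m_space, m_offset + delta);
  }

  // address - address: a plain distance, meaningful only within one space.
  addr_t operator-(const SegmentedAddress &rhs) const {
    assert(m_space == rhs.m_space && "mismatched address spaces");
    return m_offset - rhs.m_offset;
  }

  // address + address has no sensible meaning, so reject it at compile time.
  SegmentedAddress operator+(const SegmentedAddress &) const = delete;

private:
  aspace_t m_space;
  addr_t m_offset;
};
```

With this shape, `a + 0x20` compiles and keeps the space of `a`, `b - a` returns an addr_t and asserts on a space mismatch, and `a + b` is a compile error.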



On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:

We want to support segmented address spaces in LLDB. Currently, all of
LLDB’s external API, command line interface, and internals assume that
an address in memory can be addressed unambiguously as an addr_t (aka
uint64_t). To support a segmented address space we’d need to extend
addr_t with a discriminator (an aspace_t) to uniquely identify a
location in memory. This RFC outlines what would need to change and
how we propose to do that.

## Addresses in LLDB

Currently, LLDB has two ways of representing an address:

   - Address object. Mostly represents addresses as Section+offset for
a binary image loaded in the Target. An Address in this form can
persist across executions, e.g. an address breakpoint in a binary
image that loads at a different address every execution. An Address
object can represent memory not mapped to a binary image. Heap, stack,
jitted items, will all be represented as the uint64_t load address of
the object, and cannot persist across multiple executions. You must
have the Target object available to get the current load address of an
Address object in the current process run. Some parts of lldb do not
have a Target available to them, so they require that the Address can
be devolved to an addr_t (aka uint64_t) and passed in.
   - The addr_t (aka uint64_t) type. Primarily used when receiving
input (e.g. from a user on the command line) or when interacting with
the inferior (reading/writing memory) for addresses that need not
persist across runs. Also used when reading DWARF and in our symbol
tables to represent file offset addresses, where the size of an
Address object would be objectionable.

## Proposal

### Address + ProcessAddress

   - The Address object gains a segment discriminator member variable.
Everything that creates an Address will need to provide this segment
discriminator.
   - A ProcessAddress object which is a uint64_t and a segment
discriminator as a replacement for addr_t. ProcessAddress objects
would not persist across multiple executions. Similar to how you can
create an addr_t from an Address+Target today, you can create a
ProcessAddress given an Address+Target. When we pass around addr_ts
today, they would be replaced with ProcessAddress, with the exception
of symbol tables where the added space would be significant, and we do
not believe we need segment discriminators today.
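
A minimal sketch of how the two proposed types might relate is shown below. Everything in the `sketch` namespace is a simplified stand-in invented for illustration: the real lldb_private::Address and Target are far richer, and the aspace_t width, field names and the Resolve signature are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

namespace sketch {

using aspace_t = uint32_t; // assumed width

// Section-relative address that can persist across runs.
struct Address {
  std::string section;
  uint64_t offset;
  aspace_t space; // the new segment discriminator
};

// Per-run address: a raw value plus the same discriminator.
struct ProcessAddress {
  uint64_t value;
  aspace_t space;
};

// Toy target that only knows where each section is loaded in this run.
class Target {
public:
  void SetSectionLoadAddress(const std::string &section, uint64_t base) {
    m_bases[section] = base;
  }

  // Resolves a section-relative Address into a ProcessAddress.
  // Returns false if the section is not loaded.
  bool Resolve(const Address &addr, ProcessAddress &out) const {
    auto it = m_bases.find(addr.section);
    if (it == m_bases.end())
      return false;
    out = ProcessAddress{it->second + addr.offset, addr.space};
    return true;
  }

private:
  std::map<std::string, uint64_t> m_bases;
};

} // namespace sketch
```

The point of the sketch is that the discriminator travels unchanged from the persistent form to the per-run form, while only the numeric part depends on the Target.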

I'm strongly in favor of the first approach. The reason for that is that we have
a lot of code that can only reasonably deal with one kind of an address, and
I'd like to be able to express that in the type system. In fact, I think we
could have more distinct types even now, but adding address spaces makes that
even more important.

### Address Only

Extend the lldb_private::Address class to be the one representation of
locations; including file based ones valid before running, file
addresses resolved in a process, and process specific addresses
(heap/stack/JIT code) that are only valid during a run. That is
attractive because it would provide a uniform interface to any “where
is something” question you would ask, either about symbols in files,
variables in stack frames, etc.

At present, when we resolve a Section+Offset Address to a “load address”
we provide a Target to the resolution API.  Providing the Target
externally makes sense because a Target knows whether the Section is
present or not and can unambiguously return a load address.  We
could continue that approach since the Target always holds only one
process, or extend it to allow passing in a Process when resolving
non-file backed addresses.  But this would make the conversion from
addr_t uses to Address uses more difficult, since we will have to push
the Target or Process into all the APIs that make use of just an
addr_t.  Using a single Address class seems less attractive when you
have to provide an external entity to make sense of it at all the use sites.

We could improve this situation by including a Process (as a weak
pointer) and fill that in on the boundaries where in the current code
we go from an Address to a process specific addr_t.  That would make
the conversion easier, but add complexity.  Since Addresses are
ubiquitous, you won’t know what any given Address you’ve been handed
actually contains.  It could even have been resolved for another
process than the current one.  Making Address usage-dependent in this
way reduces the attractiveness of the solution.

## Approach

Replacing all the instances of addr_t by hand would be a lot of work.
Therefore we propose writing a clang-based tool to automate this
menial task. The tool would update function signatures and replace
uses of addr_t inside those functions to get the addr_t from the
ProcessAddress or Address and return the appropriate object for
functions that currently return an addr_t. The goal of this tool is to
generate one big NFC patch. The tool need not be perfect; at some
point it will be more work to improve the tool than to fix up the remaining
code by hand.

After this patch LLDB would still not really understand address spaces
but it will have everything in place to support them.
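
For illustration, the kind of mechanical rewrite described above might look roughly like the before/after pair below. This is a hypothetical example: the AlignDown functions are not taken from LLDB, and the ProcessAddress layout and aspace_t width are assumptions.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;
using aspace_t = uint32_t; // assumed width

// Assumed layout of the proposed type.
struct ProcessAddress {
  addr_t addr;
  aspace_t space;
};

// Before: plain addr_t in, plain addr_t out.
addr_t AlignDownBefore(addr_t addr, addr_t alignment) {
  return addr - (addr % alignment);
}

// After the NFC rewrite: the same arithmetic on the raw value, with the
// discriminator carried through unchanged. The alignment parameter stays
// an addr_t because it is a size, not a location.
ProcessAddress AlignDownAfter(ProcessAddress addr, addr_t alignment) {
  addr_t raw = addr.addr;
  return ProcessAddress{raw - (raw % alignment), addr.space};
}
```

Both versions compute the same numeric result, which is what makes the patch NFC; the second merely stops dropping the address space on the way through.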

Once all the APIs are updated, we can start working on the functional
changes. This means actually interpreting the aspace_t values and
making sure they don’t get dropped.

Finally, when all this work is done and we’re happy with the approach,
we extend the SB API with overloads for the functions that currently
take or return addr_t. I want to do this last so we have time to
iterate before committing to a stable interface.

## Testing

By splitting off the intrusive non-functional changes we are able to
rely on the existing tests for coverage. Smaller functional changes
can be tested in isolation, either through a unit test or a small GDB
remote test. For end-to-end testing we can run the test suite with a
modified debugserver that spoofs address spaces.

Thanks,
Jonas


_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev



--
Zdenek Prikryl
CTO
T +420 541 141 475
Codasip.com
