[lldb-dev] RFC: How to handle non-address bits in the output of "memory read"

2021-12-10 Thread David Spickett via lldb-dev
(Peter and Stephen on CC since you've previously asked about this sort of thing)

This relates to https://reviews.llvm.org/D103626 and other recent
patches about non-address bits.

On AArch64 we've got a few extensions that use "non-address bits".
These are bits beyond the (in most cases) 48 bit virtual address size.
Currently we have pointer authentication (armv8.3), memory tagging
(armv8.5) and top byte ignore (a feature of armv8.0-a).

This means we need to know about these bits when doing some
operations. One such time is when passing addresses to memory read.
Consider two pointers to the same location where the first one has a
greater memory tag (bits 56-59) than the second. This is what happens
if we don't remove the non-address bits:
(lldb) memory read mte_buf_alt_tag mte_buf+16
error: end address (0x900f7ff8010) must be greater than the start
address (0xa00f7ff8000).

A pure number comparison is going to think that end < begin address.
If we use the ABI plugin's FixDataAddress we can remove those bits and
read normally.
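
As a rough illustration of what that masking amounts to (a sketch of the
idea, not lldb's actual implementation, and it assumes a 48-bit virtual
address size):

  #include <cstdint>

  // Keep only the low 48 virtual address bits; everything above them
  // (MTE logical tag, PAC bits, top byte) is treated as non-address.
  uint64_t strip_non_address_bits(uint64_t addr) {
    const uint64_t kVirtualAddressMask = (UINT64_C(1) << 48) - 1;
    return addr & kVirtualAddressMask;
  }

With both ends of the range masked like that, the end > start check
behaves as expected again.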

With one caveat: the output will not include those non-address bits
unless we make a special effort to do so. Here's an example:
(lldb) p ptr1
(char *) $4 = 0x3400f140 "\x80\xf1\xff\xff\xff\xff"
(lldb) p ptr2
(char *) $5 = 0x5600f140 "\x80\xf1\xff\xff\xff\xff"
(lldb) memory read ptr1 ptr2+16
0xf140: 80 f1 ff ff ff ff 00 00 38 70 bc f7 ff ff 00 00  ........8p......

My current opinion is that in this case the output should not include
the non-address bits:
* The actual memory being read is not at the virtual address the raw
pointer value gives.
* Many, if not all, non-address bits cannot be incremented as the
memory address we're showing is incremented (at least not in a way that
makes sense given how the core interprets them).

For example once you get into the next memory granule, the memory tag
attached to it in hardware may be different. (and FWIW I have a series
to show the actual memory tags https://reviews.llvm.org/D107140)
You could perhaps argue that if the program itself used that pointer,
it would use those non-address bits as well, so showing them tells the user *how*
it would access the memory. However, I don't think that justifies
complicating the implementation and output.

So what do people think of that direction? I've thought about this for
too long before asking for feedback, so I'm definitely missing some of
the wood for the trees.

Input/bug reports/complaints from anyone who (unlike me) has debugged
a large program that uses these non-address features are most welcome!

Thanks,
David Spickett.


Re: [lldb-dev] No script in lldb of build

2021-12-06 Thread David Spickett via lldb-dev
Can you link to/provide the build commands you used? It will help in
the case this is not a simple issue.

> there is no embedded script interpreter in this mode.

Probably because it didn't find Python (and/or Lua, but I don't have
experience with that). To find out why, try passing
"-DLLDB_ENABLE_PYTHON=ON" to the initial cmake command
(LLDB_ENABLE_LUA if you want Lua). It defaults to auto, which means if
it doesn't find Python it'll silently continue; with "ON" it'll print
an error and stop.

There are others on the list who use MacOS who can hopefully help from there.

On Sun, 5 Dec 2021 at 20:02, Pi Pony via lldb-dev
 wrote:
>
> Hello,
>
> I build lldb for macOS and tried to get into script but I get this error 
> message: there is no embedded script interpreter in this mode.
>
> I appreciate any help you can provide
>
>


Re: [lldb-dev] Adding support for FreeBSD kernel coredumps (and live memory lookup)

2021-12-02 Thread David Spickett via lldb-dev
> 1. The (older) "full memory" coredumps that use an ELF container.
>
> 2. The (newer) minidumps that dump only the active memory and use
a custom format.

Maybe a silly question, is the "minidumps" here the same sort of
minidump as lldb already supports
(https://chromium.googlesource.com/breakpad/breakpad/+/master/docs/getting_started_with_breakpad.md#the-minidump-file-format)?
Or does "mini" just mean small and/or sparse relative to the ELF
container core files?

I see that the minidump tests use yaml2obj to make their files, but if
you end up only needing one file and it would need changes to yaml2obj,
it's probably not worth pursuing.

On Thu, 2 Dec 2021 at 13:38, Michał Górny  wrote:
>
> On Thu, 2021-12-02 at 11:50 +, David Spickett wrote:
> > > Right now, the idea is that when the kernel crashes, the developer can
> > > take the vmcore file use LLDB to look the kernel state up.
> >
> > Thanks for the explanation. (FWIW your first email is clear now that I
> > read it properly but this still helped me :))
> >
> > > 2) How to integrate "live kernel" support into the current user
> > > interface?  I don't think we should make major UI modifications to
> > > support this specific case but I'd also like to avoid gross hacks.
> >
> > Do you think it will always be one or the other, corefile or live
> > memory? I assume you wouldn't want to fall back to live memory because
> > that memory might not have been in use at the time of the core dump.
>
> Yes, it's always one or the other.  When you're debugging crashed
> kernel, you want to see the state of the crashed kernel and not
> the kernel that's running right now.
>
> Reading the memory of running kernel seems less useful but I've been
> told that it sometimes helps debugging non-crash kernel bugs.
>
> > But I'm thinking about debuggers where they use the ELF file as a
> > quicker way to read memory. Not sure if lldb does this already but you
> > could steal some ideas from there if so.
> >
> > Using /dev/mem as the path seems fine unless you do need some
> > combination of that and a corefile. Is /dev/mem format identical to
> > the corefile format? (probably not an issue anyway because the plugin
> > is what will decide how to use it)
>
> No, the formats are distinct (well, /dev/mem doesn't really have
> a container format, to be precise) but libkvm distinguishes this case
> and handles it specially.
>
> > Your plans B and C seem like they are enablement of the initial use
> > case but have limited scope for improvements. The gdb-remote wrapper
> > for example would work fine but would you hit issues where the current
> > FreeBSD plugin is making userspace assumptions? For example the
> > AArch64 Linux plugin assumes that addresses will be in certain ranges,
> > so if you connected it to an in kernel stub you'd probably get some
> > surprises.
> >
> > So I agree a new plugin would make the most sense. Only reason I'd be
> > against it is if it added significant maintenance or build issues but
> > I'm not aware of any. (beyond checking for some libraries and plenty
> > of bits of llvm do that) And it'll be able to give the best
> > experience.
>
> Well, my initial attempt turned out quite trivial, primarily because
> the external library does most of the work:
>
> https://reviews.llvm.org/D114911
>
> Right now it just supports reading memory and printing variables.
> I still need to extend it to recognize kernel threads through the memory
> dump, and then add support for grabbing registers out of that to get
> backtraces.
>
> > Do you have a plan to test this if it is an in tree plugin? Will the
> > corefiles take up a lot of space or would you be able to craft minimal
> > files just for testing?
>
> I have some ideas but I don't have small core files right now.  I need
> to write more code to determine what exactly is necessary, and then
> decide to pursue either:
>
> a. trying to build a minimal FreeBSD kernel and run it in a VM with
> minimal amount of RAM to get a small minicore
>
> b. trying to strip unnecessary data from real minicore
>
> c. trying to construct a minicore file directly
>
> But as I said, I don't have enough data to decide which route would
> involve the least amount of work.
>
> --
> Best regards,
> Michał Górny
>


Re: [lldb-dev] Adding support for FreeBSD kernel coredumps (and live memory lookup)

2021-12-02 Thread David Spickett via lldb-dev
> Right now, the idea is that when the kernel crashes, the developer can
> take the vmcore file use LLDB to look the kernel state up.

Thanks for the explanation. (FWIW your first email is clear now that I
read it properly but this still helped me :))

> 2) How to integrate "live kernel" support into the current user
> interface?  I don't think we should make major UI modifications to
> support this specific case but I'd also like to avoid gross hacks.

Do you think it will always be one or the other, corefile or live
memory? I assume you wouldn't want to fall back to live memory because
that memory might not have been in use at the time of the core dump.
But I'm thinking about debuggers where they use the ELF file as a
quicker way to read memory. Not sure if lldb does this already but you
could steal some ideas from there if so.

Using /dev/mem as the path seems fine unless you do need some
combination of that and a corefile. Is /dev/mem format identical to
the corefile format? (probably not an issue anyway because the plugin
is what will decide how to use it)

Your plans B and C seem like they are enablement of the initial use
case but have limited scope for improvements. The gdb-remote wrapper
for example would work fine but would you hit issues where the current
FreeBSD plugin is making userspace assumptions? For example the
AArch64 Linux plugin assumes that addresses will be in certain ranges,
so if you connected it to an in kernel stub you'd probably get some
surprises.

So I agree a new plugin would make the most sense. Only reason I'd be
against it is if it added significant maintenance or build issues but
I'm not aware of any. (beyond checking for some libraries and plenty
of bits of llvm do that) And it'll be able to give the best
experience.

Do you have a plan to test this if it is an in tree plugin? Will the
corefiles take up a lot of space or would you be able to craft minimal
files just for testing?

On Thu, 2 Dec 2021 at 10:03, Michał Górny  wrote:
>
> On Thu, 2021-12-02 at 09:40 +, David Spickett wrote:
> > Can you give an example workflow of how these core files are used by a
> > developer? For some background.
>
> Right now, the idea is that when the kernel crashes, the developer can
> take the vmcore file use LLDB to look the kernel state up.  Initially,
> this means reading the "raw" memory, i.e. looking up basic symbol values
> but eventually (like kGDB) we'd like to add basic support for looking up
> kernel thread states.
>
> > Most of my experience is in userspace, the corefile is "offline" debug
> > and then you have "live" debug of the running process. Is that the
> > same here or do we have a mix since you can access some of the live
> > memory after the core has been dumped?
>
> It's roughly the same, i.e. you either use a crash dump (i.e. saved
> kernel state) or you use /dev/mem to read memory from the running
> kernel.
>
> > I'm wondering if a FreeBSD Kernel plugin would support these corefiles
> > and/or live debug, or if they are just two halves of the same
> > solution. Basically, would you end up with a FreeBSDKernelCoreDump and
> > a FreeBSDKernelLive plugin?
>
> I think one plugin is the correct approach here.  Firstly, because
> the interface for reading memory is abstracted out to a single library
> and the API is the same for both cases.  Secondly, because the actual
> interpreting logic would also be shared.
>
> --
> Best regards,
> Michał Górny
>


Re: [lldb-dev] Adding support for FreeBSD kernel coredumps (and live memory lookup)

2021-12-02 Thread David Spickett via lldb-dev
Can you give an example workflow of how these core files are used by a
developer? For some background.

Most of my experience is in userspace, the corefile is "offline" debug
and then you have "live" debug of the running process. Is that the
same here or do we have a mix since you can access some of the live
memory after the core has been dumped?

I'm wondering if a FreeBSD Kernel plugin would support these corefiles
and/or live debug, or if they are just two halves of the same
solution. Basically, would you end up with a FreeBSDKernelCoreDump and
a FreeBSDKernelLive plugin?

On Tue, 30 Nov 2021 at 19:59, Michał Górny via lldb-dev
 wrote:
>
> Hi,
>
> I'm working on a FreeBSD-sponsored project aiming at improving LLDB's
> support for debugging FreeBSD kernel to achieve feature parity with
> KGDB.  As a part of that, I'd like to improve LLDB's ability of working
> with kernel coredumps ("vmcores"), plus add the ability to read kernel
> memory via special character device /dev/mem.
>
>
> The FreeBSD kernel supports two coredump formats that are of interest to
> us:
>
> 1. The (older) "full memory" coredumps that use an ELF container.
>
> 2. The (newer) minidumps that dump only the active memory and use
> a custom format.
>
> At this point, LLDB recognizes the ELF files but doesn't handle them
> correctly, and outright rejects the FreeBSD minidump format.  In both
> cases some additional logic is required.  This is because kernel
> coredumps contain physical contents of memory, and for user convenience
> the debugger needs to be able to read memory maps from the physical
> memory and use them to translate virtual addresses to physical
> addresses.
>
> Unless I'm mistaken, the rationale for using this format is that
> coredumps are -- after all -- usually created when something goes wrong
> with the kernel.  In that case, we want the process for dumping core to
> be as simple as possible, and coredumps need to be small enough to fit
> in swap space (that's where they're being usually written).
> The complexity of memory translation should then naturally fall into
> userspace processes used to debug them.
>
> FreeBSD (following Solaris and other BSDs) provides a helper libkvm
> library that can be used by userspace programs to access both coredumps
> and running kernel memory.  Additionally, we have split the routines
> related to coredumps and made them portable to other operating systems
> via libfbsdvmcore [1].  We have also included a program that can convert
> minidump into a debugger-compatible ELF core file.
>
>
> We'd like to discuss the possible approaches to integrating this
> additional functionality to LLDB.  At this point, our goal is to make it
> possible for LLDB to correctly read memory from coredumps and live
> system.
>
>
> Plan A: new FreeBSDKernel plugin
> 
> I think the preferable approach is to write a new plugin that would
> enable out-of-the-box support for the new functions in LLDB.  The plugin
> would be based on using both libraries.  When available, libfbsdvmcore
> will be used as the primary provider for vmcore support on all operating
> systems.  Additionally, libkvm will be usable on FreeBSD as a fallback
> provider for coredump support, and as the provider of live memory
> support.
>
> support using system-installed libfbsdvmcore to read coredumps and
> libkvm to read coredumps (as a fallback) and to read live memory.
>
> The two main challenges with this approach are:
>
> 1) "Full memory" vmcores are currently recognized by LLDB's elf-core
> plugin.  I haven't investigated LLDB's plugin architecture in detail yet
> but I think the cleanest solution here would be to teach elf-core to
> distinguish and reject FreeBSD vmcores, in order to have the new plugin
> handle them.
>
> 2) How to integrate "live kernel" support into the current user
> interface?  I don't think we should make major UI modifications to
> support this specific case but I'd also like to avoid gross hacks.
> My initial thought is to allow specifying "/dev/mem" as core path, that
> would match how libkvm handles it.
>
> Nevertheless, I think this is the cleanest approach and I think we
> should go with it if possible.
>
>
> Plan B: GDB Remote Protocol-based wrapper
> =
> If we cannot integrate FreeBSD vmcore support into LLDB directly,
> I think the next best approach is to create a minimal GDB Remote
> Protocol server for it.  The rough idea is that the server implements
> the minimal subset of the protocol necessary for LLDB to connect,
> and implements memory read operations via the aforementioned libraries.
>
> The advantage of this solution is that it is still relatively clean
> and can be implemented outside LLDB.  It still provides quite good
> performance but probably requires more work than the alternatives
> and does not provide out-of-box support in LLDB.
>
>
> Plan C: converting vmcores
> ==
> Our final optio

Re: [lldb-dev] [RFC] lldb integration with (user mode) qemu

2021-11-08 Thread David Spickett via lldb-dev
> I actually did consider this, but it was not clear to me how this would tie 
> in to the rest of lldb.
> The "run qemu and connect to it" part could be reused, of course, but what 
> else?

That part seems like a good start. I'm sure a lot of other things
would break/not work like you said but if I was shipping a modified
lldb anyway maybe I'd put the effort in to make it work nicely.

Again not something this work needs to consider. Just me relating the
idea to something I have more experience with and has some parallels
with the qemu-user idea.

On Fri, 5 Nov 2021 at 14:08, Pavel Labath via lldb-dev
 wrote:
>
> On 04/11/2021 22:46, Jessica Clarke via lldb-dev wrote:
> > On Fri, Oct 29, 2021 at 05:55:02AM +, David Spickett via lldb-dev wrote:
> >>> I don't think it does. Or at least I'm not sure how do you propose to 
> >>> solve them (who is "you" in the paragraph above?).
> >>
> >> I tend to use "you" meaning "you or I" in hypotheticals. Same thing as
> >> "if I had" but for whatever reason I phrase it like that to include
> >> the other person, and it does have its ambiguities.
> >>
> >> What I was proposing is, if I was correct (which I wasn't) then having
> >> the user "platform select qemu-user" would solve things. (which it
> >> doesn't)
> >>
> >>> What currently happens is that when you open a non-native (say, linux) 
> >>> executable, the appropriate remote platform gets selected automatically.
> >>
> >> ...because of this. I see where the blocker is now. I thought remote
> >> platforms had to be selected before they could claim.
> >>
> >>> If we do have a prompt, then this may not be so critical, though I expect 
> >>> that most users would still prefer it we automatically selected qemu.
> >>
> >> Seems reasonable to put qemu-user above remote-linux. Only claiming if
> >> qemu-user has been configured sufficiently. I guess architecture would
> >> be the minimum setting, given we can't find the qemu binary without
> >> it.
> >>
> >> Is this similar in any way to how the different OS remote platforms
> >> work? For example there is a remote-linux and a remote-netbsd, is
> >> there enough information in the program file itself to pick just one
> >> or is there an implicit default there too?
> >> (I see that platform CreateInstance gets an ArchSpec but having
> >> trouble finding where that comes from)
> >
> > Please make sure you don't forget that bsd-user also exists (and after
> > living in a fork for many years for various boring reasons is in the
> > middle of being upstreamed), so don't tie it entirely to remote-linux.
> >
>
> I am. In fact one of the reason's I haven't started putting up patches
> yet is because I'm trying to figure out the best way to handle this. :)
>
> My understanding (let me know if I'm wrong) is that user-mode qemu
> can emulate a different architecture, but not a different OS. So, the
> idea is that the "qemu" platform would forward all operations that don't
> need special handling to the "host" platform. That would mean you get
> freebsd behavior when running on freebsd, etc.
>
> pl


Re: [lldb-dev] [RFC] lldb integration with (user mode) qemu

2021-11-03 Thread David Spickett via lldb-dev
> Yeah, I think we can start with that.

No need to consider this now but it could easily be adapted to
qemu-system as well. Spinning up qemu-system for Cortex-M debug might
be a future use case. Once you've got a "run this program and connect
to this port" platform you can sub in almost anything that talks GDB.

> Having some mechanism to resolve ambiguities might also help with that.

Cool, I figured someone would have thought about it on the ELF side.
So as long as Linux remains the standout things work ok.

Most importantly, the way it's currently handled doesn't contradict
anything you want to do here.

On Wed, 3 Nov 2021 at 10:34, Pavel Labath  wrote:
>
> On 29/10/2021 14:55, David Spickett wrote:
> >> I don't think it does. Or at least I'm not sure how do you propose to 
> >> solve them (who is "you" in the paragraph above?).
> >
> > I tend to use "you" meaning "you or I" in hypotheticals. Same thing as
> > "if I had" but for whatever reason I phrase it like that to include
> > the other person, and it does have its ambiguities.
> >
> > What I was proposing is, if I was correct (which I wasn't) then having
> > the user "platform select qemu-user" would solve things. (which it
> > doesn't)
> Great, thanks for clarifying.
>
> >> If we do have a prompt, then this may not be so critical, though I expect 
> >> that most users would still prefer it we automatically selected qemu.
> >
> > Seems reasonable to put qemu-user above remote-linux. Only claiming if
> > qemu-user has been configured sufficiently. I guess architecture would
> > be the minimum setting, given we can't find the qemu binary without
> > it.
> Yeah, I think we can start with that.
>
> > Is this similar in any way to how the different OS remote platforms
> > work? For example there is a remote-linux and a remote-netbsd, is
> > there enough information in the program file itself to pick just one
> > or is there an implicit default there too?
> This is actually one of the pain points in lldb. The overall design
> assumes that you can precisely identify the platform(triple) that the
> file is meant to be run on by looking at the object file. This is
> definitely true on Apple platforms (where lldb originated) as even the
> "simulator" binaries have their own triples.
>
> The situation is more fuzzy in the ELF world. The *BSD OSes have (and
> use) an ELFOSABI_ constant to identify the binary. Linux uses
> ELFOSABI_NONE even though there is a dedicated constant it could use
> (there's probably an interesting story in there). This makes it hard to
> positively identify a file as a linux binary, but we can mostly get away
> with it because there's just one OS like that. Having some mechanism to
> resolve ambiguities might also help with that.
>
> I'm also not sure how much the OSes actually validate the contents of
> the elf headers. I wouldn't be surprised if one could create "polyglot"
> elf binaries that can run on multiple operating systems.
>
> > (I see that platform CreateInstance gets an ArchSpec but having
> > trouble finding where that comes from)
> It gets called from
> TargetList::CreateTargetInternal->Platform::CreateTargetForArchitecture->Platform::Create.
> There may be other callers, but I think this is the relevant one.
>
> pl


Re: [lldb-dev] [RFC] lldb integration with (user mode) qemu

2021-10-29 Thread David Spickett via lldb-dev
> I don't think it does. Or at least I'm not sure how do you propose to solve 
> them (who is "you" in the paragraph above?).

I tend to use "you" meaning "you or I" in hypotheticals. Same thing as
"if I had" but for whatever reason I phrase it like that to include
the other person, and it does have its ambiguities.

What I was proposing is, if I was correct (which I wasn't) then having
the user "platform select qemu-user" would solve things. (which it
doesn't)

> What currently happens is that when you open a non-native (say, linux) 
> executable, the appropriate remote platform gets selected automatically.

...because of this. I see where the blocker is now. I thought remote
platforms had to be selected before they could claim.

> If we do have a prompt, then this may not be so critical, though I expect 
> that most users would still prefer it we automatically selected qemu.

Seems reasonable to put qemu-user above remote-linux. Only claiming if
qemu-user has been configured sufficiently. I guess architecture would
be the minimum setting, given we can't find the qemu binary without
it.

Is this similar in any way to how the different OS remote platforms
work? For example there is a remote-linux and a remote-netbsd, is
there enough information in the program file itself to pick just one
or is there an implicit default there too?
(I see that platform CreateInstance gets an ArchSpec but having
trouble finding where that comes from)

On Fri, 29 Oct 2021 at 13:10, Pavel Labath  wrote:
>
> On 29/10/2021 14:00, Pavel Labath via lldb-dev wrote:
> > On 29/10/2021 12:39, David Spickett wrote:
> >>> So there wouldn't be a three-way tie, but if you actually wanted to
> >>> debug a native executable under qemu, you would have to explicitly
> >>> select the qemu platform. This is the same thing that already happens
> >>> when you want to debug a native executable remotely, but there it's
> >>> kind of expected because you need to connect to the remote machine
> >>> anyway.
> >>
> >> Since we already have the host vs remote with native arch situation,
> >> is it any different to ask users to do "platform select qemu-user" if
> >> they really want qemu-user? Preferring host to qemu-user seems
> >> logical.
> > It does. I am perfectly fine with preferring host over qemu-user.
> >
> >> For non native it would come up when you're currently connected to a
> >> remote but want qemu-user on the host. So again you explicitly select
> >> qemu-user.
> >>
> >> Does that solve all the ambiguous situations?
> > I don't think it does. Or at least I'm not sure how do you propose to
> > solve them (who is "you" in the paragraph above?).
> >
> > What currently happens is that when you open a non-native (say, linux)
> > executable, the appropriate remote platform gets selected automatically.
> > $ lldb aarch64/bin/lldb
> > (lldb) target create "aarch64/bin/lldb"
> > Current executable set to 'aarch64/bin/lldb' (aarch64).
> > (lldb) platform status
> >Platform: remote-linux
> >   Connected: no
> >
> > That happens because the remote-linux platform unconditionally claims
> > the non-native executables (well.. it claims all of them, but it is
> > overridden by the host platform for native ones). It does not check
> > whether it is connected or anything like that.
> >
> > And I think that behavior is fine, because for a lot of actions you
> > don't actually need to connect to anything. For example, you usually
> > don't connect anywhere when inspecting core files (though you can do
> > that, and it would mean lldb can download relevant shared libraries).
> > And you can always connect at a later time, if needed.
> >
> > Now the question is what should the new platform do. If it followed the
> > remote-linux pattern, it would also claim those executables
> > unconditionally, we would always have a conflict (*).
>
> I meant to add an explanation for this asterisk. I was going to say that
> in the current setup, I believe we would just choose whichever platform
> comes first (which is the first platform to get initialized), but that
> is not that great -- ideally, our behavior should not depend on the
> initialization order.
>
> >
> > Or, it can try to be a bit less greedy and claim an executable only when
> > it is configured. That would mean that in a clean state, everything
> > would behave as it. However, the conflict would reappear as soon as the
> > platform is configured (which will be always, for our users). The idea
> > behind this (sub)feature was that there would be a way to configure lldb
> > so that the qemu plugin comes out on top (of remote-linux, not host).
> >
> > If we do have a prompt, then this may not be so critical, though I
> > expect that most users would still prefer it we  automatically selected
> > qemu.
>
> I also realized that implementing the prompt for the case where the
> executable is specified on the command line will be a bit tricky,
> because at that point lldb hasn't gone interactive yet. I don't think there'

Re: [lldb-dev] [RFC] lldb integration with (user mode) qemu

2021-10-29 Thread David Spickett via lldb-dev
> So there wouldn't be a three-way tie, but if you actually wanted to debug a 
> native executable under qemu, you would have to explicitly select the qemu 
> platform. This is the same thing that already happens when you want to debug 
> a native executable remotely, but there it's kind of expected because you 
> need to connect to the remote machine anyway.

Since we already have the host vs remote with native arch situation,
is it any different to ask users to do "platform select qemu-user" if
they really want qemu-user? Preferring host to qemu-user seems
logical.
For non native it would come up when you're currently connected to a
remote but want qemu-user on the host. So again you explicitly select
qemu-user.

Does that solve all the ambiguous situations?

> Do you mean like, each platform would advertise its kind 
> (host/emulator/remote), and the relative kind priorities would be hardcoded 
> in lldb?

Yes. Though I think that opens more issues than it solves. Host being
higher priority than everything else seems ok. Then you have to think
about how many emulation/connection hops each one has, but sometimes
that's not the metric that matters. E.g. an armv7 file on a Mac would
make more sense going to an Apple Watch simulator than qemu-user.

> Yes, those were my thoughts as well, but I am unsure how often would that 
> occur in practice (I'm pretty sure I'll need to care for only one arch for my 
> use case).

Seems like starting with a single "qemu-user" platform is the way to
go for now. When it's not configured it just won't be able to claim
anything.

The hypothetical I had was shipping a development kit that included
qemu-arch1 and qemu-arch2. Would you rather ship one init file that
can set all those settings at once (since each one has its own
namespace) or symlink lldb-arch1 to be "lldb -s <arch1 init file>".
However, anyone who's looking at shipping lldb has control
of the sources so they could make their own platform entries. Or
choose a command line based on an IDE setting.

On Fri, 29 Oct 2021 at 10:13, Pavel Labath  wrote:
>
> Thanks for reading this. Responses inline.
>
> On 28/10/2021 16:28, David Spickett wrote:
> > Glad to hear the gdb server in qemu plays nicely with lldb. Perhaps
> > some of that is the compatibility work that has been going on.
> >
> >> The introduction of a qemu platform would introduce such an ambiguity, 
> >> since (when running on a linux host) a linux executable would be claimed 
> >> by both the qemu plugin and the existing remote-linux platform. This would 
> >> prevent "target create arm-linux.exe" from working out-of-the-box.
> >
> > I assume you wouldn't get a 3 way tie here because in connecting to a
> > remote-linux you've "disconnected" the host platform, right?
> IIUC, the host platform is not consulted at this step. It can only be
> claim an executable when it is selected as the "current" platform,
> because the current platform is consulted first. (And this is what
> happens in most "normal" debug sessions.)
>
> So there wouldn't be a three-way tie, but if you actually wanted to
> debug a native executable under qemu, you would have to explicitly
> select the qemu platform. This is the same thing that already happens
> when you want to debug a native executable remotely, but there it's kind
> of expected because you need to connect to the remote machine anyway.
>
> >
> >> To resolve this, I'd like to create some kind of a mechanism to give 
> >> preference to some plugin.
> >
> > This choosing of plugin, does it mostly take place automatically at
> > the moment or is there a good spot where we could say "X and Y could
> > load this file, please choose one/resolve the tie"?
> This currently happens in TargetList::CreateTargetInternal, and one
> cannot create a prompt there, as that code is also used by the
> non-interactive paths (SBDebugger::CreateTarget, for instance). But I
> like the idea, and it may not be too difficult to refactor this to make
> that work. (I am imagining changing this code to use llvm::Error, and
> then creating a special AmbiguousPlatformError type, which could get
> caught by the command line code and transformed into a prompt.)
>
> >
> > My first thought for automatic resolve is a native/emulator/remote
> > sort of hierarchy if you were going to order them. (with some nice
> > message "preferring X to Y because..." when it starts up)
> Do you mean like, each platform would advertise its kind
> (host/emulator/remote), and the relative kind priorities would be
> hardcoded in lldb?
>
> >
> >> a) have just a single set of settings, effectively limiting the user to 
> >> emulating just a single architecture per session. While it would most 
> >> likely be enough for most use cases, this kind of limitation seems 
> >> artificial.
> >
> > One aspect here is the way you configure them if you want to use many
> > architectures of qemu-user.
> >
> > If I have only one platform, I set qemu-user.foo to some Arm focused
> > value. Then if I want to wor

Re: [lldb-dev] [RFC] lldb integration with (user mode) qemu

2021-10-28 Thread David Spickett via lldb-dev
Glad to hear the gdb server in qemu plays nicely with lldb. Perhaps
some of that is the compatibility work that has been going on.

> The introduction of a qemu platform would introduce such an ambiguity, since 
> (when running on a linux host) a linux executable would be claimed by both 
> the qemu plugin and the existing remote-linux platform. This would prevent 
> "target create arm-linux.exe" from working out-of-the-box.

I assume you wouldn't get a 3 way tie here because in connecting to a
remote-linux you've "disconnected" the host platform, right?

> To resolve this, I'd like to create some kind of a mechanism to give 
> preference to some plugin.

This choosing of plugin, does it mostly take place automatically at
the moment or is there a good spot where we could say "X and Y could
load this file, please choose one/resolve the tie"?

My first thought for automatic resolve is a native/emulator/remote
sort of hierarchy if you were going to order them. (with some nice
message "preferring X to Y because..." when it starts up)

> a) have just a single set of settings, effectively limiting the user to 
> emulating just a single architecture per session. While it would most likely 
> be enough for most use cases, this kind of limitation seems artificial.

One aspect here is the way you configure them if you want to use many
architectures of qemu-user.

If I have only one platform, I set qemu-user.foo to some Arm focused
value. Then if I want to work on AArch64 I edit my lldbinit to switch
it. (or have many init files)
If there's one platform per arch I can set qemu-arm.foo and qemu-aarch64.foo.

Not much between them without having a specific use case for it. You
could work around either in various ways.

Wouldn't most of the platform entries just be subclasses of some
generic qemu-user-platform? So code wise it wouldn't be that much
extra to add them.
You could say it's bad to list qemu-xyz-platform when that isn't
installed, but then again, lldb lists a "local Mac OSX user platform
plug in" even on Linux. So not a big deal.
(and an apt install of qemu-user gives me every arch so easy to fix)

And you have to handle the ambiguity issue either way.


On Thu, 28 Oct 2021 at 14:33, Pavel Labath via lldb-dev
 wrote:
>
> Hello everyone,
>
> I'd like to propose a new plugin for better lldb+qemu integration.
>
> As you're probably aware qemu has an integrated gdb stub. Lldb is able
> to communicate with it, but currently this process is somewhat tedious.
> One has to manually start qemu, giving it a port number, and then
> separately start lldb, and have it connect to that port.
>
> The chief purpose of this feature would be to automate this behavior,
> ideally to the point where one can just point lldb to an executable,
> type "run", and everything would just work. It would take the form of a
> platform plugin (PlatformQemuUser, perhaps). This would be a non-host,
> always-connected plugin, and it's heart would be the DebugProcess
> method, which would ensure the emulator gets started when the user wants
> to start debugging. It would operate the same way as our host platforms
> do, except that it would start qemu instead of debug/lldb-server. Most
> of the other methods would be implemented by delegating to the host
> platform (as the process will be running on the host), possibly with
> some minor adjustments like prepending sysroot to the paths, etc. (My
> initial proof-of-concept implementation was 200 LOC.)
>
> The plugin would be configured via multiple settings, which would let
> the user specify, the path to the emulator, the kind of cpu it should
> emulate and the path to the system libraries, and any other arguments
> that the user wishes to pass to the emulator. The user could then
> configure it in their lldbinit file to match their system setup.
>
> The needs of this plugin should match the existing Platform abstraction
> fairly well, so I don't anticipate (*) the need to add new entry points
> or modify existing ones. There is one tricky aspect which I see, and it
> relates to platform selection. Our current platform selection code gives
> each platform instance (while preferring the current platform) a chance
> to "claim" an executable, and aborts if the choice is ambiguous. The
> introduction of a qemu platform would introduce such an ambiguity, since
> (when running on a linux host) a linux executable would be claimed by
> both the qemu plugin and the existing remote-linux platform. This would
> prevent "target create arm-linux.exe" from working out-of-the-box.
>
> To resolve this, I'd like to create some kind of a mechanism to give
> preference to some plugin. This could either be something internal,
> where a plugin indicates "strong" preference for an executable (the qemu
> platform could e.g. do this when the user sets the emulator path, the
> remote platform when it is connected), or some external mechanism like a
> global setting giving the preferred platform order. I'd very much like
> 

Re: [lldb-dev] RFC: AArch64 Linux Memory Tagging Support for LLDB

2021-06-21 Thread David Spickett via lldb-dev
Hi all, the first series of changes for MTE has been in review
phabricator for a while now. They have all bar one been approved by
Omair Javaid and we'd like to start landing them with a view towards
having MTE support for llvm 13. (there is another series waiting in
the wings for tag writing)

Not calling out anyone in particular who I might have added to any of
the reviews, we're all busy folk. I'm just aware that memory tagging
is an AArch64 Linux only feature right now and Omair and myself are
both Linaro so I want to give others a chance to comment on the
changes from a general lldb perspective.

You can see the changes here:
https://reviews.llvm.org/D97281
https://reviews.llvm.org/D97282
https://reviews.llvm.org/D95601
https://reviews.llvm.org/D95602
https://reviews.llvm.org/D97285

Even if you are not familiar with memory tagging, if something
generally sticks out, let me know. If I don't receive any request for
changes in the next few days I'll start landing them.

If you want to peek at the changes that aren't in review yet you can
do so here: https://github.com/DavidSpickett/llvm-project/commits/mte_commands

On Tue, 18 Aug 2020 at 11:09, David Spickett  wrote:
>
> > The initial idea of commands like "memory showptrtag", "memory showtag", 
> > "memory checktag" - it might be better to put all of these under "memory 
> > tag ...", similar to how "breakpoint command ..." works.
>
> Sounds good to me, I didn't know there was a 3 level command in there
> already. The names get a bit redundant since "memory tag set" doesn't
> tell you which one of the pair it's setting. So we could have "memory
> tag setptrtag" "memory tag setmemorytag", or make "set" one command
> with variable arguments:
> Set logical tag: memory tag set <address> <tag>
> Set logical and allocation: memory tag set <address> <logical tag> <allocation tag>
> Set only allocation: memory tag set <address> --only-memory <allocation tag>
> (which I think is a bit neater)
>
> Where "pointer tag" and "memory tag" were the best generic names for
> "logical" and "allocation" I came up with. (think of it like the
> memory tag is attached to the memory, pointer tag is attached to a
> pointer)
> Also "memory tag check" can be removed since it's just "memory tag
> show" with a warning on mismatch.
>
> > My general design is that the Process object will keep track of the # of 
> > bits used for virtual addresses.
>
> I hadn't considered this issue thanks for bringing it up. Your scheme
> seems reasonable to me. I see that "addressing_bits" is in the
> upstream qHostInfo but only in the RNBRemote, does that mean that
> upstream already uses this in some way? (presumably just for Apple
> platforms?)
>
> > I am working on a kernel patch which will make this information available 
> > via siginfo, and once the tag becomes available from the kernel you 
> > shouldn't need to decode the instruction.
>
> Great! I'll keep an eye on it.
>
> On Fri, 14 Aug 2020 at 02:40, Peter Collingbourne  wrote:
> >
> >
> >
> > On Mon, Aug 10, 2020 at 3:41 AM David Spickett via lldb-dev 
> >  wrote:
> >>
> >> Hi all,
> >>
> >> What follows is my proposal for supporting AArch64's memory tagging
> >> extension in LLDB. I think the link in the first paragraph is a good
> >> introduction if you haven't come across memory tagging before.
> >>
> >> I've also put the document in a Google Doc if that's easier for you to
> >> read: 
> >> https://docs.google.com/document/d/13oRtTujCrWOS_2RSciYoaBPNPgxIvTF2qyOfhhUTj1U/edit?usp=sharing
> >> (please keep comments to this list though)
> >>
> >> Any and all comments welcome. Particularly I would like opinions on
> >> the naming of the commands, as this extension is AArch64 specific but
> >> the concept of memory tagging itself is not.
> >> (I've added some people on Cc who might have particular interest)
> >>
> >> Thanks,
> >> David Spickett.
> >>
> >> 
> >>
> >> # RFC: AArch64 Linux Memory Tagging Support for LLDB
> >>
> >> ## What is memory tagging?
> >>
> >> Memory tagging is an extension added in the Armv8.5-a architecture for 
> >> AArch64.
> >> It allows tagging pointers and storing those tags so that hardware can 
> >> validate
> >> that a pointer matches the memory address it is trying to access. These 
> >> paired
> >> tags are stored in the upper bits of the pointer (the “logical” tag) and in

Re: [lldb-dev] Accepted GSoC project for working on the LLDB GUI

2021-05-21 Thread David Spickett via lldb-dev
Hi Omar, welcome!

"Embedded Command Line Interface" is something I miss compared to GDB
so I'm looking forward to seeing what you come up with.

On Fri, 21 May 2021 at 09:31, Omar Emara via lldb-dev
 wrote:
>
> Hi everyone,
>
> I was asked to share my accepted GSoC project in the appropriate mailing list.
>
> As you may know, my project titled "Evolving the LLDB GUI" was accepted into
> GSoC this year. The project aims to improve the LLDB curses GUI to provide a
> complete and intuitive IDE-like debugging experience without having to resort 
> to
> the command line interface.
>
> You can check the project proposal here:
>
> https://docs.google.com/document/d/1pgiouts7pN2jM8iuXBFokQ_vTGTSKMgNGvRg2-px4r4/edit?usp=sharing
>
> I look forward to working on LLDB with you!


Re: [lldb-dev] Updating or removing lldb's copy of unittest2

2021-01-29 Thread David Spickett via lldb-dev
Thanks for the info.

If I get some time soon I can at least see what the current results
are with that updated module. If not I'll pitch in sometime after the
13 branch.

On Thu, 28 Jan 2021 at 17:52, Jonas Devlieghere  wrote:
>
> Hey David,
>
> On Thu, Jan 28, 2021 at 2:46 AM David Spickett via lldb-dev 
>  wrote:
>>
>> I came across a minor bug writing some lldb-server tests where single
>> line diffs weren't handled correctly by unittest2. Turns out they are
>> in the latest version but the third_party/ version is older than that.
>>
>> https://bugs.python.org/issue9174
>> https://hg.python.org/unittest2/rev/96e432563d53 (though I think the
>> commit title is a mistake)
>>
>> So I thought of cherry picking that one thing (assuming licensing
>> would allow me to), or even updating the whole copy (a lot of churn
>> for a single fix). Then I remembered that llvm in general has been
>> moving to Python3.
>
>
> I made an attempt to update the vendored unittest2 module in the past [1]. I 
> diffed our vendored version with the release it was based on, updated the 
> module and re-applied the changes. That was the easy part. The more intrusive 
> part is that the testing framework changed the way it deals with expected 
> failures. The old version used exceptions, while the new framework only looks 
> at asserts that fail. I don't remember the details, but we are relying on 
> that mechanism somehow and as a result a bunch of test failed. The good thing 
> is that this uncovered a few tests that were XFAILed but were really failing 
> for unrelated reasons (i.e. Python throwing an exception because the test was 
> plain wrong, rather than an assertion triggering or what was being tested not 
> working). Anyway, hopefully this isn't too much work, but at the time 
> something more important came up and I haven't had time to look at this again 
> since.
>
>>
>> Looking at https://lldb.llvm.org/resources/build.html it doesn't
>> explicitly say Python2 isn't supported, but only Python3 is mentioned
>> so I assume lldb is now Python3 only.
>
>
> LLVM dropped support for Python 2 at the beginning of this year [2]. For LLDB 
> specifically, I've asked for a bit more time before we start making "Python 2 
> incompatible" changes [3] as we still have to maintain Python 2 support 
> internally. We're actively working to drop that requirement.
>
>>
>> If that is correct, is it worth me investigating using Python3's built
>> in unittest module instead, and removing our copy of unittest2?
>
>
> I'm in favor of dropping a vendored dependency, assuming of course we can get 
> rid of the modification we rely on today. If we go that route I want to ask 
> to land this after the 13 release is branched.
>
> Cheers,
> Jonas
>
> [1] 
> https://github.com/JDevlieghere/llvm-project/tree/update-vendored-unittest2
> [2] https://lists.llvm.org/pipermail/llvm-dev/2020-December/147372.html
> [3] https://lists.llvm.org/pipermail/lldb-dev/2020-August/016388.html
>
>>
>> Thanks,
>> David Spickett.


[lldb-dev] Updating or removing lldb's copy of unittest2

2021-01-28 Thread David Spickett via lldb-dev
I came across a minor bug writing some lldb-server tests where single
line diffs weren't handled correctly by unittest2. Turns out they are
in the latest version but the third_party/ version is older than that.

https://bugs.python.org/issue9174
https://hg.python.org/unittest2/rev/96e432563d53 (though I think the
commit title is a mistake)

So I thought of cherry picking that one thing (assuming licensing
would allow me to), or even updating the whole copy (a lot of churn
for a single fix). Then I remembered that llvm in general has been
moving to Python3.

Looking at https://lldb.llvm.org/resources/build.html it doesn't
explicitly say Python2 isn't supported, but only Python3 is mentioned
so I assume lldb is now Python3 only.

If that is correct, is it worth me investigating using Python3's built
in unittest module instead, and removing our copy of unittest2?

Thanks,
David Spickett.


Re: [lldb-dev] RFC: AArch64 Linux Memory Tagging Support for LLDB

2020-08-18 Thread David Spickett via lldb-dev
> The initial idea of commands like "memory showptrtag", "memory showtag", 
> "memory checktag" - it might be better to put all of these under "memory tag 
> ...", similar to how "breakpoint command ..." works.

Sounds good to me, I didn't know there was a 3 level command in there
already. The names get a bit redundant since "memory tag set" doesn't
tell you which one of the pair it's setting. So we could have "memory
tag setptrtag" "memory tag setmemorytag", or make "set" one command
with variable arguments:
Set logical tag: memory tag set <address> <tag>
Set logical and allocation: memory tag set <address> <logical tag> <allocation tag>
Set only allocation: memory tag set <address> --only-memory <allocation tag>
(which I think is a bit neater)

Where "pointer tag" and "memory tag" were the best generic names for
"logical" and "allocation" I came up with. (think of it like the
memory tag is attached to the memory, pointer tag is attached to a
pointer)
Also "memory tag check" can be removed since it's just "memory tag
show" with a warning on mismatch.

> My general design is that the Process object will keep track of the # of bits 
> used for virtual addresses.

I hadn't considered this issue thanks for bringing it up. Your scheme
seems reasonable to me. I see that "addressing_bits" is in the
upstream qHostInfo but only in the RNBRemote, does that mean that
upstream already uses this in some way? (presumably just for Apple
platforms?)

> I am working on a kernel patch which will make this information available via 
> siginfo, and once the tag becomes available from the kernel you shouldn't 
> need to decode the instruction.

Great! I'll keep an eye on it.

On Fri, 14 Aug 2020 at 02:40, Peter Collingbourne  wrote:
>
>
>
> On Mon, Aug 10, 2020 at 3:41 AM David Spickett via lldb-dev 
>  wrote:
>>
>> Hi all,
>>
>> What follows is my proposal for supporting AArch64's memory tagging
>> extension in LLDB. I think the link in the first paragraph is a good
>> introduction if you haven't come across memory tagging before.
>>
>> I've also put the document in a Google Doc if that's easier for you to
>> read: 
>> https://docs.google.com/document/d/13oRtTujCrWOS_2RSciYoaBPNPgxIvTF2qyOfhhUTj1U/edit?usp=sharing
>> (please keep comments to this list though)
>>
>> Any and all comments welcome. Particularly I would like opinions on
>> the naming of the commands, as this extension is AArch64 specific but
>> the concept of memory tagging itself is not.
>> (I've added some people on Cc who might have particular interest)
>>
>> Thanks,
>> David Spickett.
>>
>> 
>>
>> # RFC: AArch64 Linux Memory Tagging Support for LLDB
>>
>> ## What is memory tagging?
>>
>> Memory tagging is an extension added in the Armv8.5-a architecture for 
>> AArch64.
>> It allows tagging pointers and storing those tags so that hardware can 
>> validate
>> that a pointer matches the memory address it is trying to access. These 
>> paired
>> tags are stored in the upper bits of the pointer (the “logical” tag) and in
>> special memory in hardware (the “allocation” tag). Each tag is 4 bits in 
>> size.
>>
>> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
>>
>> ## Definitions
>>
>> * memtag - This is the clang name for the extension as in
>> “-march=armv8.5-a+memtag”
>> * mte - An alternative name for memtag, also the llvm backend name for
>> the extension.
>>   This document may use memtag/memory tagging/MTE at times, they mean
>> the same thing.
>> * logical tag - The tag stored inside a pointer variable (accessible
>> via normal shift and mask)
>> * allocation tag - The tag stored in tag memory (which the hardware provides)
>>   for a particular tag granule
>> * tag granule - The amount of memory that a single tag applies to,
>> which is 16 bytes.
>>
>> ## Existing Tool Support
>>
>> * GCC/Clang can generate MTE instructions
>> * Clang has an option to memory tag the stack (discussed later)
>> * QEMU support has been merged
>> * Linux Kernel patches are in progress
>>   (git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
>> “devel/mte-v5” branch)
>> * GDB support is in review and this design takes a lot of direction from that
>>   
>> (https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/luisgpm/aarch64-mte-v2)
>>   (originally proposed
>> https://sourceware.org/pipermail/gdb-patches/2019-August/

[lldb-dev] RFC: AArch64 Linux Memory Tagging Support for LLDB

2020-08-10 Thread David Spickett via lldb-dev
Hi all,

What follows is my proposal for supporting AArch64's memory tagging
extension in LLDB. I think the link in the first paragraph is a good
introduction if you haven't come across memory tagging before.

I've also put the document in a Google Doc if that's easier for you to
read: 
https://docs.google.com/document/d/13oRtTujCrWOS_2RSciYoaBPNPgxIvTF2qyOfhhUTj1U/edit?usp=sharing
(please keep comments to this list though)

Any and all comments welcome. Particularly I would like opinions on
the naming of the commands, as this extension is AArch64 specific but
the concept of memory tagging itself is not.
(I've added some people on Cc who might have particular interest)

Thanks,
David Spickett.



# RFC: AArch64 Linux Memory Tagging Support for LLDB

## What is memory tagging?

Memory tagging is an extension added in the Armv8.5-a architecture for AArch64.
It allows tagging pointers and storing those tags so that hardware can validate
that a pointer matches the memory address it is trying to access. These paired
tags are stored in the upper bits of the pointer (the “logical” tag) and in
special memory in hardware (the “allocation” tag). Each tag is 4 bits in size.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety

## Definitions

* memtag - This is the clang name for the extension as in
“-march=armv8.5-a+memtag”
* mte - An alternative name for memtag, also the llvm backend name for
the extension.
  This document may use memtag/memory tagging/MTE at times, they mean
the same thing.
* logical tag - The tag stored inside a pointer variable (accessible
via normal shift and mask)
* allocation tag - The tag stored in tag memory (which the hardware provides)
  for a particular tag granule
* tag granule - The amount of memory that a single tag applies to,
which is 16 bytes.
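
As a small illustration of the "normal shift and mask" mentioned above (a
sketch, not lldb code, assuming the logical tag sits in bits 59:56 of the
pointer):

  #include <cstdint>

  // 4-bit logical tag taken from the pointer's upper bits.
  uint8_t logical_tag(uint64_t ptr) {
    return (ptr >> 56) & 0xf;
  }

  // Start of the 16 byte granule an address falls in; a single
  // allocation tag covers this whole range.
  uint64_t granule_base(uint64_t addr) {
    return addr & ~UINT64_C(0xf);
  }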

## Existing Tool Support

* GCC/Clang can generate MTE instructions
* Clang has an option to memory tag the stack (discussed later)
* QEMU support has been merged
* Linux Kernel patches are in progress
  (git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
“devel/mte-v5” branch)
* GDB support is in review and this design takes a lot of direction from that
  
(https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/luisgpm/aarch64-mte-v2)
  (originally proposed
https://sourceware.org/pipermail/gdb-patches/2019-August/159881.html)

## New lldb features

Assuming your software is acting correctly, memory tagging can “just work”
without debugger support. This assumes the compiler/toolchain/user are
always correct.

For when that isn’t the case we want to be able to:
* Read/write the logical tags in a pointer
* Read/write the allocation tags assigned to a given area of memory
* Test whether the logical tag in a pointer matches the allocation tag of the
  memory it refers to
* Read/write memory even when tags are mismatched
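
For a concrete picture of the kind of mismatch these features help
investigate, here is an illustrative sketch only (it assumes an MTE-aware
allocator that retags memory when it is freed):

  char *p = new char[16]; // allocator tags the granule and returns a
                          // pointer whose logical tag matches it
  delete[] p;             // the granule may be retagged here
  *p = 1;                 // logical tag != allocation tag -> tag check fault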

The most obvious use case for this is working through issues where a bug in the
toolchain generates incorrect code. On the other hand there's a good case for
deliberately messing with pointers in your code to prove that such protection
actually works.

Note: potential extensions to scripting such as tags as attributes of values and
such are not being proposed here. Of course the new commands will be
added in the
standard ways so you can use those.

## New Commands

### Command Availability

Note: commands will be listed in tab completion and help regardless of
these checks

* The remote server must support memory tagging packets. lldb will send/check
  for the “memory-tagging” feature in the qSupported packet. (this
name aligns with gdb)
* The process must have MTE available. We check HWCAP2_MTE for this.
* The process must have enabled tagged addressing using prctl
  (see “New Registers” for details)
* The address given must be in a range that has MTE enabled, since you can mmap
  with or without MTE. (this information is in /proc/.../smaps)
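
(For reference, the HWCAP2_MTE check above is the same one a process can
make for itself. A sketch, guarding the constant in case the libc headers
predate MTE:)

  #include <sys/auxv.h>

  #ifndef HWCAP2_MTE
  #define HWCAP2_MTE (1UL << 18) // value from the Linux UAPI headers
  #endif

  bool mte_available() {
    return (getauxval(AT_HWCAP2) & HWCAP2_MTE) != 0;
  }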

#### Interaction With Clang’s Stack Tagging

We’re relying on the kernel to tell us if MTE is enabled, so stack tagging will
not be visible to the debugger this way.
(https://github.com/google/sanitizers/wiki/Stack-instrumentation-with-ARM-Memory-Tagging-Extension-(MTE))

E.g. { int x; use(&x); } where use is declared as void use(int* ptr);
“ptr” will have a memory tag but the kernel won’t know this.

To work around this a setting will be added to tell lldb to assume that MTE is
enabled, so that you can at least see the logical tags of a pointer.
(see “New Settings”)

### General Properties/Errors

* <address> must resolve to some value that can be handled as an
  address by lldb. (though it need not be a pointer specifically)
* Tags will be printed in hexadecimal to reflect the fact that they are a 4 bit
  field. (and since tags are randomly generated, ordering is unlikely
to be a concern)
* Packed tags will be 1 tag per byte (matches what ptrace expects)
* Addresses will be rounded down to the nearest granule (not always by ll