I finally figured out what is happening, but I am not sure what would be
the best way to work around it.

The problem is that with FEATURES=splitdebug the vmlinux binary is being
processed by estrip, which uses debugedit and specifically asks it to
recompute the build id.  However, the bzImage is created from the
vmlinux *before* that, and thus preserves the old build-id.
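
A quick way to confirm this on an installed system is to compare the
build id of the installed vmlinux with the one extracted from the
bzImage.  A minimal sketch in Python, assuming binutils readelf is on
PATH; the function names are made up for illustration and are not part
of portage or any eclass:

```python
import re
import subprocess


def build_id_from_readelf(output):
    """Return the hex build id found in `readelf -n` output, or None."""
    match = re.search(r"Build ID:\s*([0-9a-f]+)", output)
    return match.group(1) if match else None


def build_id_of(path):
    """Run `readelf -n` on path and return its build id (or None)."""
    # Assumption: binutils readelf is installed and on PATH.
    result = subprocess.run(
        ["readelf", "-n", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return build_id_from_readelf(result.stdout)


def same_build_id(path_a, path_b):
    """True only if both files carry a build id and the ids are equal."""
    id_a = build_id_of(path_a)
    id_b = build_id_of(path_b)
    return id_a is not None and id_a == id_b
```

Calling same_build_id() on the installed vmlinux and on the output of
extract-vmlinux should return False if the explanation above is right.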

One option would be to create the vmlinux.debug file manually, but I am
afraid it would duplicate a lot of the code from estrip, unless estrip
can somehow be used cleanly by the ebuild.  The advantage would be that
there is no need for the huge vmlinux file after that and we can just
keep the vmlinux.debug around instead.

I'll end with a couple of closing questions if I may:

- Does anyone have an idea for a clean way to do this?

- Is it preferable to use GitHub PRs or this ML for such eclass changes?

- What exactly is the reason for portage using the `-i`/`--build-id`
  option of debugedit?

Thanks and have a nice day,
Martin

On Fri, Jun 10, 2022 at 02:22:00PM +0200, Martin Kletzander wrote:
Hello,

I am trying to make systemtap work with gentoo-kernel (or ideally all
dist kernels) and I got a few steps closer with the kernel-build.eclass
modification I sent this week [0].  However, there is still one issue:
the build-id of the running kernel does not match that of the installed
vmlinux file:

# stap mba_sc.stp
WARNING: Build-id mismatch [man warning::buildid]:
"/usr/src/linux-5.17.13-gentoo-dist/vmlinux" pid 0 address
0xffffffff8a7b572c, expected c43e775aad5e11755bf5cf1329d2240b519e7518
actual 3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

I also noticed that when kernel-build.eclass installs the vmlinux file,
a vmlinux.debug file is also created (by portage, I presume) using
objcopy --only-keep-debug --compress-debug-sections.

So now I am in a situation where I have these relevant files on the
system:

- /usr/src/linux-5.17.13-gentoo-dist/vmlinux
- /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug
  (symlink to the first file)
- /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug and
- /boot/vmlinuz-5.17.13-gentoo-dist


When I check the build ids (using readelf -n or just "file") of the
first three files I get:

/usr/src/linux-5.17.13-gentoo-dist/vmlinux:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, not stripped

/usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info,
not stripped

/usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info,
not stripped

which looks great except:

1) the first file does not say it is "with debug_info",

2) there is no reason to keep the original vmlinux in place since there
   is a smaller file that works as a substitute, but I'm not sure what's
   a clean way to not install it, and most importantly

3) the fact that the running kernel has a different build id.

The last point is the main issue here.  I was trying to find out how to
check the build id of the running kernel, but have not found a way to
do it through a kernel API, so instead I checked
/boot/vmlinuz-5.17.13-gentoo-dist like this:

~/dev/linux/scripts/extract-vmlinux /boot/vmlinuz-5.17.13-gentoo-dist \
  >vmlinux.extracted

and for good measure also tried what objcopy does to it:

objcopy --only-keep-debug vmlinux.extracted vmlinux.extracted.debug
objcopy --only-keep-debug --compress-debug-sections vmlinux.extracted \
  vmlinux.extracted.compressed

Now when I check, the build id differs from that of the first three
files, but it is left unchanged by objcopy and matches what systemtap
reports for the running kernel:

vmlinux.extracted:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

vmlinux.extracted.compressed:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

vmlinux.extracted.debug:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,
BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
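
A possible alternative to extracting the image, which I have not
verified here: as far as I know the kernel exposes its ELF notes in
/sys/kernel/notes, and the GNU build-id note should be among them.  A
rough sketch in Python, assuming that file exists on the system and
that the notes are little-endian as on x86-64; the function names are
mine and purely illustrative:

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for the GNU build id


def gnu_build_id(notes, byteorder="<"):
    """Return the hex build id from raw ELF note data, or None."""
    offset = 0
    # Each note: namesz, descsz, type (4 bytes each), then the name and
    # the descriptor, each padded to a multiple of 4 bytes.
    while offset + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(
            byteorder + "III", notes, offset
        )
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


def running_kernel_build_id(path="/sys/kernel/notes"):
    """Read the notes of the running kernel (path is an assumption)."""
    with open(path, "rb") as notes_file:
        return gnu_build_id(notes_file.read())
```

If this works, running_kernel_build_id() should return the same id that
systemtap reports as "actual" in the warning above.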


At this point I got stuck, not knowing when and how the build-id
changes and where to extract the debug symbols from.  I would also like
to clean up the change I made.  So I came here with my question(s) and
rather lengthy explanations.  Does anyone know what would be the best
way to deal with this?  Or even where to continue looking?  I would
really like to make systemtap "just work" on Gentoo with the
distribution kernels, but I have already spent a lot of time on it, so
I figured I would rather ask here since I am not that proficient with
the intricacies of the build system.

Thanks a lot for any pointers and have a great day,
Martin

[0] https://github.com/gentoo/gentoo/pull/25789
