I finally figured out what is happening, but I am not sure what would be the best way to work around it.
The problem is that with FEATURES=splitdebug the vmlinux binary is processed by estrip, which uses debugedit and specifically asks it to recompute the build id. However, the bzImage is created from the vmlinux *before* that, and thus preserves the old build id.

One option would be to create the vmlinux.debug file manually, but I am afraid that would duplicate a lot of the code from estrip, unless estrip can somehow be used cleanly from the ebuild. The advantage of this approach would be that the huge vmlinux file is no longer needed afterwards and we can keep just vmlinux.debug around instead.

I'll end with a couple of closing questions, if I may:

- Does anyone have an idea for a clean way to do this?
- Is it preferable to use GitHub PRs or this ML for such eclass changes?
- What exactly is the reason for portage using the `-i`/`--build-id` option of debugedit?

Thanks and have a nice day,
Martin

On Fri, Jun 10, 2022 at 02:22:00PM +0200, Martin Kletzander wrote:
Hello,

I am trying to make systemtap work with gentoo-kernel (or ideally all dist kernels), and I got a few steps closer with the kernel-build.eclass modification I sent this week [0]. However, there is still one issue: the build id of the kernel does not match the installed vmlinux file:

  # stap mba_sc.stp
  WARNING: Build-id mismatch [man warning::buildid]: "/usr/src/linux-5.17.13-gentoo-dist/vmlinux" pid 0 address 0xffffffff8a7b572c, expected c43e775aad5e11755bf5cf1329d2240b519e7518 actual 3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c
  WARNING: /usr/bin/staprun exited with status: 1
  Pass 5: run failed. [man error::pass5]

I also noticed that when kernel-build.eclass installs the vmlinux file, something (portage, I presume) also creates vmlinux.debug using objcopy --only-keep-debug --compress-debug-sections. So now I have these relevant files on the system:

  - /usr/src/linux-5.17.13-gentoo-dist/vmlinux
  - /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug (symlink to the first file)
  - /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug
  - /boot/vmlinuz-5.17.13-gentoo-dist

When I check the build ids of the first three files (using readelf -n or just "file"), I get:

  /usr/src/linux-5.17.13-gentoo-dist/vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, not stripped
  /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped
  /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped

which looks great, except that:

  1) the first file does not say it is "with debug_info",
  2) there is no reason to keep the original vmlinux in place, since there is a smaller file that works as a substitute, but I'm not sure what a clean way to not install it would be, and, most importantly,
  3) the running kernel has a different build id.

The last point is the main issue here. I tried to find out how to check the build id of the running kernel, but haven't found any way to do it through a kernel API, so instead I checked /boot/vmlinuz-5.17.13-gentoo-dist like this:

  ~/dev/linux/scripts/extract-vmlinux /boot/vmlinuz-5.17.13-gentoo-dist >vmlinux.extracted

and, for good measure, also tried what objcopy does to it:

  objcopy --only-keep-debug vmlinux.extracted vmlinux.extracted.debug
  objcopy --only-keep-debug --compress-debug-sections vmlinux.extracted vmlinux.extracted.compressed

Now when I check, the build id is different from the first files, but it is unchanged by objcopy and is the same as what systemtap reports for the running kernel:

  vmlinux.extracted: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
  vmlinux.extracted.compressed: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
  vmlinux.extracted.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

At this point I got stuck, not knowing when and how the build id changes and where to extract the debug symbols from. I would also like to clean up the change I did.

So I came here with my question(s) and rather lengthy explanations. Does anyone know what would be the best way to deal with this? Or even where to continue looking? I would really like to make systemtap "just work" on Gentoo with the distribution kernels, but I have already spent a lot of time on it, so I figured I'd rather ask here, since I'm not that proficient with the intricacies of the build system.
Thanks a lot for any pointers and have a great day,
Martin

[0] https://github.com/gentoo/gentoo/pull/25789