Hello,

I am trying to make systemtap work with gentoo-kernel (or ideally all dist kernels), and I got a few steps closer with the kernel-build.eclass modification I sent this week [0]. However, one issue remains: the build-id of the running kernel does not match the installed vmlinux file:

  # stap mba_sc.stp
  WARNING: Build-id mismatch [man warning::buildid]: "/usr/src/linux-5.17.13-gentoo-dist/vmlinux" pid 0 address 0xffffffff8a7b572c, expected c43e775aad5e11755bf5cf1329d2240b519e7518 actual 3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c
  WARNING: /usr/bin/staprun exited with status: 1
  Pass 5: run failed.  [man error::pass5]

I also noticed that when kernel-build.eclass installs the vmlinux file, something (I presume portage) also creates vmlinux.debug using objcopy --only-keep-debug --compress-debug-sections. So now I am in a situation where I have these relevant files on the system:

 - /usr/src/linux-5.17.13-gentoo-dist/vmlinux
 - /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug (symlink to the first file)
 - /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug
and
 - /boot/vmlinuz-5.17.13-gentoo-dist

When I check the build ids (using readelf -n or just "file") of the first three files I get:

  /usr/src/linux-5.17.13-gentoo-dist/vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, not stripped
  /usr/lib/debug/.build-id/c4/3e775aad5e11755bf5cf1329d2240b519e7518.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped
  /usr/lib/debug/usr/src/linux-5.17.13-gentoo-dist/vmlinux.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=c43e775aad5e11755bf5cf1329d2240b519e7518, with debug_info, not stripped

which looks great except:

 1) the first file does not say it is "with debug_info",
 2) there is no reason to keep the original vmlinux in place since there is a smaller file that works as a substitute, but I'm not sure what's a clean way to not install it, and most importantly
 3) the running kernel has a different build id.

The last point is the main issue here.
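
In case it helps anyone reproduce the comparison, here is a minimal sketch of how I would script it. It assumes binutils' readelf is installed and that its note dump contains a "Build ID:" line; the helper names (build_id_from_notes, build_id_of, debug_path_for) are made up for illustration and are not part of any eclass or tool.

```shell
#!/bin/sh
# Hypothetical helpers for comparing build IDs; the names are illustrative only.

# Pull the first hex build ID out of `readelf -n` output supplied on stdin.
build_id_from_notes() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f][0-9a-f]*\).*$/\1/p' | head -n 1
}

# Print the build ID of an ELF file (prints nothing if no build ID is found).
build_id_of() {
    readelf -n "$1" 2>/dev/null | build_id_from_notes
}

# Map a build ID to the /usr/lib/debug/.build-id/<xx>/<rest>.debug layout
# seen in the file list above.
debug_path_for() {
    id=$1
    rest=${id#??}                 # everything after the first two hex digits
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "${id%"$rest"}" "$rest"
}

# Example use (not run here):
#   build_id_of /usr/src/linux-5.17.13-gentoo-dist/vmlinux
#   debug_path_for "$(build_id_of /usr/src/linux-5.17.13-gentoo-dist/vmlinux)"
```

Splitting the parsing from the readelf call keeps the text handling testable on its own, without needing a kernel image at hand.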

I was trying to find out how to check the build id of the running kernel, but I haven't found any way to do it with a kernel API, so instead I checked /boot/vmlinuz-5.17.13-gentoo-dist like this:

  ~/dev/linux/scripts/extract-vmlinux /boot/vmlinuz-5.17.13-gentoo-dist >vmlinux.extracted

and for good measure I also tried what objcopy does to it:

  objcopy --only-keep-debug vmlinux.extracted vmlinux.extracted.debug
  objcopy --only-keep-debug --compress-debug-sections vmlinux.extracted vmlinux.extracted.compressed

Now when I check, the build id is different from that of the first three files, but it is unchanged by objcopy and is the same one systemtap reports for the running kernel:

  vmlinux.extracted: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
  vmlinux.extracted.compressed: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped
  vmlinux.extracted.debug: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=3a757e0a2b0d777762cd4aaf9cac0c40bc8c398c, stripped

At this point I got stuck, not knowing when and how the build-id changes, or where to extract the debug symbols from. I would also like to clean up the change I made.

So I came here with my question(s) and rather lengthy explanations. Does anyone know what would be the best way to deal with this? Or even where to continue looking? I would really like to make systemtap "just work" on Gentoo with the distribution kernels, but I have already spent a lot of time on it, so I figured I'd rather ask here since I'm not that proficient with the intricacies of the build system.

Thanks a lot for any pointers and have a great day,
Martin

[0] https://github.com/gentoo/gentoo/pull/25789