Hi all,

Recently, our CI started hitting several hangs in the spinlock torture tests while booting powernv_defconfig and pseries_defconfig kernels compiled with Clang under QEMU 3.1.0.
I initially bisected Linux and landed on commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C") [1], which seems to make sense. However, I could not reproduce the hangs in my local environment no matter how hard I tried, only in our Docker image. I then realized that my local QEMU version was 4.2.0; once I compiled 3.1.0, I was able to reproduce them.

I bisected QEMU down to two commits: powernv_defconfig was fixed by [2] and pseries_defconfig was fixed by [3].

I ran 100 boots with our boot-qemu.sh script [4]. QEMU 3.1.0 failed approximately 80% of the time, while 4.2.0 and 5.0.0-rc1 only failed 1% of the time [5]. GCC 9.3.0 built kernels failed approximately 3% of the time [6].

Without access to real hardware, I cannot really say whether there is a problem here. We are going to upgrade to QEMU 4.2.0 to fix it. This is more of an FYI so that there is some record of it outside of our issue tracker and so that people are aware of it in case it comes up somewhere else.

[1]: https://git.kernel.org/linus/3282a3da25bd63fdb7240bc35dbdefa4b1947005
[2]: https://git.qemu.org/?p=qemu.git;a=commit;h=f30c843ced5055fde71d28d10beb15af97fdfe39
[3]: https://git.qemu.org/?p=qemu.git;a=commit;h=34a6b015a98733a4b32881777dafd70156c5a322
[4]: https://github.com/ClangBuiltLinux/boot-utils/blob/5f49a87e272fbe967a8d26cf405cec15b024702c/boot-qemu.sh
[5]: https://user-images.githubusercontent.com/11478138/78957618-b1842080-7a9a-11ea-8856-279c3dcc6c19.png
[6]: https://user-images.githubusercontent.com/11478138/78955535-62d38800-7a94-11ea-9e61-9e3d8c068ace.png

Cheers,
Nathan