Hi, I'm trying to find the root cause of a floating-point calculation error on kernel 4.4.98. My coworker runs a SHA1 test tool, and the SHA1 it generates does not match the expected value. Strangely, the test passes on one VM. After comparing this VM with the bare-metal x86-64 environment at length, we found a suspicious difference: the VM uses 'lazy' FPU context switching, while the bare-metal server uses 'eager' mode. I then rebuilt the kernel with "eagerfpu=DISABLE" as the default, and I'm happy to see that the test now passes across platforms (different VMs and different x86 servers).
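For anyone who wants to repeat the lazy/eager comparison on their own machines, here is a rough sketch of a check. It assumes that this kernel series exposes the synthetic "eagerfpu" flag in /proc/cpuinfo when eager mode is active, and that any boot-time override appears as an "eagerfpu=" token in /proc/cmdline. The helper names are made up for illustration and are not part of any kernel tooling.

```python
# Sketch only: assumes an "eagerfpu" flag in /proc/cpuinfo means eager mode
# is active, and that its absence means lazy mode (an assumption for 4.4.x).

def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def eagerfpu_override(cmdline_text):
    """Return the value of an 'eagerfpu=' boot parameter, or None if absent."""
    for token in cmdline_text.split():
        if token.startswith("eagerfpu="):
            return token.split("=", 1)[1]
    return None


def fpu_mode(cpuinfo_text, cmdline_text=""):
    """Return (mode, override): mode is 'eager' or 'lazy' per the flag."""
    mode = "eager" if "eagerfpu" in cpu_flags(cpuinfo_text) else "lazy"
    return mode, eagerfpu_override(cmdline_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError as exc:
        print("cannot read /proc:", exc)
    else:
        print(fpu_mode(cpuinfo, cmdline))
```

Running this on both the VM and the bare-metal server should show whether the two machines really ended up in different modes, and whether either one had the mode forced on the command line.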
We have no custom FPU settings and have made no modifications to the native Linux 4.4.98 kernel code. As I understand it, the kernel chooses eagerfpu mode automatically at boot according to the CPU's capabilities, so it should just work if the CPU supports eager mode. But the test results suggest that the FPU context may be getting corrupted. I have searched around and found no similar report. Could the FPU experts shed some light on this issue? Thanks, Lei Chen