Would it be useful for heartbeat to have a just-log-don't-panic option? It feels like we are in a state where we know there is a problem somewhere, but we don't know whether it is in heartbeat, the kernel, or the hardware.
I would not want to run a watchdog that reboots the system unless the false-positive rate is well under once per year, and really under 0.2/year. Having this logged instead of panicking would make it more comfortable to turn on. It should probably default to not panicking if this turns into enough reports that the problem seems to have a significantly non-zero probability. (Presumably atf runs on real hw survive HEARTBEAT though, so whatever is happening seems low probability to start with.)