> i have indeed disabled quick/quiet boot options to no avail. i've also tried
> failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with
> linux, which makes me wonder whether the openbsd kernel is somehow making
> some kind of hardware setting change that is not cleared on reboot. despite
> this only presenting in openbsd, i still blame hardware - but am hoping
> there might be some openbsd-related tweak.
>
> thanks for the idea.
Hi Dewey,

On my X7SPA-HF-D525 system quick boot has never been enabled, and the storage is set to AHCI mode. The storage device has never mattered either: the problem has been manifesting itself over the years with both a USB flash stick and an HDD, and I have run each of them for long periods.

What I have to make clear is that not every reboot leads to this condition here, only a reboot after the system has been running for a considerably longer time. Typically I run the system for as long as possible between snapshot upgrades, as it is in use 24/7 behind a true sine wave UPS. I have ruled out the power supply and memory, and there are no peripherals attached.

After the system has been running for a while, I usually download the sets from a mirror, rsync them to local storage, and then issue a reboot. There is a pretty high chance the system will NOT boot at all, exactly as you're reporting. It does go through the reboot process OK, cleanly exiting the OS and doing a reset. It then enters the early stages of the POST and cannot complete it. The system remains accessible over the IPMI BMC, and can be power cycled etc. with the IPMI over LAN utility as well as via the web-based interface on the BMC.

Once it enters this condition the system cannot boot up properly: on an IPMI power cycle or power off/on, the POST goes into a repeating pattern of a long beep (~5-7s) followed by silence (~1s), which signals a memory error, but it's not the memory's fault. The IPMI cannot be used to reset the system in this condition, only to power it off/on or power cycle it.

My strongest presumption is that this is a BIOS POST issue, or an IPMI hook related to the BIOS POST, and I would like to take it further with Supermicro if the OS factor is ruled out as well.

The system can only be brought back by the PSU breaker switch, or by disconnecting the power (mains) cable for 5s and reconnecting it. Once it is back up, the system boots, passes through the upgrade OK, can be rebooted, and the problem IS NOT present.
Several reboots work OK (tested), so it's not caused by an unclean OS exit: it survives several reboot cycles, upgrades etc., until you leave it running for a longer period of time. This may explain what you are seeing with the Linux reboot cycle test script. The system runs for a long while with no issues, and after it is rebooted, no matter how (over SSH, the local keyboard, a serial cable, IPMI serial over LAN, or the IPMI web-based tool), it gets into this flawed state where it cannot pass the POST. So the system is a total fail for placement in a data centre without a PDU that can really disconnect power.

I have never run Linux on this box and cannot do so (it's a live system in production, with no spares or budget for this), but I would recommend that you try a longer run with Linux and see whether you can trigger this independently of the OS. The most important thing for me is to know whether it is OS dependent or not, as this will be very valuable when bringing it back to Supermicro, or alternatively for comparing the reboot state between OpenBSD and another OS.

Thank you for your tests and perseverance on this, much appreciated.

Regards,
Anton