> i have indeed disabled quick/quiet boot options to no avail. i've also tried
> failsafe mode, ide vs ahci, acpi v1/2/3. the issue does not present with
> linux, which makes me wonder whether the openbsd kernel is somehow making
> some kind of hardware setting change that is not cleared on reboot. despite
> this only presenting in openbsd, i still blame hardware - but am hoping
> there might be some openbsd-related tweak.
> 
> thanks for the idea.
> 
> 

Hi Dewey,

On my X7SPA-HF-D525 system quick boot has never been enabled, and
it's set to AHCI mode. The storage device never mattered either: with
a USB flash stick or an HDD (been running both for long periods), the
problem has been manifesting itself over the years.

What I have to make clear is that not every reboot leads to this
condition here, only reboots after the system has been running for a
considerably longer time. Typically I run the system for a while (as
long as possible / needed) between snapshot upgrades as it's in use 24/7
behind a true sine wave UPS. I have ruled out power supply and memory,
and there are no peripherals attached.

After the system has been running for a while I usually download the sets
from a mirror and rsync them to local storage, then issue a reboot.
There is a pretty high chance the system will NOT boot at all, exactly
as you're reporting, but it does go into the reboot process OK,
cleanly exiting the OS and doing a reset. It goes into the early stages
of the POST and can not complete it, but the system is still
accessible over the IPMI BMC and can be power cycled etc. with the IPMI
over LAN utility, and via the web based interface on the BMC as well.

The system can not boot up properly once it enters this condition, since
on an IPMI power cycle or power off/on the POST goes into a long beep
(~5-7s) / silence (~1s) pattern that repeats. That pattern means memory
error, but it's not the memory's fault.

The IPMI can not be used to reset the system, only to power off/on or
power cycle in this condition. My strongest suspicion, however, is that
it is a BIOS POST problem or an IPMI hook related to the BIOS POST, and
I would like to take that further with Supermicro if the OS factor is
ruled out as well.

The system can only be brought back by the PSU breaker switch or power
(mains) cable disconnect / reconnect for 5s.

Once up the system boots, passes through the upgrade OK, can be
rebooted and the problem IS NOT present. Several reboots work OK
(tested), so it's not caused by an unclean OS exit. It works for several
reboot cycles / upgrades etc... until you leave it running for a longer
period of time. This is what you may be seeing with the Linux reboot
cycle test script.

The system runs for a long while with no issues, and after it gets
rebooted, no matter how (over SSH, local KBD, serial cable, serial
over LAN (Ethernet) IPMI tool, or the IPMI web based tool), it gets into
this flawed state where it can not pass the POST. So, the system is a
total fail for hosting at a data centre without a PDU that has a real
disconnect feature.

I have never run Linux on this box and can not do so (live system in
production, no spares or budget for this), but I would recommend that
you try a longer run with Linux and see if you can trigger this
independent of the OS.
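
To make such a long-run comparison easier to read afterwards, here is a
minimal sketch of how the results could be tallied. It assumes you note,
for every reboot, how long the box had been up and whether the POST
completed. The function name, the record layout and the 24 hour
threshold are all made up for illustration, not taken from any tool.

```python
from collections import defaultdict


def failure_rate_by_uptime(records, threshold_hours=24.0):
    """Return the share of failed reboots for short and long uptimes.

    records: iterable of (uptime_hours_before_reboot, post_completed).
    threshold_hours: hypothetical cut-off between "short" and "long" runs.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [failures, total]
    for uptime_hours, post_completed in records:
        bucket = "long" if uptime_hours >= threshold_hours else "short"
        counts[bucket][1] += 1
        if not post_completed:
            counts[bucket][0] += 1
    return {bucket: failed / total
            for bucket, (failed, total) in counts.items()}


# Example notes: quick reboot cycles pass, reboots after weeks of uptime
# mostly hang in POST (the numbers are invented for the example).
notes = [(0.5, True), (1.0, True), (720.0, False), (900.0, False),
         (800.0, True)]
rates = failure_rate_by_uptime(notes)
```

Keeping one such list per OS would show at a glance whether the hang
follows the uptime or the OS.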

The most important issue for me is to know if it is OS dependent or
not, as this will be very valuable in bringing it back to Supermicro,
or alternatively comparing the reboot state between OpenBSD and another
OS.

Thank you for your tests and perseverance on this, much appreciated.

Regards,
Anton
