On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
>
> The lower level driver should never not return an answer, it is supposed
> to guarantee that it returns an error if the BMC doesn't respond.
>
> So the bug is not here, the bug is elsewhere.  My guess is that there
> is some new failure mode where a BMC is not working but it responds well
> enough that it sort of works and fools the driver.  But that's only a
> guess.
I can now reproduce this fairly reliably by running concurrent ipmitool
commands (sensor, sel, mc info) together with sysfs readers and a
periodic "ipmitool mc reset cold".  It wedges within a few minutes.

My working theory is that handle_flags() in ipmi_si_intf.c can keep
looping on flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without
ever calling start_next_msg(), which starves waiting_msg indefinitely.

Captured state at the wedge:

  si_state=SI_GETTING_EVENTS
  msg_flags=0x02
  si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
  si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted)

The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL (the 0x02 bit
in msg_flags above) during re-init, which drives the flag loop.

Thanks,
Matt

_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
