On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
> 
> The lower level driver should never not return an answer, it is supposed
> to guarantee that it returns an error if the BMC doesn't respond.
> 
> So the bug is not here, the bug is elsewhere.  My guess is that there
> is some new failure mode where a BMC is not working but it responds well
> enough that it sort of works and fools the driver.  But that's only a
> guess.

I can now reproduce this fairly reliably by running concurrent
ipmitool commands (sensor, sel, mc info) together with sysfs readers
and a periodic "ipmitool mc reset cold". The interface wedges within a
few minutes.

My working theory is that handle_flags() in ipmi_si_intf.c can keep
looping on flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without
ever calling start_next_msg(), so waiting_msg is starved indefinitely.

Captured state at wedge:

  si_state=SI_GETTING_EVENTS  msg_flags=0x02
  si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER)
  si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted)
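For anyone reading the dump, here is a small decoder for those values.
The bit meanings follow the Get Message Flags response, and the command
codes are App netfn codes from the IPMI v2.0 spec. The macro-style names
and the helpers decode_msg_flags() and app_cmd_name() are my own labels
for illustration, not anything exported by the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Get Message Flags response bits (IPMI v2.0); names are illustrative. */
static const struct {
	unsigned char bit;
	const char *name;
} flag_names[] = {
	{ 0x01, "RECEIVE_MSG_AVAIL" },
	{ 0x02, "EVENT_MSG_BUFFER_FULL" },
	{ 0x08, "WDT_PRE_TIMEOUT_INT" },
	{ 0x20, "OEM0_DATA_AVAIL" },
	{ 0x40, "OEM1_DATA_AVAIL" },
	{ 0x80, "OEM2_DATA_AVAIL" },
};

/* Write a '|'-separated list of the set flag names into buf. */
static void decode_msg_flags(unsigned char flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & flag_names[i].bit))
			continue;
		if (buf[0])
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flag_names[i].name, len - strlen(buf) - 1);
	}
	if (!buf[0])
		snprintf(buf, len, "none");
}

/* App netfn command codes relevant to this report. */
static const char *app_cmd_name(unsigned char cmd)
{
	switch (cmd) {
	case 0x02: return "COLD_RESET";
	case 0x08: return "GET_DEVICE_GUID";
	case 0x31: return "GET_MSG_FLAGS";
	case 0x35: return "READ_EVENT_MSG_BUFFER";
	default:   return "unknown";
	}
}
```

decode_msg_flags(0x02, ...) yields "EVENT_MSG_BUFFER_FULL", and
app_cmd_name() maps 0x35 and 0x08 to READ_EVENT_MSG_BUFFER and
GET_DEVICE_GUID, matching the annotations in the dump.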

The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during
re-init, which drives the flag loop.
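To make the theory concrete, here is a toy C model of the starvation I
think is happening. It is not the driver code: struct toy_si,
bmc_complete(), simulate() and the bmc_reinit knob are hypothetical, and
handle_flags()/start_next_msg() here are stripped-down stand-ins that
only keep the shape of the suspected loop. It also assumes the BMC
re-asserts EVENT_MSG_BUFFER_FULL after every READ_EVENT_MSG_BUFFER while
it is re-initialising.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical stand-ins, not ipmi_si_intf.c code. */
enum si_state { SI_NORMAL, SI_GETTING_EVENTS };

#define EVENT_MSG_BUFFER_FULL 0x02
#define READ_EVENT_MSG_BUFFER 0x35
#define GET_DEVICE_GUID       0x08
#define NO_CMD                (-1)

struct toy_si {
	enum si_state si_state;
	unsigned char msg_flags;
	int curr_cmd;    /* command in flight (si_curr), NO_CMD if idle */
	int waiting_cmd; /* queued user command (si_wait / waiting_msg) */
	int guid_sent;   /* times the waiting command was actually started */
};

/* Promote the waiting message, if any, to the current message. */
static void start_next_msg(struct toy_si *si)
{
	if (si->waiting_cmd != NO_CMD) {
		si->curr_cmd = si->waiting_cmd;
		si->waiting_cmd = NO_CMD;
		si->si_state = SI_NORMAL;
		si->guid_sent++;
	}
}

/* Suspected shape of the bug: a set flag always wins over waiting_msg. */
static void handle_flags(struct toy_si *si)
{
	if (si->msg_flags & EVENT_MSG_BUFFER_FULL) {
		si->msg_flags &= (unsigned char)~EVENT_MSG_BUFFER_FULL;
		si->curr_cmd = READ_EVENT_MSG_BUFFER;
		si->si_state = SI_GETTING_EVENTS;
	} else {
		start_next_msg(si);
	}
}

/* The BMC completes the current command; while re-initialising (by
 * assumption) it re-asserts EVENT_MSG_BUFFER_FULL after each read. */
static void bmc_complete(struct toy_si *si, bool bmc_reinit)
{
	if (si->curr_cmd == READ_EVENT_MSG_BUFFER && bmc_reinit)
		si->msg_flags |= EVENT_MSG_BUFFER_FULL;
	si->curr_cmd = NO_CMD;
}

/* Run `steps` command completions with the BMC re-initialising for the
 * first `reinit_steps` of them; return the final model state. */
static struct toy_si simulate(int steps, int reinit_steps)
{
	struct toy_si si = {
		.si_state = SI_GETTING_EVENTS,
		.msg_flags = EVENT_MSG_BUFFER_FULL,
		.curr_cmd = NO_CMD,
		.waiting_cmd = GET_DEVICE_GUID,
		.guid_sent = 0,
	};

	assert(steps >= 0 && reinit_steps >= 0);
	for (int i = 0; i < steps; i++) {
		if (si.curr_cmd == NO_CMD)
			handle_flags(&si);
		bmc_complete(&si, i < reinit_steps);
	}
	return si;
}
```

simulate(1000, 1000) ends with guid_sent == 0, si_state ==
SI_GETTING_EVENTS and msg_flags == 0x02, the same picture as the
captured state above, while simulate(1000, 10) promotes GET_DEVICE_GUID
as soon as the flag stops being re-asserted.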

Thanks,
Matt


_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
