On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
> 
> I've seen this before in several scenarios, including a system that put
> IPMI in the ACPI tree and it sort of worked but there was no BMC
> present.  I had to disable that particular device.
> 
> What hardware is involved here?
 
I'm fairly sure we've seen this across a bunch of different BMCs, so
it's not a BMC hardware thing. Almost certainly a driver issue.

> Can you give a more detailed example of what's happening in the
> low-level hardware?  If it's KCS there's a debug flag in the
> drivers/char/ipmi/ipmi_kcs_sm.c file that should help.

Yep, it's KCS. Unfortunately I haven't found a way to reproduce this
reliably yet.

Looking at a wedged machine (cat /sys/class/ipmi/.../firmware_revision)
with drgn I can see that there are 99,846 messages sitting on intf->xmit_msgs
and the KCS SM is idle (it's responding to internal traffic like Get
Global Enables and Get Msg Flags). So it looks like completions are
getting dropped.

We're running a 6.18.18 kernel which includes c08ec55617cb ("ipmi: Fix
use-after-free and list corruption on sender error"), so it's not that.

Here's a dump of some of the data structures.

intf                       = 0xffff9d2f4a5a0000
intf->curr_msg             = 0xffff9d34f21a9c00    
intf->xmit_msgs.next       = 0xffff9d30c28e3c80 
intf->waiting_rcv_msgs     = empty
intf->maintenance_mode     = 0
intf->maintenance_mode_state = 0
intf->in_shutdown          = false
intf->seq_table            = 0/64 slots used
intf->smi_work.pending     = 0

The stuck message itself — intf->curr_msg:

msg @ 0xffff9d34f21a9c00
  .data      = { 0x18, 0x01 }           # NetFn 0x06 (App), cmd 0x01 = Get Device ID
  .data_size = 2
  .rsp_size  = 38
  .rsp[0..7] = 2c 01 00 00 ...
  .done      = free_smi_msg
  .user_data = NULL
  .msgid     = (internal GDI poll)
  .type      = IPMI_SMI_MSG_TYPE_NORMAL
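
For anyone decoding the .data bytes by hand, here is a minimal sketch of
how I read them. It assumes the usual IPMI layout where the first byte
carries NetFn in the upper six bits and LUN in the lower two, followed by
the command byte. decode_ipmi_header is a hypothetical helper name for
illustration, not something from the driver or from drgn.

```python
def decode_ipmi_header(data):
    """Split the first two bytes of an IPMI message into (netfn, lun, cmd).

    Assumes byte 0 is (NetFn << 2) | LUN and byte 1 is the command.
    """
    if len(data) < 2:
        raise ValueError("need at least the NetFn/LUN byte and the command byte")
    netfn = data[0] >> 2    # upper six bits
    lun = data[0] & 0x03    # lower two bits
    cmd = data[1]
    return netfn, lun, cmd


# Request bytes taken from the .data field above.
netfn, lun, cmd = decode_ipmi_header([0x18, 0x01])
print(f"NetFn 0x{netfn:02x}, LUN {lun}, cmd 0x{cmd:02x}")
```

With the request bytes above this gives NetFn 0x06, LUN 0, cmd 0x01, which
matches the Get Device ID annotation in the dump.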


smi_info                   = 0xffff9d2f4a010000
smi_info->si_state         = SI_NORMAL (0)
smi_info->curr_msg         = 0xffff9d2f48c7b800 
smi_info->waiting_msg      = NULL
smi_info->interrupt_disabled = false
smi_info->supports_event_msg_buff = true
smi_info->io.irq           = 0                     
smi_info->run_to_completion = false
smi_info->in_maintenance_mode = 0

Let me know if you want any other info. I'll try to trace this and
catch it reproducing.


