Le 05/09/2025 à 20:50, Corey Minyard a écrit : > I'm adding the OpenIPMI mailing list and the LKML. > > On Fri, Sep 05, 2025 at 05:04:28PM +0200, Gilles BULOZ wrote: >> Le 05/09/2025 à 15:15, Gilles BULOZ a écrit : >>> Le 05/09/2025 à 14:03, Corey Minyard a écrit : >>>> On Fri, Sep 05, 2025 at 06:54:23AM -0500, Corey Minyard wrote: >>>>> On Fri, Sep 05, 2025 at 10:16:19AM +0200, Gilles BULOZ wrote: >>>>>> Hi Corey, >>>>>> >>>>>> I'm HW/SW developer at Kontron (computer manufacturer) and don't know >>>>>> what to >>>>>> think about an issue with user->nr_msgs going negative in >>>>>> ipmi_msghandler.c. >>>>>> Not sure if it's a weakness in ipmi_msghandler.c or a bug in the IPMC >>>>>> software >>>>>> of our computer board (satellite) with event messages. >>>>> I worked with people from Kontron a long time ago. Those were good >>>>> times :). >>>>> >>>>>> I see this when I run ipmitool on this board while some event messages >>>>>> already >>>>>> available. In this case deliver_response() is processing the messages >>>>>> and is >>>>>> decreasing user->nr_msgs below 0. Then when ipmitool calls >>>>>> ioctl(IPMICTL_SEND_COMMAND) it gets an error with errno=EBUSY because in >>>>>> i_ipmi_request() user->nr_msgs is incremented but still negative, casted >>>>>> to >>>>>> unsigned int so becomes huge, and found greater than max_msgs_per_user >>>>>> (100). >>>>> Thanks for the detailed description. The fix for the bug is: >>>>> >>>>> diff --git a/drivers/char/ipmi/ipmi_msghandler.c >>>>> b/drivers/char/ipmi/ipmi_msghandler.c >>>>> index e12b531f5c2f..ba33070622b1 100644 >>>>> --- a/drivers/char/ipmi/ipmi_msghandler.c >>>>> +++ b/drivers/char/ipmi/ipmi_msghandler.c >>>>> @@ -1634,6 +1634,7 @@ int ipmi_set_gets_events(struct ipmi_user *user, >>>>> bool val) >>>>> >>>>> list_for_each_entry_safe(msg, msg2, &msgs, link) { >>>>> msg->user = user; >>>>> + atomic_add(1, &usr->nr_msgs); >>>> Sorry, that should obviously be user->nr_msgs >>> Thanks very much ! >>> I've tried it with kernel 6.11.8 and it's better but still not enough. >>> Running "ipmitool sensor" with some debug in ipmi_msghandler.c shows that I >>> always have nr_msgs=1 right after the atomic_add (called >>> 9 times), then when in i_ipmi_request() I reach line "rv = -EBUSY;" with >>> nr_msgs=-2 (twice). >> My understanding is that ipmitool calls ioctl(IPMICTL_SET_GETS_EVENTS_CMD) >> calling ipmi_set_gets_events() and thanks to your patch >> for each event available the nr_msgs is incremented here and decremented in >> deliver_response(). So your patch is OK for that. >> But after that if other events are coming, deliver_response() is called and >> nr_msgs gets decremented. So when ipmitool calls >> ioctl(IPMICTL_SEND_COMMAND), this calls i_ipmi_request() with nr_msgs < 0 >> and that returns -EBUSY > > Yeah, after I sent my email I started looking through the driver for > other issues around this, and there were several. That change wasn't > well thought through. > > So, I've added some tests around this in my test suite, and I've > reworked this to be a much better implementation that's harder to get > wrong. > > I'm going to send the fix in a separate email. > > Thanks, > > -corey > Thanks Corey I confirm your fix (separate email) is working for me. Applied on kernel 6.17-rc5 sources, with few changes under drivers/char/ipmi/ to build the kernel modules there for kernel 6.11.8, and then using these modules. Another simple fix that worked for me on 6.11.8 was to replace atomic_dec() with atomic_dec_if_positive() in deliver_response), in addition to your suggested change to add an atomic_add() to ipmi_set_gets_events(). >>>>> kref_get(&user->refcount); >>>>> deliver_local_response(intf, msg); >>>>> } >>>>> >>>>> >>>>> Can you try that out? >>>>> >>>>> I'll forward port this to current kernel (if appropriate, this has all >>>>> been rewritten) and required a backport. >>>>> >>>>> Thanks, >>>>> >>>>> -corey >>>>> >>>>>> As workaround I set module parameter max_msgs_per_user to 0xffffffff so >>>>>> that >>>>>> user->nr_msgs is never greater than this value. >>>>>> I was using kernel 6.11.8 but updated to 6.16.3 and get the same issue. >>>>>> I'm also not sure if our board is supposed to receive such event >>>>>> messages as >>>>>> it is not Shelf Manager, even if these events come from the local >>>>>> sensors of >>>>>> the board. >>>>>> >>>>>> Best regards >>>>>> >>>>>> Gilles Buloz >>>>>> Kontron Modular Computers France >>>>>> R&D SW/HW senior developer >>>>>> >>>> . > .
_______________________________________________ Openipmi-developer mailing list Openipmi-developer@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openipmi-developer