Hi Corey & Tony,

On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote:
> On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> > From: Matt Fleming <mfl...@cl...>
> >
> > When the BMC does not respond to a "Get Device ID" command, the
> > wait_event() in __get_device_id() blocks forever in
> > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent
> > sysfs reader then piles up in D state. Replace with
> > wait_event_timeout() to return -EIO after 1 second.
>
> On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote:
> > This is the second report I have of something like this. So
> > something is up. I'm adding Tony, who reported something like this
> > dealing with the watchdog.
> >
> > The lower level driver should never not return an answer, it is
> > supposed to guarantee that it returns an error if the BMC doesn't
> > respond. So the bug is not here, the bug is elsewhere.

This is a bit of a throwback to our previous discussions around [1]. I
did end up applying [2] based on that discussion, and had limited
success, but we still have external resets that cause us to enter this
undesirable state :(

[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/[email protected]/

>
> I've been tracking a related issue (RHEL customer case) where BMC
> reset while the IPMI watchdog is active causes D-state hangs. This
> appears to be the same root cause Matt is hitting.
>
> I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel
> (54 patches bringing it to mainline parity) and tested today on a
> Dell R640.

I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and
lost protection on BMC reset" [3]?

[3]: https://lore.kernel.org/all/[email protected]/

_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
