Hi Ritesh,

No, I haven't encountered a specific hang on live hardware in production.
This was a proactive fix that came out of an audit of the RTAS call sites. I noticed the explicit `/* TODO: Add upper time limit for the delay */` comments in rtas-fadump.c, which had not yet been addressed. Cross-referencing them with other parts of the pSeries code, such as the bounded busy-wait pattern in rtas-rtc.c, suggested that a timeout is the standard defensive approach for these RTAS calls.

Because the fadump registration and teardown paths run during boot and other critical state transitions, a misbehaving hypervisor or a firmware anomaly could cause a silent system stall that is hard to debug. I added the 60-second timeout to resolve the pending TODOs and to keep the kernel resilient in those edge cases.

Thanks,
Adriano

On Tue, Apr 7, 2026 at 6:07 AM Ritesh Harjani <[email protected]> wrote:
> Adriano Vero <[email protected]> writes:
>
> > The ibm,configure-kernel-dump RTAS call sites in
> > rtas_fadump_register(), rtas_fadump_unregister(), and
> > rtas_fadump_invalidate() polled indefinitely while firmware returned
> > a busy status. A misbehaving or hung firmware could stall these paths
> > forever, blocking fadump registration at boot or preventing clean
> > teardown.
>
> Was there an issue which you encountered? Can you share the details of
> the same please?
>
> >
> > Track the accumulated delay in a total_wait counter and bail out with
> > -ETIMEDOUT if it reaches RTAS_FADUMP_MAX_WAIT_MS (60 seconds) before
> > firmware signals completion. This follows the bounded busy-wait pattern
> > used in rtas-rtc.c.
> >
> > Signed-off-by: Adriano Vero <[email protected]>
>
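For illustration only, here is a minimal userspace sketch of the bounded busy-wait described in the quoted patch; it is not the code from the patch itself. `struct sim_firmware`, `sim_configure_kernel_dump()`, `sim_busy_delay_time()`, `sim_register_with_timeout()` and the `SIM_*` constants are hypothetical stand-ins. The status decoding assumes the RTAS convention implemented by the kernel's `rtas_busy_delay_time()` (RTAS_BUSY means retry after about 1 ms, 9900 to 9905 mean retry after 10^(status - 9900) ms), and the kernel's `mdelay()` is replaced by bookkeeping so the sketch runs instantly.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for RTAS status codes. RTAS_BUSY (-2) and the
 * 9900..9905 "extended delay" range follow the convention decoded by the
 * kernel's rtas_busy_delay_time(); they are redefined here so the sketch
 * builds in userspace.
 */
#define SIM_RTAS_BUSY               (-2)
#define SIM_RTAS_EXTENDED_DELAY_MIN 9900
#define SIM_RTAS_EXTENDED_DELAY_MAX 9905

/* Hypothetical counterpart of RTAS_FADUMP_MAX_WAIT_MS: 60 seconds in ms. */
#define SIM_MAX_WAIT_MS 60000u

/* Suggested retry delay in ms for a status; 0 means "not busy, stop polling". */
static unsigned int sim_busy_delay_time(int status)
{
	unsigned int ms = 0;
	int order;

	if (status == SIM_RTAS_BUSY) {
		ms = 1;
	} else if (status >= SIM_RTAS_EXTENDED_DELAY_MIN &&
		   status <= SIM_RTAS_EXTENDED_DELAY_MAX) {
		order = status - SIM_RTAS_EXTENDED_DELAY_MIN;
		for (ms = 1; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/* Simulated firmware: reports busy_status until busy_calls_left reaches 0. */
struct sim_firmware {
	int busy_calls_left; /* remaining busy replies; negative = never finishes */
	int busy_status;     /* status returned while busy */
};

/* Hypothetical stand-in for the ibm,configure-kernel-dump RTAS call. */
static int sim_configure_kernel_dump(struct sim_firmware *fw)
{
	if (fw->busy_calls_left == 0)
		return 0;            /* success */
	if (fw->busy_calls_left > 0)
		fw->busy_calls_left--;
	return fw->busy_status;
}

/*
 * Bounded busy-wait: retry while firmware reports busy, accumulate the
 * suggested delays in total_wait, and give up with -ETIMEDOUT once the
 * total reaches SIM_MAX_WAIT_MS. Non-busy statuses are returned as-is.
 */
static int sim_register_with_timeout(struct sim_firmware *fw,
				     unsigned int *total_wait_out)
{
	unsigned int total_wait = 0;
	unsigned int wait_time;
	int rc;

	do {
		rc = sim_configure_kernel_dump(fw);
		wait_time = sim_busy_delay_time(rc);
		if (wait_time) {
			/* The kernel would mdelay(wait_time) here; the sketch
			 * only records the time so that it runs instantly. */
			total_wait += wait_time;
			if (total_wait >= SIM_MAX_WAIT_MS) {
				rc = -ETIMEDOUT;
				break;
			}
		}
	} while (wait_time);

	if (total_wait_out)
		*total_wait_out = total_wait;
	return rc;
}
```

With simulated firmware that keeps returning 9903 (a 1-second hint), the loop gives up after 60 polls with -ETIMEDOUT, while firmware that reports RTAS_BUSY three times and then succeeds returns 0 after 3 ms of accounted delay. Note that in this sketch the budget counts only the suggested delays, not the time spent inside the call itself, so the real wall-clock bound would be somewhat longer than 60 seconds.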
