Hi Ritesh,

No, I haven't encountered a specific hang on live hardware in production.

This was a proactive fix that originated from an audit of the RTAS call
sites. I noticed the explicit `/* TODO: Add upper time limit for the delay
*/` comments in rtas-fadump.c that had not yet been implemented.

When I cross-referenced this with other parts of the pSeries code (such as
the bounded busy-wait pattern used in rtas-rtc.c), adding a timeout
appeared to be the standard defensive approach for these hypervisor calls.

Since these fadump registration and teardown paths happen during critical
boot and state transition phases, a misbehaving hypervisor or firmware
anomaly could lead to a hard-to-debug, silent system stall. I implemented
the 60-second timeout to resolve the pending TODOs and ensure the kernel
remains resilient in those specific edge cases.
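
To illustrate the idea, here is a minimal user-space sketch of the bounded
wait. It is a simulation under stated assumptions, not the code from the
patch: struct fake_firmware, fake_firmware_call(), bounded_firmware_wait()
and the FAKE_* constants are hypothetical stand-ins for the RTAS call and
its status codes, and the delay is only counted rather than actually slept.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define FAKE_MAX_WAIT_MS 60000u /* mirrors the 60-second limit described above */
#define FAKE_STATUS_OK   0
#define FAKE_STATUS_BUSY 1

/* Simulated firmware: reports busy a fixed number of times, then succeeds. */
struct fake_firmware {
	unsigned int busy_calls_left; /* remaining busy replies */
	unsigned int delay_ms;        /* delay requested with each busy reply */
};

/* Simulated firmware call; writes the requested delay when it reports busy. */
static int fake_firmware_call(struct fake_firmware *fw, unsigned int *delay_ms)
{
	if (fw->busy_calls_left == 0)
		return FAKE_STATUS_OK;

	fw->busy_calls_left--;
	*delay_ms = fw->delay_ms;
	return FAKE_STATUS_BUSY;
}

/*
 * Retry while the firmware reports busy, accumulating the requested delay
 * in total_wait. Give up with -ETIMEDOUT once total_wait reaches
 * max_wait_ms. Assumes each busy reply requests a non-zero delay.
 */
static int bounded_firmware_wait(struct fake_firmware *fw,
				 unsigned int max_wait_ms)
{
	unsigned int total_wait = 0;
	unsigned int delay_ms = 0;

	assert(fw != NULL);

	for (;;) {
		int rc = fake_firmware_call(fw, &delay_ms);

		if (rc != FAKE_STATUS_BUSY)
			return rc;

		/* Real kernel code would delay here; the sketch only counts. */
		total_wait += delay_ms;
		if (total_wait >= max_wait_ms)
			return -ETIMEDOUT;
	}
}
```

With a simulated firmware that stays busy at 100 ms per reply, the loop
gives up after 600 replies instead of polling forever.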

Thanks,
Adriano

On Tue, Apr 7, 2026 at 6:07 AM Ritesh Harjani <[email protected]> wrote:

> Adriano Vero <[email protected]> writes:
>
> > The ibm,configure-kernel-dump RTAS call sites in
> > rtas_fadump_register(), rtas_fadump_unregister(), and
> > rtas_fadump_invalidate() polled indefinitely while firmware returned
> > a busy status. A misbehaving or hung firmware could stall these paths
> > forever, blocking fadump registration at boot or preventing clean
> > teardown.
>
> Was there an issue which you encountered? Can you share the details of
> the same please?
>
>
> >
> > Track the accumulated delay in a total_wait counter and bail out with
> > -ETIMEDOUT if it reaches RTAS_FADUMP_MAX_WAIT_MS (60 seconds) before
> > firmware signals completion. This follows the bounded busy-wait pattern
> > used in rtas-rtc.c.
> >
> > Signed-off-by: Adriano Vero <[email protected]>
>
