[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Kent
> Russell
> Sent: Thursday, January 22, 2026 11:25 PM
> To: [email protected]
> Cc: Zhang, Hawking <[email protected]>; Russell, Kent
> <[email protected]>
> Subject: [PATCH] drm/amdgpu: Send RMA CPER at bad page loading
>
> Some older builds weren't sending RMA CPERs when the bad page threshold was
> exceeded. Newer builds have resolved this, but there could be systems out
> there
> with bad page numbers higher than the threshold, that haven't sent out an RMA
> CPER. To be thorough and safe, send an RMA CPER when we load the table, if the
> threshold is met or exceeded, instead of waiting for the next UE to trigger
> the CPER.
>
> Signed-off-by: Kent Russell <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index 64dd7a81bff5..469d04a39d7d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1712,6 +1712,10 @@ int amdgpu_ras_eeprom_check(struct
> amdgpu_ras_eeprom_control *control)
> dev_warn(adev->dev, "RAS records:%u exceeds 90%% of
> threshold:%d",
> control->ras_num_bad_pages,
> ras->bad_page_cnt_threshold);
> + if (amdgpu_bad_page_threshold != 0 &&
> + control->ras_num_bad_pages >= ras-
> >bad_page_cnt_threshold)
> + amdgpu_dpm_send_rma_reason(adev);
> +
[Tao]: 1. Better to add comment to describe this special case;
2. Do we need to trigger in-band cper as well? Like:
if (adev->cper.enabled && !amdgpu_uniras_enabled(adev) &&
amdgpu_cper_generate_bp_threshold_record(adev))
dev_warn(adev->dev, "fail to generate bad page threshold cper
records\n");
> } else if (hdr->header == RAS_TABLE_HDR_BAD &&
> amdgpu_bad_page_threshold != 0) {
> if (hdr->version >= RAS_TABLE_VER_V2_1) {
> --
> 2.43.0