On Tue, Mar 10, 2026 at 10:40:14AM +0200, Ville Syrjälä wrote:
> On Mon, Mar 09, 2026 at 06:48:03PM +0200, Imre Deak wrote:
> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
> > initialized, and dmc is thus NULL.
> >
> > That would be the case when the call path is
> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
> > intel_power_domains_init_hw() is called *before* intel_dmc_init().
> >
> > However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
> > conditionally, depending on the current and target DC states. At probe,
> > the target is disabled, but if DC6 is enabled, the function is called,
> > and an oops follows. Apparently it's quite unlikely that DC6 is enabled
> > at probe, as we haven't seen this failure mode before.
> >
> > It is also strange to have DC6 enabled at boot, since that would require
> > the DMC firmware (loaded by BIOS); the BIOS loading the DMC firmware and
> > the driver stopping / reprogramming the firmware is a poorly specified
> > sequence and as such unlikely an intentional BIOS behaviour. It's more
> > likely that BIOS is leaving an unintentionally enabled DC6 HW state
> > behind (without actually loading the required DMC firmware for this).
>
> Wasn't the original case some kdump kernel thing?

According to Jani, the original issue was a KASAN run in QEMU, see [1];
I'm not sure whether that also involved kexec/kdump. However, the case
Tao reported later is indeed related to kexec/kdump.

> I think that has a few issues:
> - loading full GPU drivers for a kdump kernel after the real kernel
> has crashed seems a bit risky. Who knows what state the hardware
> is in after the crash...
> - we should probably try to unload DMC at kexec time (to the extent
> that DMC can actually be unloaded)

AFAICS that involves calling pci_driver::shutdown, which (for both xe
and i915) ends up calling intel_power_domains_disable(). That at least
disables the DC states, so the kexec'ed kernel should still not see DC6
enabled. The DMC FW event handlers are not disabled in this case though
(which is what you mean by unloading DMC, I presume), as opposed to
system/runtime suspend, where all the DMC events are disabled as well.

I agree that the kexec->shutdown, driver remove etc. handlers should be
synced with the suspend handlers, at least wrt. the above DMC unloading.
However, I consider that a separate issue from the one fixed in this
patch, which is incorrectly using the (unreliable) HW DC state to track
the DC6 allowed counter; the correct way is to use the SW DC state
instead. So are you okay with going ahead with this patch for now, and
following up with syncing the above shutdown/driver remove handlers
with the suspend ones?

[1] https://lore.kernel.org/all/[email protected]
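
To illustrate the HW vs. SW state point, here is a minimal user-space
sketch of the tracking logic. It is not the driver code: every model_*
name, the MODEL_DC_STATE_EN_UPTO_DC6 value and the use_sw_state switch
are assumptions made up for the illustration, and a NULL-dmc call is
only counted here, where the real driver would oops.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit value; the real definition lives in the driver headers. */
#define MODEL_DC_STATE_EN_UPTO_DC6 (2u << 0)

/* Hypothetical stand-in for the DMC state; the pointer is NULL before DMC init. */
struct model_dmc {
	uint32_t dc6_allowed_count;
	uint32_t dc5_start;
};

struct model_display {
	uint32_t hw_dc_state;        /* models the DC_STATE_EN register */
	uint32_t sw_dc_state;        /* models power_domains->dc_state */
	uint32_t dc5_counter;        /* models the HW DC5 counter */
	struct model_dmc *dmc;       /* NULL until the DMC is initialized */
	unsigned int null_dmc_calls; /* calls that would dereference NULL */
};

static void model_update_dc6_allowed_count(struct model_display *d, bool start)
{
	if (!d->dmc) {
		/* The real function dereferences dmc here; just count it. */
		d->null_dmc_calls++;
		return;
	}

	/* Stopping needs the DC5 count captured when tracking started. */
	if (!start)
		d->dmc->dc6_allowed_count += d->dc5_counter - d->dmc->dc5_start;

	d->dmc->dc5_start = d->dc5_counter;
}

static void model_set_dc_state(struct model_display *d, uint32_t state,
			       bool use_sw_state)
{
	uint32_t cur = use_sw_state ? d->sw_dc_state : d->hw_dc_state;
	bool enable_dc6 = (state & MODEL_DC_STATE_EN_UPTO_DC6) != 0;
	bool dc6_was_enabled = (cur & MODEL_DC_STATE_EN_UPTO_DC6) != 0;

	if (!dc6_was_enabled && enable_dc6)
		model_update_dc6_allowed_count(d, true);

	d->hw_dc_state = state;

	if (!enable_dc6 && dc6_was_enabled)
		model_update_dc6_allowed_count(d, false);

	d->sw_dc_state = state;
}
```

With hw_dc_state left at DC6 by the BIOS, dmc == NULL and a target state
of 0 (the probe case), use_sw_state == false reaches the update function
with a NULL dmc, while use_sw_state == true does not, since the SW state
never had DC6 enabled.
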
> > The tracking of the DC6 allowed counter only works if starting /
> > stopping the counter depends on the _SW_ DC6 state vs. the current _HW_
> > DC6 state (since stopping the counter requires the DC5 counter captured
> > when the counter was started). Thus, using the HW DC6 state is incorrect
> > and it also leads to the above oops. Fix both issues by using the SW DC6
> > state for the tracking.
> >
> > This is v2 of the fix originally sent by Jani, updated based on the
> > first Link: discussion below.
> >
> > Link: https://lore.kernel.org/all/[email protected]
> > Link: https://lore.kernel.org/all/[email protected]
> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter")
> > Cc: Mohammed Thasleem <[email protected]>
> > Cc: Jani Nikula <[email protected]>
> > Cc: Tao Liu <[email protected]>
> > Cc: <[email protected]> # v6.16+
> > Tested-by: Tao Liu <[email protected]>
> > Reviewed-by: Jani Nikula <[email protected]>
> > Signed-off-by: Imre Deak <[email protected]>
> > ---
> > drivers/gpu/drm/i915/display/intel_display_power_well.c | 2 +-
> > drivers/gpu/drm/i915/display/intel_dmc.c | 3 +--
> > 2 files changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_power_well.c b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > index 1e03187dbd38a..f855f0f886946 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > @@ -852,7 +852,7 @@ void gen9_set_dc_state(struct intel_display *display, u32 state)
> > power_domains->dc_state, val & mask);
> >
> > enable_dc6 = state & DC_STATE_EN_UPTO_DC6;
> > - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
> > + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > if (!dc6_was_enabled && enable_dc6)
> > intel_dmc_update_dc6_allowed_count(display, true);
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c
> > index c3b411259a0c5..90ba932d940ac 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
> > @@ -1598,8 +1598,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
> > return false;
> >
> > mutex_lock(&power_domains->lock);
> > - dc6_enabled = intel_de_read(display, DC_STATE_EN) &
> > - DC_STATE_EN_UPTO_DC6;
> > + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > if (dc6_enabled)
> > intel_dmc_update_dc6_allowed_count(display, false);
> >
> > --
> > 2.49.1
>
> --
> Ville Syrjälä
> Intel