On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>

Hi Badal,

> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:

I don't agree with this patch. For even if we remove setting to NULL, it
doesn't explain what we see, which is that the devm_kzalloc'd i915->hwmon
has been *freed* (since ddat (i915->hwmon->ddat) is 0x6b6b6b6b6b6b6dbf):

        (gdb) list *hwm_energy+0x2b
        0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
        129             struct hwm_energy_info *ei = &ddat->ei;
        130             intel_wakeref_t wakeref;
        131             i915_reg_t rgaddr;
        132             u32 reg_val;
        133
        134             if (ddat->gt_n >= 0)
        135                     rgaddr = hwmon->rg.energy_status_tile;
        136             else
        137                     rgaddr = hwmon->rg.energy_status_all;
        138

If we don't have i915_hwmon_unregister equivalent in xe, that's because xe
has not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, where the check is used.

Also, we should verify any fix before sending a patch out, either via local
repro (say use a script to read hwmon energy file while running the
selftests) or CI. Though in this case CI doesn't show any failures but I am
wondering if that is only by chance.

Fwiu, the issue is somehow being caused by ddat being *freed* before hwmon
sysfs has disappeared, at module unload time (since selftests use module
unload), allowing prometheus-node process to call into the energy read
sysfs and cause the crash.

Thanks.
--
Ashutosh


>
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from 
> intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 
> 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U       
>       6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client 
> Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 
> 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 
> f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 
> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 
> 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 
> 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 
> 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: 
> ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: 
> ffff8881e3078428
> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) 
> knlGS:0000000000000000
> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 
> 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  <TASK>
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.di...@intel.com>
> Cc: Janusz Krzysztofik <janusz.krzyszto...@linux.intel.com>
> Signed-off-by: Badal Nilawar <badal.nila...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>  2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct 
> drm_i915_private *dev_priv)
>       for_each_gt(gt, dev_priv, i)
>               intel_gt_driver_unregister(gt);
>
> -     i915_hwmon_unregister(dev_priv);
> -
>       i915_perf_unregister(dev_priv);
>       i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
> b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>                       ddat_gt->hwmon_dev = hwmon_dev;
>       }
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> -     fetch_and_zero(&i915->hwmon);
> -}
> --
> 2.25.1
>

Reply via email to