Hello Boris,

Thanks for reviewing.

>-----Original Message-----
>From: [email protected] [mailto:linux-edac-
>[email protected]] On Behalf Of Borislav Petkov
>Sent: 26 August 2020 09:52
>To: Shiju Jose <[email protected]>
>Cc: [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; Linuxarm <[email protected]>
>Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
>ghes_edac_register()
>
>On Tue, Aug 25, 2020 at 02:01:08PM +0100, Shiju Jose wrote:
>> After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on
>> driver init")' applied, following error has occurred in
>> ghes_edac_register() when CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled.
>> The null ghes_hw.dimms pointer in the mci_for_each_dimm() of
>> ghes_edac_register() caused the error.
>>
>> The error occurs when all the previously initialized ghes instances
>> are removed and then probe a new ghes instance. In this case, the
>> ghes_refcount would be 0, ghes_hw.dimms and mci already freed. The
>> ghes_hw.dimms would be null because ghes_scan_system() would not call
>> enumerate_dimms() again.
>
>Try the below instead and see if it fixes the issue for you too.
>
>If it does, pls send it as v2 but do not add the splat to the commit message -
>that's a lot of noise for something which is clear why it happens and you
>explain it properly in text anyway.

I tested with your changes and it fixes the issue.  I will send v2.
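
For the archive, here is a minimal userspace sketch of the failure and of the fix; it is not the driver code. The names hw_info, enumerate(), scan_system(), model_register() and model_unregister() are hypothetical stand-ins for ghes_hw, enumerate_dimms(), ghes_scan_system(), ghes_edac_register() and ghes_edac_unregister(). The reset_flag parameter exists only so that behaviour with and without the reset can be compared, and the plain int refcount is a simplification of ghes_refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of the driver state; not kernel code. */
struct hw_info {
    int *dimms;        /* models ghes_hw.dimms */
    size_t num_dimms;
};

static struct hw_info hw;
static int refcount;           /* models ghes_refcount (simplified) */
static bool system_scanned;    /* file scope, so unregister can reset it */

/* Models dmi_walk(enumerate_dimms, &ghes_hw): allocate the DIMM table. */
static void enumerate(struct hw_info *h)
{
    h->num_dimms = 2;
    h->dimms = calloc(h->num_dimms, sizeof(*h->dimms));
}

/* Scan once; later calls are no-ops until the flag is cleared. */
static void scan_system(void)
{
    if (system_scanned)
        return;

    enumerate(&hw);
    system_scanned = true;
}

/* Returns 0 on success, -1 where the driver would dereference NULL. */
static int model_register(void)
{
    if (refcount > 0) {        /* another instance is already registered */
        refcount++;
        return 0;
    }

    scan_system();
    if (!hw.dimms)
        return -1;

    refcount++;
    return 0;
}

/* reset_flag == true models the behaviour with the fix applied. */
static void model_unregister(bool reset_flag)
{
    if (reset_flag)
        system_scanned = false;

    if (--refcount > 0)
        return;

    /* Last instance gone: the DIMM table is freed. */
    free(hw.dimms);
    hw.dimms = NULL;
    hw.num_dimms = 0;
}
```

Calling model_register(), model_unregister(false), model_register() makes the second registration fail in this model, because the flag is still set while the table has been freed. With model_unregister(true) the second registration rescans and succeeds.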
 
>
>Thx.
>
>---
>diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>index da60c29468a7..54ebc8afc6b1 100644
>--- a/drivers/edac/ghes_edac.c
>+++ b/drivers/edac/ghes_edac.c
>@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);
> static bool __read_mostly force_load;
> module_param(force_load, bool, 0);
>
>+static bool system_scanned;
>+
> /* Memory Device - Type 17 of SMBIOS spec */
> struct memdev_dmi_entry {
>       u8 type;
>@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct dmi_header *dh, void *arg)
>
> static void ghes_scan_system(void)
> {
>-      static bool scanned;
>-
>-      if (scanned)
>+      if (system_scanned)
>               return;
>
>       dmi_walk(enumerate_dimms, &ghes_hw);
>
>-      scanned = true;
>+      system_scanned = true;
> }
>
> void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>@@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes *ghes)
>
>       mutex_lock(&ghes_reg_mutex);
>
>+      system_scanned = false;
>+
>       if (!refcount_dec_and_test(&ghes_refcount))
>               goto unlock;
>
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju
