Bug#1033732: [PATCH v2] x86/acpi/boot: Do not register processors that cannot be onlined for x2apic

2023-04-02 Thread Borislav Petkov
On Sun, Apr 02, 2023 at 03:13:05PM +0200, Guy Durrieu wrote:
> Yes it does.

Thanks for testing.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Bug#1033732: [PATCH v2] x86/acpi/boot: Do not register processors that cannot be onlined for x2apic

2023-04-02 Thread Borislav Petkov
On April 2, 2023 12:41:46 PM GMT+02:00, Guy Durrieu wrote:
>My system worked fine with kernel 6.1.15, but stopped booting after
>upgrading to 6.1.20 and resulted in a kernel panic:

Does this fix it:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/urgent

Thx.

-- 
Sent from a small device: formatting sucks and brevity is inevitable.



Bug#717473: [PATCH] adm64_edac: Fix single-channel setups

2013-07-29 Thread Borislav Petkov
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows(), which implicitly assumes that when the second
channel (i.e., channel 1) is enabled, the struct dimm hierarchy will be
present. It is not.

So always allocate two channels unconditionally.

This has the nice side effect that the data structures are initialized,
so some day, when memory hotplug is supported, it should just work out of
the box when a second channel suddenly appears.

Reported-and-tested-by: Roger Leigh <rle...@debian.org>
Signed-off-by: Borislav Petkov <b...@suse.de>
---
 drivers/edac/amd64_edac.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 8b6a0343c220..8b3d90143514 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2470,8 +2470,15 @@ static int amd64_init_one_instance(struct pci_dev *F2)
 	layers[0].size = pvt->csels[0].b_cnt;
 	layers[0].is_virt_csrow = true;
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;
-	layers[1].size = pvt->channel_count;
+
+	/*
+	 * Always allocate two channels since we can have setups with DIMMs on
+	 * only one channel. Also, this simplifies handling later for the price
+	 * of a couple of KBs tops.
+	 */
+	layers[1].size = 2;
 	layers[1].is_virt_csrow = false;
+
 	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
 	if (!mci)
 		goto err_siblings;
-- 
1.8.3

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


-- 
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#717473: [PATCH] adm64_edac: Fix single-channel setups

2013-07-29 Thread Borislav Petkov
On Mon, Jul 29, 2013 at 04:00:52PM +0100, Ben Hutchings wrote:
> There's a typo in the subject line. :-)

Yep, looks like I've gradually gotten unaccustomed to typing amd ...
it's not in the fingers anymore. :-)

Thanks, fixed.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--





Bug#717473: BUG: Null pointer deref in amd64_edac_mod/amd64_probe_one_instance during boot

2013-07-24 Thread Borislav Petkov
On Wed, Jul 24, 2013 at 12:04:49AM +0100, Ben Hutchings wrote:
> This is absolutely stable material. Most of the people affected are
> not going to work out that they plugged their memory in wrong. (And
> maybe some of them only have one module.)

Hmm, not from my experience. This is the first report I've seen so far
and most people running amd64_edac have DIMMs on both channels.

But if you really insist, I'll tag it for stable and send it to Linus
now.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--





Bug#717473: BUG: Null pointer deref in amd64_edac_mod/amd64_probe_one_instance during boot

2013-07-23 Thread Borislav Petkov
On Mon, Jul 22, 2013 at 08:19:26PM +0100, Roger Leigh wrote:
> Ben's patch does allow me to boot the system with the memory in this
> configuration on a 3.10 kernel.

Ok, I actually think we can fix it as below. It should be functionally
equivalent to Ben's patch, except that it is a bit simpler and keeps the
special handling for K8, which I want to have there for future reference.

In addition, it still initializes the data structures, so some day,
when memory hotplug is supported, it should work out of the box when a
second channel suddenly appears.

I think it should apply cleanly to 3.8 or 3.9 too as we haven't had a
whole lot of movement in that area :-)

Thanks.

---
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 8b6a0343c220..52f2da1a89a9 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2470,8 +2470,15 @@ static int amd64_init_one_instance(struct pci_dev *F2)
 	layers[0].size = pvt->csels[0].b_cnt;
 	layers[0].is_virt_csrow = true;
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;
-	layers[1].size = pvt->channel_count;
+
+	/*
+	 * Always allocate two channels since we can have setups with DIMMs on
+	 * only one channel. Also, this simplifies handling later for the price
+	 * of a couple of KBs tops.
+	 */
+	layers[1].size = 2;
 	layers[1].is_virt_csrow = false;
+
 	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
 	if (!mci)
 		goto err_siblings;
--

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--





Bug#717473: BUG: Null pointer deref in amd64_edac_mod/amd64_probe_one_instance during boot

2013-07-23 Thread Borislav Petkov
On Tue, Jul 23, 2013 at 11:24:17PM +0100, Roger Leigh wrote:
> I've tested it against 3.10 and I can confirm that it works.  I've
> booted the system with the DRAM on the same channel, and on
> separate channels, and it's working without problems in both cases.

That's good news, thanks for testing and helping with this, Roger.

I'll add your Tested-by and queue it for 3.12 (i.e., I don't see it
being urgent enough to rush it to -stable since your dual-channel layout
takes care of the issue indirectly).

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--





Bug#717473: BUG: Null pointer deref in amd64_edac_mod/amd64_probe_one_instance during boot

2013-07-21 Thread Borislav Petkov
On Sun, Jul 21, 2013 at 06:41:52PM +0100, Ben Hutchings wrote:
> On Sun, 2013-07-21 at 09:54 +0100, Roger Leigh wrote:
> > If the bug is in amd64_edac_mod, there are only two possible commits
> > which could cause the problem:
> >
> > 1eef12825 amd64_edac: Correct DIMM sizes
> > 94c1acf2c amd64_edac: Add Family 16h support
> >
> > Which are the only commits between 3.8 and 3.9 (and none made since)
>
> The crash is at this line:
>
>   csrow->channels[1]->dimm->nr_pages = row_dct1_pages;

Hmm.

If I'm reading it correctly above, 3.8 works for you, Roger, correct?

If so, can you please enable CONFIG_EDAC_DEBUG, rebuild 3.8 and boot
your machine with it and send me the full dmesg of the boot?

> with csrow->channels[1]->dimm == NULL.
>
> This code was introduced by the first commit above.  Does the patch
> below fix this?
>
> Ben.
>
> ---
> [PATCH] amd64_edac: Fix crash in init_csrows() for memory controller in
> 64-bit mode
>
> init_csrows() assumes all processors after K8 have 2 memory channels.
> But these processors support a mode where only one channel is used.
> It seems that csrow_enabled() may still return true for the second
> channel (BIOS bug?).

Ok, I think I know what the problem is:

[5.815246] EDAC amd64: DRAM ECC enabled.
[5.816328] EDAC amd64: F15h detected (node 0).
[5.817397] EDAC amd64: MC: 0: 0MB 1: 0MB
[5.818379] EDAC amd64: MC: 2: 0MB 3: 0MB
[5.819332] EDAC amd64: MC: 4: 0MB 5: 0MB
[5.820250] EDAC amd64: MC: 6: 0MB 7: 0MB
[5.821176] EDAC amd64: MC: 0:  4096MB 1:  4096MB
[5.822131] EDAC amd64: MC: 2:  4096MB 3:  4096MB
[5.823048] EDAC amd64: MC: 4: 0MB 5: 0MB
[5.823927] EDAC amd64: MC: 6: 0MB 7: 0MB
[5.824818] EDAC amd64: using x4 syndromes.
[5.825680] EDAC amd64: MCT channel count: 1

Roger's DIMMs are only on the one channel and the second one is empty.
Btw, Roger, you might want to move one of the DIMMs to another DIMM
socket on the board so that you can use both channels for performance
reasons.

I'm saying one of the DIMMs because I'm assuming those 4G above are
dual-ranked DIMMs and you have two 8G DIMMs on the board.

If they're single-ranked, i.e. 4G each, then you shouldn't have any
choice because your board has only 4 DIMM slots anyway, AFAICT from the
manual:

http://www.asus.com/Motherboards/SABERTOOTH_990FX_R20/#support_Download_10

That would be buggy because you're still using only one DCT.

Btw, if you have two DIMMs on there, please place them according to the
recommended memory configurations, i.e. one in A2 and the other in B2.

Here's my layout, for example:

[5.890887] EDAC MC: DCT0 chip selects:
[5.890888] EDAC amd64: MC: 0:  2048MB 1:  2048MB
[5.890889] EDAC amd64: MC: 2:  2048MB 3:  2048MB
[5.890890] EDAC amd64: MC: 4: 0MB 5: 0MB
[5.890891] EDAC amd64: MC: 6: 0MB 7: 0MB
[5.890893] EDAC MC: DCT1 chip selects:
[5.890894] EDAC amd64: MC: 0:  2048MB 1:  2048MB
[5.890894] EDAC amd64: MC: 2:  2048MB 3:  2048MB
[5.890895] EDAC amd64: MC: 4: 0MB 5: 0MB
[5.890896] EDAC amd64: MC: 6: 0MB 7: 0MB
[5.890897] EDAC amd64: using x4 syndromes.
[5.890901] EDAC amd64: MCT channel count: 2

And I have 4 DIMM slots occupied.

> Check pvt->channel_count before csrow_enabled(), and remove the family
> number conditions.
>
> Reported-by: Roger Leigh <rle...@debian.org>
> Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
> Cc: 717...@bugs.debian.org
> ---
>  drivers/edac/amd64_edac.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 8b6a034..be9c2fe 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -2084,10 +2084,8 @@ static int init_csrows(struct mem_ctl_info *mci)
>  	 */
>  	for_each_chip_select(i, 0, pvt) {
>  		bool row_dct0 = !!csrow_enabled(i, 0, pvt);
> -		bool row_dct1 = false;
> -
> -		if (boot_cpu_data.x86 != 0xf)
> -			row_dct1 = !!csrow_enabled(i, 1, pvt);

Ok, this shouldn't be set if DCT1 doesn't have enabled csrows. So yes,
Roger, that debugging output would be of great help.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

