Re: [CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...
m.r...@5-cent.us wrote:
> m.r...@5-cent.us wrote:
>> m.r...@5-cent.us wrote:
>>> Current kernel, and I just booted, and dmesg shows, of the 32 cores, 0,
>>> 2, 4 and 6 ok, and *all* others show "is now offline".
>>>
>>> What's happening here?
>
> Ok, more info. I found how to online a CPU -
>    echo 1 > /sys/devices/system/cpu/cpu23/online
>
> Perhaps I should have started with 1,3, etc, but I was doing the 20's,
> instead. Got to CPU27... and the system rebooted.
>
> Now I'm wondering if the offline'd CPUs have something to do with the fact
> that this (and an identical one, in the datacenter) are rebooting around
> 04:00 every day. Btw, they're Dell PE R530s from 2016.
>
Still more info (come on, folks, help me out!): the two machines that keep
rebooting, and only one other that doesn't, have Intel E5-2630s in them.
The two that reboot are v3, while the one other is a v2. The latter's
microcode is

   microcode: CPU0 sig=0x306e4, pf=0x1, revision=0x428

while the two that reboot have

   microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x3a

Anyone think I might be going down the wrong path? Any thoughts at all? If
not, any comments on my downgrading to the previous microcode?

This happened once a week ago, and then, starting last Friday, began
happening at least around 04:00 every day.

      mark

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
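For comparing microcode across the three machines, the dmesg lines quoted in the message above can be reduced to one short line per CPU. This is only a sketch, assuming the kernel log uses exactly the "microcode: CPUn sig=..., pf=..., revision=..." format shown; the function name microcode_revs is made up for illustration.

```shell
#!/bin/sh
# microcode_revs (hypothetical helper): read kernel log text on stdin and
# print "CPUn <sig> <revision>" for every microcode line it finds.
# Assumes the line format quoted in the message:
#   microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x3a
microcode_revs() {
    sed -n 's/.*microcode: \(CPU[0-9]*\) sig=\(0x[0-9a-f]*\), pf=0x[0-9a-f]*, revision=\(0x[0-9a-f]*\).*/\1 \2 \3/p'
}
```

Run as `dmesg | microcode_revs` on each box; lines that do not match the assumed format are silently skipped.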
Re: [CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...
m.r...@5-cent.us wrote:
> m.r...@5-cent.us wrote:
>> Current kernel, and I just booted, and dmesg shows, of the 32 cores, 0,
>> 2, 4 and 6 ok, and *all* others show "is now offline".
>>
>> What's happening here?

Ok, more info. I found how to online a CPU -
   echo 1 > /sys/devices/system/cpu/cpu23/online

Perhaps I should have started with 1,3, etc, but I was doing the 20's,
instead. Got to CPU27... and the system rebooted.

Now I'm wondering if the offline'd CPUs have something to do with the fact
that this (and an identical one, in the datacenter) are rebooting around
04:00 every day. Btw, they're Dell PE R530s from 2016.

      mark
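Before onlining CPUs by hand, it may help to list what the kernel currently reports for each one. A minimal sketch, assuming the same sysfs layout as the echo command above; the function name list_cpu_state and its optional directory argument are hypothetical, and a CPU with no "online" file (commonly cpu0) is assumed to be online.

```shell
#!/bin/sh
# list_cpu_state [ROOT] (hypothetical helper)
# Print "cpuN online" or "cpuN offline" for each CPU directory under ROOT.
# ROOT defaults to /sys/devices/system/cpu; the argument exists only so the
# function can be pointed at a scratch directory.
list_cpu_state() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        name="${dir##*/}"
        if [ -r "$dir/online" ]; then
            state=$(cat "$dir/online")
        else
            # Assumption: no "online" file means the CPU cannot be
            # taken offline, so it is treated as online.
            state=1
        fi
        if [ "$state" = "1" ]; then
            echo "$name online"
        else
            echo "$name offline"
        fi
    done
}
```

Calling `list_cpu_state` with no argument reads the live system. The kernel also exposes summary ranges in /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline.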
Re: [CentOS] C 7: smpboot: CPU 16 is now offline, and slabs...
m.r...@5-cent.us wrote:
> Current kernel, and I just booted, and dmesg shows, of the 32 cores, 0, 2,
> 4 and 6 ok, and *all* others show "is now offline".
>
> What's happening here?
>
A followup: I also find a core in /var/spool/abrt, and "reason" is
   kernel BUG at mm/slub.c:3601!

In googling, I see threads about incorrect calculation of slabs. Following
one thread, I find
   cat /sys/kernel/slab/:t-048/cpu_slabs
gives me
   4 N0=4

Meanwhile, slabtop shows
   Active / Total Slabs (% used)      : 25927 / 25927 (100.0%)

That figure changes, but only varies around that number, and stays at 100%.

So: should I increase the number of slabs, using the kernel parm of
swiotlb, and if so, for what I show above, should I set it to, say, 32000?

      mark
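To look at cpu_slabs for every cache rather than just :t-048, the cat above can be wrapped in a loop. A sketch only, assuming each cpu_slabs file has the "total N0=count ..." shape shown in the message; the function name show_cpu_slabs and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# show_cpu_slabs [ROOT] (hypothetical helper)
# Print "<cache> <total> <per-node counts>" for each cache under ROOT.
# ROOT defaults to /sys/kernel/slab; unreadable files are skipped.
show_cpu_slabs() {
    root="${1:-/sys/kernel/slab}"
    for f in "$root"/*/cpu_slabs; do
        [ -r "$f" ] || continue
        cache="${f%/cpu_slabs}"
        cache="${cache##*/}"
        total=""
        nodes=""
        # Assumed format: first field is the total, the rest are
        # per-node counts such as N0=4.
        read -r total nodes < "$f" || [ -n "$total" ] || continue
        echo "$cache $total $nodes"
    done
}
```

Many entries under /sys/kernel/slab are symlinks to shared caches, so the same numbers can appear under more than one name.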