[DNG] Some RAID1's inaccessible after upgrade to beowulf from ascii
I upgraded my server to beowulf. After rebooting, all home directories except root's are no longer accessible. They are all on LVM on software RAID. The problem seems to be that two of my three RAID1 arrays are not starting up properly. What can I do about it?

hendrik@april:/$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sda2[3](S)
      2391296000 blocks super 1.2

md2 : inactive sda3[0](S)
      1048512 blocks

md0 : active raid1 sdf4[1]
      706337792 blocks [2/1] [_U]

unused devices:
hendrik@april:/$

hendrik@april:/$ cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
ARRAY /dev/md1 metadata=1.2 name=april:1 UUID=c328565c:16dce536:f16da6e2:db603645
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
MAILADDR root
hendrik@april:/$

The standard recommendation seems to be to replace lines in /etc/mdadm/mdadm.conf with lines produced by mdadm --examine --scan:

april:~# mdadm --examine --scan
ARRAY /dev/md/1 metadata=1.2 UUID=c328565c:16dce536:f16da6e2:db603645 name=april:1
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
ARRAY /dev/md0 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
april:~#

But this replacement involves changing a line that works (md0), leaving unchanged one that does not (md2), and changing another that does not (md1). Since the changes suggested by --examine seem uncorrelated with which arrays are active or inactive, I have little faith in this alleged fix without first gaining more understanding.

-- hendrik
___
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
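The comparison described above can be made mechanical. The following is a minimal sketch, not a diagnosis: it copies the ARRAY lines from the two listings in the message into strings (the names MDADM_CONF, EXAMINE_SCAN and array_uuids are made up for this illustration) and checks whether both sources name the same array UUIDs. It assumes only that each ARRAY line carries a UUID= field, as all six lines here do.

```python
import re

# ARRAY lines copied from /etc/mdadm/mdadm.conf as posted in the message.
MDADM_CONF = """\
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
ARRAY /dev/md1 metadata=1.2 name=april:1 UUID=c328565c:16dce536:f16da6e2:db603645
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
"""

# ARRAY lines copied from the posted output of mdadm --examine --scan.
EXAMINE_SCAN = """\
ARRAY /dev/md/1 metadata=1.2 UUID=c328565c:16dce536:f16da6e2:db603645 name=april:1
ARRAY /dev/md2 UUID=5d63f486:183fd2ea:c2a3a88f:cb2b61de
ARRAY /dev/md0 UUID=4dc189ba:e7a12d38:e6262cdf:db1beda2
"""


def array_uuids(text):
    """Return the set of UUID values found on ARRAY lines (hypothetical helper)."""
    uuids = set()
    for line in text.splitlines():
        if line.startswith("ARRAY"):
            match = re.search(r"UUID=(\S+)", line)
            if match:
                uuids.add(match.group(1))
    return uuids


conf_uuids = array_uuids(MDADM_CONF)
scan_uuids = array_uuids(EXAMINE_SCAN)
only_in_conf = conf_uuids - scan_uuids
only_in_scan = scan_uuids - conf_uuids

print("UUIDs only in mdadm.conf:", sorted(only_in_conf))
print("UUIDs only in --examine --scan:", sorted(only_in_scan))
```

For the listings as posted, both differences come out empty: the two sources agree on all three UUIDs and differ only in the device name for md1 and in the optional fields. That is consistent with the doubt expressed above, though it says nothing about why md1 and md2 are inactive.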
Re: [DNG] Is it dead yet? -- SOLVED
Not dead, but just injured.

Replaced two 312M RAM cards with one faster 2G RAM card. The old RAM was defective; the tech told me that its speed was actually mismatched to the rest of the system, slowing it down. Of course "defective" was the crucial issue, not speed.

The machine works fine now, running devuan ascii.

Next plan: upgrade to current chimaera in stages.

Thank you for all the testing suggestions. They really helped narrow down the problem. Not to mention the moral support.

-- hendrik
Re: [DNG] Is it dead yet?
Chimaera didn't help, though the symptoms changed.

I booted from the chimaera live desktop, and it started looking hopeful, but before much was running it had a kernel panic -- attempt to kill init.

Yes, I checked the checksum on the download. And when I used chimaera to boot another computer (a Purism laptop) it worked flawlessly.

I'm once more suspecting hardware. Next stop -- memtest.

And memtest shows an enormous number of memory failures. Which makes me suspect something systemic, not just a bit here or there. I'll have to replace the RAM, I guess. Or find out what memory bus is failing.

-- hendrik

On Tue, Oct 26, 2021 at 11:26 AM tempforever wrote:

> Download links are available on the devuan download page
> https://www.devuan.org/get-devuan
>
> choose a mirror (for example, mirror.leaseweb.com/devuan/) then navigate
> to the devuan_chimaera/ directory where you will hopefully find a
> desktop-live/ directory which should contain
> devuan_chimaera_4.0.0_[amd64|i386]_desktop-live.iso along with the
> shasums files. If you prefer not to use http(s), there are ftp mirrors
> and torrent on that same download page.
>
> Hendrik Boom via Dng wrote:
> > Then I'd have to figure out how to get a new kernel into it while it
> > crashes during bootup, which it now does quite consistently.
> >
> > If I had made recent changes (such as upgrades) I'd consider a kernel
> > problem more likely.
> >
> > But it has been running for months with only occasional problems, and
> > now I get problems every time I boot. Since I haven't done an upgrade
> > at all yet this month, I'll suspect hardware.
> >
> > Still, I will try, say, a chimaera live CD if I can find one. Where are
> > the Devuan live CD's? I seem to remember that they were called something
> > other than "devuan".
> > (Yes, I'm aware that the live CD will likely be a USB stick)
> >
> > -- hendrik
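For the checksum step mentioned in this message, here is a minimal sketch of how a downloaded image could be checked against a published digest. The thread only refers to "shasums files" without naming the algorithm, so SHA-256 is an assumption here, and the function names and the temporary stand-in file are invented for the illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large ISO need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare against a hex digest copied from the checksum file."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Self-check on a small temporary file standing in for the ISO
# (hypothetical data; SHA-256 is assumed, not confirmed by the thread).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name
demo_ok = matches_expected(demo_path, hashlib.sha256(b"abc").hexdigest())
os.remove(demo_path)
print("checksum matches:", demo_ok)
```

In real use, the path would be the downloaded ISO and the expected digest would be the matching entry from the mirror's checksum file. A matching digest only rules out a corrupted download; it cannot rule out faulty RAM on the machine doing the booting, which is where this message ends up.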
Re: [DNG] Is it dead yet?
Then I'd have to figure out how to get a new kernel into it while it crashes during bootup, which it now does quite consistently.

If I had made recent changes (such as upgrades) I'd consider a kernel problem more likely.

But it has been running for months with only occasional problems, and now I get problems every time I boot. Since I haven't done an upgrade at all yet this month, I'll suspect hardware.

Still, I will try, say, a chimaera live CD if I can find one. Where are the Devuan live CD's? I seem to remember that they were called something other than "devuan".
(Yes, I'm aware that the live CD will likely be a USB stick)

-- hendrik

On Tue, Oct 26, 2021 at 7:47 AM Boian Bonev wrote:

> Hi Hendrik,
>
> It is clearly saying that is a soft lockup, if it was a hardware problem it
> would say hard lockup isn't it?
>
> That problem really does not indicate a hardware problem, most probably a bug
> in the kernel; it can also show when a kernel driver is waiting for e.g. hard
> drive that is slow to respond. Try upgrading to a more recent kernel (e.g.
> 5.10.x or 5.14.x) - maybe the problem is already fixed there? It may also be a
> temporary condition - a very rare edge case that made the kernel lockup, then a
> reboot will help to clean the state...
>
> With best regards,
> b.
>
> On Mon, 2021-10-25 at 20:28 -0400, Hendrik Boom via Dng wrote:
> > I think I may have a clue to the mysterious stoppages of my server.
> > I ssh'd into it today and got the following. Is it time to give up on this
> > machine and replace it?
> >
> > -- hendrik
> >
> > hendrik@april:~$ su -
> > Password:
> > april:~# ps | grep lighttpd
> > ^C^C
> > Message from syslogd@april at Oct 25 20:15:48 ...
> >  kernel:[ 5700.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:16:16 ...
> >  kernel:[ 5728.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:16:52 ...
> >  kernel:[ 5764.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:17:20 ...
> >  kernel:[ 5792.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:17:56 ...
> >  kernel:[ 5828.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:18:24 ...
> >  kernel:[ 5856.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:18:56 ...
> >  kernel:[ 5888.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:19:24 ...
> >  kernel:[ 5916.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:20:00 ...
> >  kernel:[ 5952.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
> >
> > Message from syslogd@april at Oct 25 20:20:28 ...
> >  kernel:[ 5980.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
[DNG] Is it dead yet?
I think I may have a clue to the mysterious stoppages of my server. I ssh'd into it today and got the following. Is it time to give up on this machine and replace it?

-- hendrik

hendrik@april:~$ su -
Password:
april:~# ps | grep lighttpd
^C^C
Message from syslogd@april at Oct 25 20:15:48 ...
 kernel:[ 5700.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:16:16 ...
 kernel:[ 5728.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:16:52 ...
 kernel:[ 5764.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ps:9184]

Message from syslogd@april at Oct 25 20:17:20 ...
 kernel:[ 5792.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:17:56 ...
 kernel:[ 5828.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:18:24 ...
 kernel:[ 5856.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:18:56 ...
 kernel:[ 5888.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:19:24 ...
 kernel:[ 5916.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:20:00 ...
 kernel:[ 5952.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]

Message from syslogd@april at Oct 25 20:20:28 ...
 kernel:[ 5980.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
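To see the pattern in the messages above at a glance, here is a small sketch that parses the kernel lines as posted and summarises them. The regular expression and variable names are invented for this illustration and are matched only against the exact wording shown in this message; other kernels may word the message differently.

```python
import re

# Kernel lines copied from the message above.
LOG = """\
kernel:[ 5700.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5728.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5764.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ps:9184]
kernel:[ 5792.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5828.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5856.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5888.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5916.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5952.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
kernel:[ 5980.156005] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ps:9184]
"""

# Groups: uptime in seconds, CPU number, reported stuck time, command, PID.
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\] NMI watchdog: BUG: soft lockup - "
    r"CPU#(\d+) stuck for (\d+)s! \[(\w+):(\d+)\]"
)

events = [
    (float(t), int(cpu), int(stuck), comm, int(pid))
    for t, cpu, stuck, comm, pid in PATTERN.findall(LOG)
]

timestamps = [event[0] for event in events]
# Seconds between consecutive reports, rounded to whole seconds.
gaps = [round(later - earlier) for earlier, later in zip(timestamps, timestamps[1:])]
# Distinct (cpu, command, pid) combinations named in the reports.
tasks = {(event[1], event[3], event[4]) for event in events}
span = round(timestamps[-1] - timestamps[0])

print("reports:", len(events))
print("gaps between reports (s):", gaps)
print("tasks involved:", tasks)
print("total span (s):", span)
```

Every report names the same CPU and the same ps process (PID 9184), arriving every 28 to 36 seconds over a span of 280 seconds of uptime. The summary shows that one task stayed stuck for the whole period; on its own it does not tell a kernel bug apart from failing hardware.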