Re: [DNG] random sudden stops
On Fri, 27 Aug 2021 00:32:05 -0400, Steve wrote in message
<20210827003205.55c65...@mydesk.domain.cxm>:

> Hendrik Boom said on Thu, 26 Aug 2021 11:55:12 -0400
>
> >On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng
> >wrote:
> >> On 8/25/21 8:10 PM, Hendrik Boom wrote:
> >> > For the past few months my home server (running an ascii
> >> > installation physically moved from another computer) has been
> >> > suddenly stopping all processing about once a month, apparently
> >> > at random. It seems to stop instantly, leaving power on and
> >> > becoming completely unresponsive to ping, existing ssh connexions
> >> > and use of the physical keyboard.
> >> >
> >> > The system log, after a reboot, shows nothing unusual except of
> >> > course that there are no log entries for a shut-down.
> >> >
> >> > Can anyone provide ideas about tracking this down?
> >> >
> >> > It could of course be a random rare intermittent hardware error.
> >> >
> >> > -- hendrik
> >> >
> >> I had the same problem on a work station running ASCII. Since I
> >> could access the system from another machine on the LAN and even
> >> log in, I guessed that it was Xorg. Killing X via a remote login
> >> cleared the problem. With the use of sar and other tools, I
> >> determined it was the video card and/or NVIDIA's drivers (kernel
> >> modules). Switched back to the system board's video (AMD) and the
> >> problem went away.
> >
> >Not running X on this machine. Just have the usual text consoles on
> >ctrl-alt-F1 through F6.
> >
> >Don't have a separate video card either.
>
> The first time I read your symptom, my first thought was "I bet he has
> an nVidia card, just like I did before switching." So, acknowledging
> that you never run X and might not even have any nVidia drivers
> installed (if you do, I suggest removing them, under the
> circumstances), is your built-in card an nVidia? If so, do you have a
> less than 5 year old Radeon to temporarily install while disabling
> your nVidia in BIOS? After my horrendous intermittent hangs and
> reboots of November and December 2020, I would never use any nVidia
> graphics unit with Linux again. If I somehow acquired a computer with
> built-in nVidia graphics, I'd disable the built-in and use a Radeon.
> Even if I didn't use X.

..or just kill off any of nVidia's proprietary drivers and use the
nouveau driver.

Caveat: my last Radeon purchase, 9 years ago, was a 2nd hand HD 4890
that required a new power supply (with an 8 pin plug, AFAIR), so until
I got that power supply I had to use the box-filler Nvidia GeForce GTS
250 that came along with the 4890. The 250 came bang right up on X @
2048x1536 on the nouveau driver. It drove FlightGear at a flyable 9 to
15 fps AFAIR, and the FlightGear developers swore it would be much
faster on nVidia's proprietary driver, which I never got working, so I
went with the 4890 on radeon.

-- 
..med vennlig hilsen = with Kind Regards from Arnt Karlsen
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three:
  best case, worst case, and just in case.
___
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
Re: [DNG] random sudden stops
Hendrik Boom said on Thu, 26 Aug 2021 11:55:12 -0400

>On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng wrote:
>> On 8/25/21 8:10 PM, Hendrik Boom wrote:
>> > For the past few months my home server (running an ascii
>> > installation physically moved from another computer) has been
>> > suddenly stopping all processing about once a month, apparently at
>> > random. It seems to stop instantly, leaving power on and becoming
>> > completely unresponsive to ping, existing ssh connexions and use of
>> > the physical keyboard.
>> >
>> > The system log, after a reboot, shows nothing unusual except of
>> > course that there are no log entries for a shut-down.
>> >
>> > Can anyone provide ideas about tracking this down?
>> >
>> > It could of course be a random rare intermittent hardware error.
>> >
>> > -- hendrik
>> >
>> I had the same problem on a work station running ASCII. Since I could
>> access the system from another machine on the LAN and even log in, I
>> guessed that it was Xorg. Killing X via a remote login cleared the
>> problem. With the use of sar and other tools, I determined it was the
>> video card and/or NVIDIA's drivers (kernel modules). Switched back to
>> the system board's video (AMD) and the problem went away.
>
>Not running X on this machine. Just have the usual text consoles on
>ctrl-alt-F1 through F6.
>
>Don't have a separate video card either.

The first time I read your symptom, my first thought was "I bet he has
an nVidia card, just like I did before switching." So, acknowledging
that you never run X and might not even have any nVidia drivers
installed (if you do, I suggest removing them, under the
circumstances), is your built-in card an nVidia? If so, do you have a
less than 5 year old Radeon to temporarily install while disabling your
nVidia in BIOS? After my horrendous intermittent hangs and reboots of
November and December 2020, I would never use any nVidia graphics unit
with Linux again. If I somehow acquired a computer with built-in nVidia
graphics, I'd disable the built-in and use a Radeon. Even if I didn't
use X.

SteveT

Steve Litt
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist
http://www.troubleshooters.com/techniques
Re: [DNG] random sudden stops
On Wed, Aug 25, 2021 at 08:10:55PM -0400, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month.

Quite seriously, check it for excessive dust. Heat can do that. You can
also keep a baseline of that... Here's what I use:

    $ cat bin/heat
    #!/bin/sh

    watch -n 5 "sensors ; top -b | head -20"

I also recently learned about cpulimit(1), which is really useful for,
as an example, transcoding.

Could easily be something else, but checking for dust isn't a bad idea.

-- 
Mason Loring Bliss  ((  If I have not seen as far as others, it is because
ma...@blisses.org   ))  giants were standing on my shoulders. - Hal Abelson
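
A possible extension of the heat baseline above, offered only as a
sketch: because the machine stops without warning, a watch(1) display is
lost with the freeze, whereas samples appended to a file survive the
reboot. The function names hottest_zone and log_sample, the log path,
and the choice of reading /sys/class/thermal instead of calling sensors
are all assumptions made for illustration, not part of the script quoted
above.

```shell
#!/bin/sh
# Sketch only: record the hottest thermal zone so that the last reading
# before a freeze is still on disk afterwards. All names are illustrative.

# Print the highest temperature (in millidegrees C) found under a
# thermal sysfs directory. Prints nothing if no zone can be read.
hottest_zone() {
    dir=${1:-/sys/class/thermal}
    max=
    for f in "$dir"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    if [ -n "$max" ]; then
        echo "$max"
    fi
}

# Append one timestamped sample to a log file (hypothetical default path).
log_sample() {
    log=${1:-/var/log/heat-baseline.log}
    dir=${2:-/sys/class/thermal}
    echo "$(date '+%Y-%m-%d %H:%M:%S') max_temp_mC=$(hottest_zone "$dir")" >> "$log"
    sync    # flush to disk, since the next freeze may come at any moment
}
```

Calling log_sample once a minute from cron, or from a loop with sleep 60,
would leave a temperature trail leading up to each stop. If the last
samples before a freeze are no hotter than the baseline, heat becomes a
less likely suspect.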
Re: [DNG] random sudden stops
On Thursday, August 26th, 2021 at 1:10 AM, Hendrik Boom wrote:

> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.
> Can anyone provide ideas about tracking this down?
> It could of course be a random rare intermittent hardware error.
> -- hendrik

Sounds like a kernel panic, which can be tricky to resolve.
My first step would be to enable the Magic SysRq key and wait for a
system freeze to see if it can reveal anything.

https://en.wikipedia.org/wiki/Magic_SysRq_key
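
On enabling the Magic SysRq key as suggested above, a small sketch of
how one might check what the current setting permits. The kernel exposes
the setting in /proc/sys/kernel/sysrq: 0 disables SysRq, 1 enables every
function, and larger values are a bitmask. The helper name sysrq_allows
is made up for this example.

```shell
#!/bin/sh
# Sketch: decide whether a kernel.sysrq value permits a given function.
# Bitmask values used below (from the kernel's sysrq documentation):
#   8 = debugging dumps of processes, 16 = sync,
#   32 = remount read-only, 128 = reboot/poweroff
sysrq_allows() {
    value=$1
    bit=$2
    # A value of exactly 1 means "everything enabled".
    [ "$value" -eq 1 ] && return 0
    [ $(( value & bit )) -ne 0 ]
}

# Print a one-line report for the running system (not called automatically).
sysrq_report() {
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows "$current" 8; then
        echo "sysrq=$current: process dumps are allowed"
    else
        echo "sysrq=$current: process dumps are blocked"
    fi
}
```

If sysrq_report says dumps are blocked, running "sysctl -w kernel.sysrq=1"
as root enables all functions until the next reboot. Whether the keyboard
still reaches the kernel during one of these freezes is a separate
question that only the next freeze can answer.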
Re: [DNG] random sudden stops
Hendrik Boom wrote:

> When the machine stops I cannot access it by network. Even existing
> connexions stop working.

Have you disabled console screen blanking (IIRC “setterm --blank 0”) so
that any messages put out are readable? Perhaps you’ve already tried
that and there are no clues given?

Simon
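
To check whether blanking is actually off, one option is to read the
timeout the kernel reports. This is a sketch under the assumption that
the kernel exposes the value, in seconds, in
/sys/module/kernel/parameters/consoleblank; the helper name blank_status
is invented here.

```shell
#!/bin/sh
# Sketch: report the text console blanking timeout.
# The value is in seconds; 0 means the console is never blanked.
blank_status() {
    file=${1:-/sys/module/kernel/parameters/consoleblank}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    secs=$(cat "$file")
    if [ "$secs" -eq 0 ]; then
        echo "blanking disabled"
    else
        echo "blanks after ${secs}s"
    fi
}
```

Running blank_status with no argument reads the live value. Adding
consoleblank=0 to the kernel command line is another way to turn
blanking off from boot, without having to run setterm on each console.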
Re: [DNG] random sudden stops
On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng wrote:
> On 8/25/21 8:10 PM, Hendrik Boom wrote:
> > For the past few months my home server (running an ascii installation
> > physically moved from another computer) has been suddenly stopping all
> > processing about once a month, apparently at random. It seems to stop
> > instantly, leaving power on and becoming completely unresponsive to ping,
> > existing ssh connexions and use of the physical keyboard.
> >
> > The system log, after a reboot, shows nothing unusual except of course
> > that there are no log entries for a shut-down.
> >
> > Can anyone provide ideas about tracking this down?
> >
> > It could of course be a random rare intermittent hardware error.
> >
> > -- hendrik
> >
> I had the same problem on a work station running ASCII. Since I could
> access the system from another machine on the LAN and even log in, I
> guessed that it was Xorg. Killing X via a remote login cleared the
> problem. With the use of sar and other tools, I determined it was the
> video card and/or NVIDIA's drivers (kernel modules). Switched back to
> the system board's video (AMD) and the problem went away.

Not running X on this machine. Just have the usual text consoles on
ctrl-alt-F1 through F6.

Don't have a separate video card either.

When the machine stops I cannot access it by network. Even existing
connexions stop working.

Being ext4 with full journalling, the file system is safe.

If it's video drivers, maybe upgrading to beowulf will clear it out?
Who knows? It's probably time to do that anyway.

There is, I suppose, a slight chance that the specific installation of
ascii I had on the hard drive I moved from another machine isn't quite
compatible with the hardware I have now. But they're both AMD64
processors of comparable vintage.

-- hendrik

>
> Hope this helps.
Re: [DNG] random sudden stops
On 26/8/21 8:10 am, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
>
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.
>
> Can anyone provide ideas about tracking this down?
>
> It could of course be a random rare intermittent hardware error.

Sounds like the perfect application for netconsole.

I have a raspberry pi that runs some stuff, on that I installed udplogger :
https://lwn.net/Articles/571589/

Run with :
/usr/local/bin/udplogger port= dir=/root/udplogs/

I have a number of machines set up with netconsole on the command line, or
loaded after boot. There are easier ways to do this, but for whatever reason
this is what I use (I honestly don't recall) :

DEST=192.168.24.218
mount none -t configfs /sys/kernel/config
mkdir /sys/kernel/config/netconsole/target1
pushd /sys/kernel/config/netconsole/target1
echo 192.168.24.1 > local_ip
echo $DEST > remote_ip
echo br0 > dev_name
arping -c1 $DEST | grep -o ..:..:..:..:..:.. > remote_mac
echo 1 > enabled
popd

Or on the kernel command line :
netconsole=@192.168.24.187/eth0,@192.168.42.218/ab:cd:ef:12:34:56

That way I pretty much always get the oops that never makes it to disk.

2021-07-09 11:19:14 192.168.24.187: [1076324.113147] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113163] CPU: 4 PID: 4109 Comm: kworker/4:1 Not tainted 5.12.10+ #11
2021-07-09 11:19:14 192.168.24.187: [1076324.113170] Hardware name: Apple Inc. iMac12,2/Mac-XX, BIOS 87.0.0.0.0 06/14/2019
2021-07-09 11:19:14 192.168.24.187: [1076324.113174] Workqueue: events radeon_dp_work_func [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113229] Call Trace:
2021-07-09 11:19:14 192.168.24.187: [1076324.113232]  dump_stack+0x64/0x7c
2021-07-09 11:19:14 192.168.24.187: [1076324.113237]  panic+0xf6/0x280
2021-07-09 11:19:14 192.168.24.187: [1076324.113241]  ? radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113267]  __stack_chk_fail+0x10/0x10
2021-07-09 11:19:14 192.168.24.187: [1076324.113271]  radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113297]  radeon_connector_hotplug+0xa8/0xe0 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113315]  radeon_dp_work_func+0x28/0x40 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113335]  process_one_work+0x1c4/0x310
2021-07-09 11:19:14 192.168.24.187: [1076324.113339]  worker_thread+0x240/0x3c0
2021-07-09 11:19:14 192.168.24.187: [1076324.113341]  ? wq_update_unbound_numa+0x10/0x10
2021-07-09 11:19:14 192.168.24.187: [1076324.113344]  kthread+0x10a/0x120
2021-07-09 11:19:14 192.168.24.187: [1076324.113346]  ? kthread_park+0x80/0x80
2021-07-09 11:19:14 192.168.24.187: [1076324.113348]  ret_from_fork+0x1f/0x30
2021-07-09 11:19:14 192.168.24.187: [1076324.113391] Kernel Offset: disabled
2021-07-09 11:19:14 192.168.24.187: [1076324.113393] Rebooting in 10 seconds..
2021-07-09 11:19:24 192.168.24.187: [1076334.114131] ACPI MEMORY or I/O RESET_REG.

Regards,
Brad
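
The kernel command line form above packs the same fields as the configfs
steps into one string. As a rough sketch of how that string is put
together, here is a helper that assembles it from its parts. The name
netconsole_param is hypothetical, and the source and target ports are
deliberately left empty so the kernel's defaults apply.

```shell
#!/bin/sh
# Sketch: build a netconsole= boot parameter from its parts.
# Layout: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# Both port fields are left empty here, so kernel defaults are used.
netconsole_param() {
    src_ip=$1
    dev=$2
    dst_ip=$3
    dst_mac=$4
    # Very loose shape check on the MAC address: six two-character groups.
    case $dst_mac in
        ??:??:??:??:??:??) ;;
        *) echo "bad MAC address: $dst_mac" >&2; return 1 ;;
    esac
    printf 'netconsole=@%s/%s,@%s/%s\n' "$src_ip" "$dev" "$dst_ip" "$dst_mac"
}
```

For example, "netconsole_param 192.168.24.187 eth0 192.168.24.218
ab:cd:ef:12:34:56" prints a parameter of the same shape as the one
quoted above. The MAC address has to be that of the logging host, or of
the gateway if the logger sits on another network.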
Re: [DNG] random sudden stops
On 8/25/21 8:10 PM, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
>
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.
>
> Can anyone provide ideas about tracking this down?
>
> It could of course be a random rare intermittent hardware error.
>
> -- hendrik
>
I had the same problem on a work station running ASCII. Since I could
access the system from another machine on the LAN and even log in, I
guessed that it was Xorg. Killing X via a remote login cleared the
problem. With the use of sar and other tools, I determined it was the
video card and/or NVIDIA's drivers (kernel modules). Switched back to
the system board's video (AMD) and the problem went away.

Hope this helps.

-- 
William (Bill) Moss
billm...@acm.org
NY (USA)

Those who will not reason, are bigots,
those who cannot, are fools,
and those who dare not, are slaves.
    Lord Byron

Justice will not be served until those who are
unaffected are as outraged as those who are.
    Benjamin Franklin

When the people fear the government there is
tyranny, when the government fears the people
there is liberty.
    John Basil Barnhill
[DNG] random sudden stops
For the past few months my home server (running an ascii installation
physically moved from another computer) has been suddenly stopping all
processing about once a month, apparently at random. It seems to stop
instantly, leaving power on and becoming completely unresponsive to ping,
existing ssh connexions and use of the physical keyboard.

The system log, after a reboot, shows nothing unusual except of course
that there are no log entries for a shut-down.

Can anyone provide ideas about tracking this down?

It could of course be a random rare intermittent hardware error.

-- hendrik