Re: [DNG] random sudden stops

2021-08-27 Thread Arnt Karlsen
On Fri, 27 Aug 2021 00:32:05 -0400, Steve wrote in message 
<20210827003205.55c65...@mydesk.domain.cxm>:

> Hendrik Boom said on Thu, 26 Aug 2021 11:55:12 -0400
> 
> >On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng
> >wrote:  
> >> On 8/25/21 8:10 PM, Hendrik Boom wrote:
> >> > For the past few months my home server (running an ascii
> >> > installation physically moved from another computer) has been
> >> > suddenly stopping all processing about once a month, apparently
> >> > at random.  It seems to stop instantly, leaving power on and
> >> > becoming completely unresponsive to ping, existing ssh connexions
> >> > and use of the physical keyboard.
> >> > 
> >> > The system log, after a reboot, shows nothing unusual except of
> >> > course that there are no log entries for a shut-down.
> >> > 
> >> > Can anyone provide ideas about tracking this down?
> >> > 
> >> > It could of course be a random rare intermittent hardware error.
> >> > 
> >> > -- hendrik
> >> > 
> >> I had the same problem on a work station running ASCII. Since I
> >> could access the system from another machine on the LAN and even
> >> log in, I guessed that it was Xorg. Killing X Via a remote login
> >> cleared the problem. With the use of sar and other tools, I
> >> determined it was the video card and/or NVIDIA's drivers (kernel
> >> modules). Switched back to the system board's video (AMD) and the
> >> problem went away.
> >
> >Not running X on this machine.  Just have the usual text consoles on 
> >cntl-alt-F1 through F6.
> >
> >Don't have a separate video card either.  
> 
> The first time I read your symptom, my first thought was "I bet he has
> an nVidia card, just like I did before switching." So, acknowledging
> that you never run X and might not even have any nVidia drivers
> installed (if you do, I suggest removing them, under the
> circumstances), is your built in card an nVidia? If so, do you have a
> less than 5 year old Radeon to temporarily install while disabling
> your nVidia in BIOS? After my horrendous intermittent hangs and
> reboots of November and December 2020, I would never use any nVidia
> graphics unit with Linux again. If I somehow acquired a computer with
> built in nVidia graphics, I'd disable the built-in and use a Radeon.
> Even if I didn't use X.

..or just kill off any of nVidia's proprietary drivers and use the 
nouveau driver.  
Caveat: My last Radeon purchase, 9 years ago, was a 2nd hand HD 4890 that
required a new power supply (with an 8 pin plug, AFAIR), so I had to use 
the box-filler nVidia GeForce GTS 250 that came along with the 4890 until 
I got that power supply. The 250 came right up on X @ 2048x1536 on the
nouveau driver.  It drove FlightGear at a flyable 9 to 15fps AFAIR, 
and the FlightGear developers swore it would be much faster on nVidia's
proprietary driver, which I never got working, so I went with the 
4890 on radeon.

-- 
..med vennlig hilsen = with Kind Regards from Arnt Karlsen
...with a number of polar bear hunters in his ancestry...
  Scenarios always come in sets of three: 
  best case, worst case, and just in case.
___
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng


Re: [DNG] random sudden stops

2021-08-26 Thread Steve Litt
Hendrik Boom said on Thu, 26 Aug 2021 11:55:12 -0400

>On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng wrote:
>> On 8/25/21 8:10 PM, Hendrik Boom wrote:  
>> > For the past few months my home server (running an ascii
>> > installation physically moved from another computer) has been
>> > suddenly stopping all processing about once a month, apparently at
>> > random.  It seems to stop instantly, leaving power on and becoming
>> > completely unresponsive to ping, existing ssh connexions and use of
>> > the physical keyboard.
>> > 
>> > The system log, after a reboot, shows nothing unusual except of
>> > course that there are no log entries for a shut-down.
>> > 
>> > Can anyone provide ideas about tracking this down?
>> > 
>> > It could of course be a random rare intermittent hardware error.
>> > 
>> > -- hendrik
>> >   
>> I had the same problem on a work station running ASCII. Since I could
>> access the system from another machine on the LAN and even log in, I
>> guessed that it was Xorg. Killing X Via a remote login cleared the
>> problem. With the use of sar and other tools, I determined it was the
>> video card and/or NVIDIA's drivers (kernel modules). Switched back to
>> the system board's video (AMD) and the problem went away.  
>
>Not running X on this machine.  Just have the usual text consoles on 
>cntl-alt-F1 through F6.
>
>Don't have a separate video card either.

The first time I read your symptom, my first thought was "I bet he has
an nVidia card, just like I did before switching." So, acknowledging
that you never run X and might not even have any nVidia drivers
installed (if you do, I suggest removing them, under the
circumstances), is your built in card an nVidia? If so, do you have a
less than 5 year old Radeon to temporarily install while disabling your
nVidia in BIOS? After my horrendous intermittent hangs and reboots of
November and December 2020, I would never use any nVidia graphics unit
with Linux again. If I somehow acquired a computer with built in nVidia
graphics, I'd disable the built-in and use a Radeon. Even if I didn't
use X.

SteveT

Steve Litt 
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist http://www.troubleshooters.com/techniques


Re: [DNG] random sudden stops

2021-08-26 Thread Mason Loring Bliss
On Wed, Aug 25, 2021 at 08:10:55PM -0400, Hendrik Boom wrote:

> For the past few months my home server (running an ascii installation 
> physically moved from another computer) has been suddenly stopping all 
> processing about once a month.

Quite seriously, check it for excessive dust. Heat can do that. You can
also keep a baseline of that... Here's what I use:

$ cat bin/heat 
#!/bin/sh
# Refresh every 5 seconds: sensor readings plus the head of one batch-mode top snapshot.
watch -n 5 "sensors ; top -b -n 1 | head -20"

I also recently learned about cpulimit(1), which is really useful for, as
an example, transcoding.

Could easily be something else, but checking for dust isn't a bad idea.
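
A baseline only helps if it survives the freeze, so one option is to append
readings to a file rather than only watching them. The following is a minimal
sketch, not the script above: the function name log_heat and the log path are
assumptions, and it presumes sensors (from lm-sensors) is installed, recording
a note otherwise.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped temperature snapshot to a log file.
log_heat() {
    logfile=$1
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        if command -v sensors >/dev/null 2>&1; then
            sensors
        else
            echo "sensors not installed"
        fi
    } >> "$logfile"
    # Flush to disk so the last snapshot is not lost if the machine freezes.
    sync
}

# Example: take one snapshot; run from cron or a loop for a continuous baseline.
log_heat "${HEAT_LOG:-/tmp/heat.log}"
```

Called once a minute from cron, the last entries before a stop would show
whether temperatures were climbing.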

-- 
Mason Loring Bliss  ((   If I have not seen as far as others, it is because
 ma...@blisses.org   ))   giants were standing on my shoulders. - Hal Abelson




Re: [DNG] random sudden stops

2021-08-26 Thread g4sra via Dng

On Thursday, August 26th, 2021 at 1:10 AM, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.

> Can anyone provide ideas about tracking this down?
> It could of course be a random rare intermittent hardware error.

> -- hendrik

Sounds like a kernel panic, which can be tricky to resolve.
My first step would be to enable the Magic SysRq key and
wait for a system freeze to see if it can reveal anything.

https://en.wikipedia.org/wiki/Magic_SysRq_key
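
As a rough sketch of that first step: the kernel exposes the SysRq setting in
/proc/sys/kernel/sysrq, where 0 means disabled, 1 means all functions enabled,
and larger values are a bitmask of allowed functions. The helper name
sysrq_status is made up for illustration, and enabling the key needs root.

```shell
#!/bin/sh
# Hypothetical helper: describe a value read from /proc/sys/kernel/sysrq.
sysrq_status() {
    case $1 in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted to bitmask $1" ;;
    esac
}

# Report the current setting if the file is readable on this system.
if [ -r /proc/sys/kernel/sysrq ]; then
    sysrq_status "$(cat /proc/sys/kernel/sysrq)"
fi

# To enable every SysRq function until the next reboot (as root):
#   echo 1 > /proc/sys/kernel/sysrq
# To make it persistent, set kernel.sysrq = 1 in /etc/sysctl.conf.
```

If the keyboard still reaches the kernel during a freeze, Alt+SysRq+t dumps
task states to the console, and Alt+SysRq+s (sync) followed by Alt+SysRq+b
(reboot) is gentler than the power button.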





Re: [DNG] random sudden stops

2021-08-26 Thread Simon
Hendrik Boom wrote:

> When the machine stops I cannot access it by network.  Even existing 
> connexions stop working.

Have you disabled console screen blanking (IIRC “setterm --blank 0”) so that any
messages put out are readable?
Perhaps you’ve already tried that and there are no clues given?
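
A small sketch of how the blanking timeout could be checked: on Linux the
current console blank interval is exposed, in seconds, in
/sys/module/kernel/parameters/consoleblank, with 0 meaning never blank. The
function name blank_status is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: interpret a console blank timeout given in seconds.
blank_status() {
    if [ "$1" -eq 0 ]; then
        echo "blanking disabled"
    else
        echo "console blanks after $1 seconds"
    fi
}

param=/sys/module/kernel/parameters/consoleblank
if [ -r "$param" ]; then
    blank_status "$(cat "$param")"
fi

# To disable blanking on the current virtual console (run on that console):
#   setterm --blank 0
# Or for all consoles from boot, add consoleblank=0 to the kernel command line.
```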

Simon



Re: [DNG] random sudden stops

2021-08-26 Thread Hendrik Boom
On Wed, Aug 25, 2021 at 09:16:06PM -0400, william moss via Dng wrote:
> On 8/25/21 8:10 PM, Hendrik Boom wrote:
> > For the past few months my home server (running an ascii installation 
> > physically moved from another computer) has been suddenly stopping all 
> > processing about once a month, apparently at random.  It seems to stop 
> > instantly, leaving power on and becoming completely unresponsive to ping,
> > existing ssh connexions and use of the physical keyboard.
> > 
> > The system log, after a reboot, shows nothing unusual except of course 
> > that there are no log entries for a shut-down.
> > 
> > Can anyone provide ideas about tracking this down?
> > 
> > It could of course be a random rare intermittent hardware error.
> > 
> > -- hendrik
> > 
> I had the same problem on a work station running ASCII. Since I could
> access the system from another machine on the LAN and even log in, I
> guessed that it was Xorg. Killing X Via a remote login cleared the
> problem. With the use of sar and other tools, I determined it was the
> video card and/or NVIDIA's drivers (kernel modules). Switched back to
> the system board's video (AMD) and the problem went away.

Not running X on this machine.  Just have the usual text consoles on 
cntl-alt-F1 through F6.

Don't have a separate video card either.

When the machine stops I cannot access it by network.  Even existing 
connexions stop working.  Being ext4 with full journalling, the file 
system is safe.

If it's video drivers, maybe upgrading to beowulf will clear it out?  Who 
knows?  It's probably time to do that anyway.

There is, I suppose, a slight chance that the specific installation of 
ascii I had on the hard drive I moved from another machine isn't quite 
compatible with the hardware I have now.  But they're both AMD64 
processors of comparable vintage.

-- hendrik

> 
> Hope this helps.
> 
> -- 
> William (Bill) Moss
> billm...@acm.org
> NY (USA)
> Those who will not reason, are bigots,
> those who cannot, are fools,
> and those who dare not, are slaves.
> Lord Byron
> 
> Justice will not be served until those who are
> unaffected are as outraged as those who are.
> Benjamin Franklin
> 
> When the people fear the government there is
> tyranny, when the government fears the people
> there is liberty.
> John Basil Barnhill


Re: [DNG] random sudden stops

2021-08-25 Thread Brad Campbell via Dng
On 26/8/21 8:10 am, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation 
> physically moved from another computer) has been suddenly stopping all 
> processing about once a month, apparently at random.  It seems to stop 
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
> 
> The system log, after a reboot, shows nothing unusual except of course 
> that there are no log entries for a shut-down.
> 
> Can anyone provide ideas about tracking this down?
> 
> It could of course be a random rare intermittent hardware error.

Sounds like the perfect application for netconsole.

I have a Raspberry Pi that runs a few services; on it I installed udplogger:
https://lwn.net/Articles/571589/
Run with: /usr/local/bin/udplogger port= dir=/root/udplogs/

I have a number of machines set up with netconsole, either on the kernel
command line or loaded after boot. There are easier ways to do this, but this
is what I use (I honestly don't recall why):

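# Requires root and bash (pushd/popd); the netconsole module must be loaded.
DEST=192.168.24.218                       # machine receiving the log messages
mount none -t configfs /sys/kernel/config
mkdir /sys/kernel/config/netconsole/target1
pushd /sys/kernel/config/netconsole/target1
echo 192.168.24.1 > local_ip              # this machine's address
echo $DEST > remote_ip
echo br0 > dev_name                       # outgoing interface
# Look up the receiver's MAC address via ARP
arping -c1 $DEST | grep -o ..:..:..:..:..:.. > remote_mac
echo 1 > enabled
popd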

Or on the kernel command line:
netconsole=@192.168.24.187/eth0,@192.168.24.218/ab:cd:ef:12:34:56
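
The command-line form follows the layout documented for the kernel's
netconsole module, netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac],
where omitted ports default to 6665 (source) and 6666 (target). As a sketch, a
small helper can assemble the string from its parts; the name netconsole_param
and the addresses used are placeholders, not anything from the setup above.

```shell
#!/bin/sh
# Hypothetical helper: build a netconsole= kernel parameter with default ports.
# Arguments: source IP, source device, target IP, target MAC address.
netconsole_param() {
    printf 'netconsole=@%s/%s,@%s/%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder addresses for illustration only.
netconsole_param 192.0.2.10 eth0 192.0.2.20 ab:cd:ef:12:34:56
```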

That way I pretty much always get the oops that never makes it to disk.

2021-07-09 11:19:14 192.168.24.187: [1076324.113147] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113163] CPU: 4 PID: 4109 Comm: kworker/4:1 Not tainted 5.12.10+ #11
2021-07-09 11:19:14 192.168.24.187: [1076324.113170] Hardware name: Apple Inc. iMac12,2/Mac-XX, BIOS 87.0.0.0.0 06/14/2019
2021-07-09 11:19:14 192.168.24.187: [1076324.113174] Workqueue: events radeon_dp_work_func [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113229] Call Trace:
2021-07-09 11:19:14 192.168.24.187: [1076324.113232]  dump_stack+0x64/0x7c
2021-07-09 11:19:14 192.168.24.187: [1076324.113237]  panic+0xf6/0x280
2021-07-09 11:19:14 192.168.24.187: [1076324.113241]  ? radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113267]  __stack_chk_fail+0x10/0x10
2021-07-09 11:19:14 192.168.24.187: [1076324.113271]  radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113297]  radeon_connector_hotplug+0xa8/0xe0 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113315]  radeon_dp_work_func+0x28/0x40 [radeon]
2021-07-09 11:19:14 192.168.24.187: [1076324.113335]  process_one_work+0x1c4/0x310
2021-07-09 11:19:14 192.168.24.187: [1076324.113339]  worker_thread+0x240/0x3c0
2021-07-09 11:19:14 192.168.24.187: [1076324.113341]  ? wq_update_unbound_numa+0x10/0x10
2021-07-09 11:19:14 192.168.24.187: [1076324.113344]  kthread+0x10a/0x120
2021-07-09 11:19:14 192.168.24.187: [1076324.113346]  ? kthread_park+0x80/0x80
2021-07-09 11:19:14 192.168.24.187: [1076324.113348]  ret_from_fork+0x1f/0x30
2021-07-09 11:19:14 192.168.24.187: [1076324.113391] Kernel Offset: disabled
2021-07-09 11:19:14 192.168.24.187: [1076324.113393] Rebooting in 10 seconds..
2021-07-09 11:19:24 192.168.24.187: [1076334.114131] ACPI MEMORY or I/O RESET_REG.
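
With logs collected like this, the interesting entries can be pulled out after
a stop. A minimal sketch, assuming the receiver writes plain-text files
somewhere under a directory such as /root/udplogs/ (the file layout and the
function name find_panics are assumptions):

```shell
#!/bin/sh
# Hypothetical helper: list panic/oops lines from all log files under a directory.
find_panics() {
    grep -rhE 'Kernel panic|Oops|BUG:' "$1"
}

# Example: search the log directory if it exists; no matches is not an error.
if [ -d "${UDPLOG_DIR:-/root/udplogs}" ]; then
    find_panics "${UDPLOG_DIR:-/root/udplogs}" || true
fi
```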

Regards,
Brad


Re: [DNG] random sudden stops

2021-08-25 Thread william moss via Dng
On 8/25/21 8:10 PM, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation 
> physically moved from another computer) has been suddenly stopping all 
> processing about once a month, apparently at random.  It seems to stop 
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
> 
> The system log, after a reboot, shows nothing unusual except of course 
> that there are no log entries for a shut-down.
> 
> Can anyone provide ideas about tracking this down?
> 
> It could of course be a random rare intermittent hardware error.
> 
> -- hendrik
> 
I had the same problem on a workstation running ASCII. Since I could
access the system from another machine on the LAN and even log in, I
guessed that it was Xorg. Killing X via a remote login cleared the
problem. With the use of sar and other tools, I determined it was the
video card and/or NVIDIA's drivers (kernel modules). Switched back to
the system board's video (AMD) and the problem went away.

Hope this helps.

-- 
William (Bill) Moss
billm...@acm.org
NY (USA)
Those who will not reason, are bigots,
those who cannot, are fools,
and those who dare not, are slaves.
Lord Byron

Justice will not be served until those who are
unaffected are as outraged as those who are.
Benjamin Franklin

When the people fear the government there is
tyranny, when the government fears the people
there is liberty.
John Basil Barnhill


[DNG] random sudden stops

2021-08-25 Thread Hendrik Boom
For the past few months my home server (running an ascii installation 
physically moved from another computer) has been suddenly stopping all 
processing about once a month, apparently at random.  It seems to stop 
instantly, leaving power on and becoming completely unresponsive to ping,
existing ssh connexions and use of the physical keyboard.

The system log, after a reboot, shows nothing unusual except of course 
that there are no log entries for a shut-down.

Can anyone provide ideas about tracking this down?

It could of course be a random rare intermittent hardware error.

-- hendrik