Re: Buster System hangs, requires hard reboot

2020-06-19 Thread Ralph Katz
I've added my experience to the existing bug 846296.

#846296 [e2fsprogs] ext4 checksum error
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=846296



Re: Buster System hangs, requires hard reboot

2020-04-22 Thread Ralph Katz
On 4/22/20 7:24 PM, Tom Dial wrote:
> 
> 
> On 4/20/20 19:44, Ralph Katz wrote:
>> Hi -- Please help me diagnose and fix this problem.
>>
>> My five month old Dell laptop with updated firmware and new up-to-date
>> Buster completely hangs and requires a hard reboot after 7-40 days
>> uptime.  While reading something onscreen or away from the laptop, the
>> system hangs completely: screen freezes, keyboard is unresponsive, lid
>> close fails to sleep, can't ssh in, pings fail.  Hard reboot is required.
>>
>> Actions taken:
>>
>> - re-installed Buster twice over several months
>> - ran fsck.ext4 -y /dev/sda2  when fsck failed on boot
>> - ran badblocks
>> - run smartmontools long test weekly; no errors reported in logs.
>>
>> Sometimes there are errors in syslog like this before a crash:
>>
>>> Apr  2 06:04:18 spike3 kernel: [539637.882916] EXT4-fs error (device sda2): 
>>> ext4_lookup:1590: inode #55838123: comm updatedb.mlocat: iget: checksum 
>>> invalid
>>
>> Today there were no syslog errors for weeks before the system hung.
>> After rebooting, errors like these appeared:
>>
>>> Apr 20 16:05:57 spike3 kernel: [  887.007328] EXT4-fs error (device sda2): 
>>> ext4_lookup:1590: inode #55842004: comm GMPThread: iget: checksum invalid
>>> Apr 20 16:08:53 spike3 kernel: [ 1062.821504] EXT4-fs error (device sda2): 
>>> ext4_lookup:1590: inode #55842002: comm DOM Worker: iget: checksum invalid
>>
>> Any ideas?
> 
> These errors seem to indicate that data in three inodes (and probably
> more) are invalid: they contain a checksum different from the one the
> kernel calculated when it read them in (the "iget" in the message).
> The "ext4_lookup:1590" prefix names the function and source line that
> requested the inode. It looks like the device block(s) containing the
> inodes were read successfully, indicating that the device considers
> them intact and consistent. The data within them, however, are not.
> The three inodes are located fairly close together and may have been
> written to the block device by the same physical operation.
> 
> The two messages issued within a period of 3 minutes, and the hang
> without a logged message, suggest that the errors logged were symptoms
> rather than causes. However, an unlogged error of the same type (for
> instance, reading and then using bad data that has no built in
> checksum), seems plausible.
> 
> From the logged errors:
> 
> The checksum computed by the OS for the data read from the block device
> differs from the checksum computed for the data at the time it was sent
> to the block device.
> 
> No block read error is reported. If true, that implies that the data on
> the device is unchanged from what was written by the device firmware.
> 
> Which implies, in turn, that the inode data were incorrect when received
> by the block device or were corrupted on the block device before
> completion of the write operation.
> 
> The first indicates a bug in the ext4 file system. That seems a stretch
> in view of the maturity and widespread use of ext4 (including by me) on
> GNU/Linux systems. Still, a file system is an extremely complex and
> subtle piece of code, probably running on multiple-CPU hardware that may
> present unique issues. It might be worth looking for ext4 bug reports
> that resonate with this. If there are any (but I know of no reason to
> suspect it), installing on a different file system could be a solution.
> In addition to ext[2,3,4] I have used jfs and xfs on systems for quite a
> few years, and found them stable and reliable. The last time I looked,
> they were available installer choices. For a laptop used portably, ZFS
> (from buster-backports) also is a reasonable candidate with built-in
> encryption capability, although installation requires quite a bit more
> effort than the installer.
> 
> The second indicates a problem, probably in firmware, within the block
> device. I have seen such, and it could be worth looking into whether the
> device manufacturer has released firmware updates, and applying the
> latest if different from what now is present on the device. I lean
> toward that rather than a file system bug.
> 
> A five-month-old machine should be under warranty, although I do not
> know whether installing Linux would affect that. It would be worth
> looking into, and should offload the firmware upgrade or replacement
> of the block device to the vendor.
> 
> 
> Regards,
> Tom Dial

Tom, thanks for your comprehensive review!  It is under warranty and
bringing it in is probably my next step.  This laptop supports Ubuntu
from the factory, so there is no concern with Linux.

Dell's website shows no firmware updates for my laptop's service tag other
than the BIOS, which I applied earlier.  Searching for the drive model
returns nothing at Dell, and support.toshiba.com responds:
> Your entry doesn’t appear to be valid. Please double-check that your product 
> is from the US or Latin America and try again.

From smartctl:
Device Model: TOSHIBA MQ04ABF100
Firmware Version: JU000D

Thanks again!
Ralph






Re: Buster System hangs, requires hard reboot

2020-04-22 Thread Tom Dial



On 4/20/20 19:44, Ralph Katz wrote:
> Hi -- Please help me diagnose and fix this problem.
> 
> My five month old Dell laptop with updated firmware and new up-to-date
> Buster completely hangs and requires a hard reboot after 7-40 days
> uptime.  While reading something onscreen or away from the laptop, the
> system hangs completely: screen freezes, keyboard is unresponsive, lid
> close fails to sleep, can't ssh in, pings fail.  Hard reboot is required.
> 
> Actions taken:
> 
> - re-installed Buster twice over several months
> - ran fsck.ext4 -y /dev/sda2  when fsck failed on boot
> - ran badblocks
> - run smartmontools long test weekly; no errors reported in logs.
> 
> Sometimes there are errors in syslog like this before a crash:
> 
>> Apr  2 06:04:18 spike3 kernel: [539637.882916] EXT4-fs error (device sda2): 
>> ext4_lookup:1590: inode #55838123: comm updatedb.mlocat: iget: checksum 
>> invalid
> 
> Today there were no syslog errors for weeks before the system hung.
> After rebooting, errors like these appeared:
> 
>> Apr 20 16:05:57 spike3 kernel: [  887.007328] EXT4-fs error (device sda2): 
>> ext4_lookup:1590: inode #55842004: comm GMPThread: iget: checksum invalid
>> Apr 20 16:08:53 spike3 kernel: [ 1062.821504] EXT4-fs error (device sda2): 
>> ext4_lookup:1590: inode #55842002: comm DOM Worker: iget: checksum invalid
> 
> Any ideas?

These errors seem to indicate that data in three inodes (and probably
more) are invalid: they contain a checksum different from the one the
kernel calculated when it read them in (the "iget" in the message).
The "ext4_lookup:1590" prefix names the function and source line that
requested the inode. It looks like the device block(s) containing the
inodes were read successfully, indicating that the device considers
them intact and consistent. The data within them, however, are not.
The three inodes are located fairly close together and may have been
written to the block device by the same physical operation.
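
The "close together" observation can be checked with a little arithmetic.
The Python sketch below pulls the inode numbers out of the three logged
lines and maps each one to a block group and to a block offset within that
group's inode table. The geometry constants are assumptions (the usual
mke2fs defaults of 8192 inodes per group, 256-byte inodes and 4096-byte
blocks), not values read from this disk; the real ones are on the
"Inodes per group", "Inode size" and "Block size" lines of
tune2fs -l /dev/sda2. The function names are made up for this example.

```python
import re

# Kernel log lines copied from the original report.
LOG = """\
Apr  2 06:04:18 spike3 kernel: [539637.882916] EXT4-fs error (device sda2): ext4_lookup:1590: inode #55838123: comm updatedb.mlocat: iget: checksum invalid
Apr 20 16:05:57 spike3 kernel: [  887.007328] EXT4-fs error (device sda2): ext4_lookup:1590: inode #55842004: comm GMPThread: iget: checksum invalid
Apr 20 16:08:53 spike3 kernel: [ 1062.821504] EXT4-fs error (device sda2): ext4_lookup:1590: inode #55842002: comm DOM Worker: iget: checksum invalid
"""

# ASSUMED geometry (common mke2fs defaults); check with tune2fs -l.
INODES_PER_GROUP = 8192
INODE_SIZE = 256      # bytes
BLOCK_SIZE = 4096     # bytes

INODE_RE = re.compile(r"EXT4-fs error \(device \S+\): .*?inode #(\d+):")


def bad_inodes(text):
    """Return the inode numbers named in EXT4-fs error lines, in order."""
    return [int(m.group(1)) for m in INODE_RE.finditer(text)]


def locate(ino):
    """Map an inode number to (group, index in group, inode-table block).

    ext4 numbers inodes from 1, hence the (ino - 1).
    """
    group = (ino - 1) // INODES_PER_GROUP
    index = (ino - 1) % INODES_PER_GROUP
    table_block = index * INODE_SIZE // BLOCK_SIZE
    return group, index, table_block


for ino in bad_inodes(LOG):
    group, index, block = locate(ino)
    print(f"inode {ino}: group {group}, index {index}, "
          f"inode-table block {block}")
```

If the assumed geometry holds, all three inodes land in the same block
group, and the two reported on Apr 20 share a single 4 KiB inode-table
block, which would fit the idea of one bad write.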

The two messages issued within a period of 3 minutes, and the hang
without a logged message, suggest that the errors logged were symptoms
rather than causes. However, an unlogged error of the same type (for
instance, reading and then using bad data that has no built in
checksum), seems plausible.

From the logged errors:

The checksum computed by the OS for the data read from the block device
differs from the checksum computed for the data at the time it was sent
to the block device.

No block read error is reported. If true, that implies that the data on
the device is unchanged from what was written by the device firmware.

Which implies, in turn, that the inode data were incorrect when received
by the block device or were corrupted on the block device before
completion of the write operation.

The first indicates a bug in the ext4 file system. That seems a stretch
in view of the maturity and widespread use of ext4 (including by me) on
GNU/Linux systems. Still, a file system is an extremely complex and
subtle piece of code, probably running on multiple-CPU hardware that may
present unique issues. It might be worth looking for ext4 bug reports
that resonate with this. If there are any (but I know of no reason to
suspect it), installing on a different file system could be a solution.
In addition to ext[2,3,4] I have used jfs and xfs on systems for quite a
few years, and found them stable and reliable. The last time I looked,
they were available installer choices. For a laptop used portably, ZFS
(from buster-backports) also is a reasonable candidate with built-in
encryption capability, although installation requires quite a bit more
effort than the installer.

The second indicates a problem, probably in firmware, within the block
device. I have seen such, and it could be worth looking into whether the
device manufacturer has released firmware updates, and applying the
latest if different from what now is present on the device. I lean
toward that rather than a file system bug.

A five-month-old machine should be under warranty, although I do not
know whether installing Linux would affect that. It would be worth
looking into, and should offload the firmware upgrade or replacement
of the block device to the vendor.


Regards,
Tom Dial

>
> Thanks in advance!
> Ralph
> 



Re: Buster System hangs, requires hard reboot

2020-04-22 Thread Gene Heskett
On Wednesday 22 April 2020 03:36:22 to...@tuxteam.de wrote:

> On Tue, Apr 21, 2020 at 05:58:41PM -0600, Tom Dial wrote:
> > On 4/21/20 06:24, Ralph Katz wrote:
> > > On 4/21/20 2:47 AM, Gene Heskett wrote:
> > >> On Tuesday 21 April 2020 03:07:31 to...@tuxteam.de wrote:
>
> [...]
>
> > The fact that the inode numbers mentioned in the initial post all
> > are near each other
>
> D'oh!

I missed that too, shame on me.  Can I blame it on oldtimers?
>
> >  hints, to me, that the storage device is the most
> > likely source of difficulty. I would try
>
> [...]
>
> Good catch. See? Just being old means nothing, in itself ;-)
>
> Cheers
> -- t


Cheers to you too, Tomas. Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Re: Buster System hangs, requires hard reboot

2020-04-22 Thread tomas
On Tue, Apr 21, 2020 at 06:50:03PM -0600, Ralph Katz wrote:
> On 4/21/20 1:05 PM, deloptes wrote:
> > to...@tuxteam.de wrote:
> > 
> >> No, but a connector. And depending on the mechanical environment,
> >> those sometimes loosen too.
> > 
> > Yes, I have noticed on a few laptops that if the disk is not mounted
> > properly it starts sliding out, which in theory could lead to such
> > problems. So check the connector, then reinsert and mount the disk
> > properly.
> > 
> 
> Even with the service manual, I am totally intimidated by the fragile,
> paper-thin components in modern laptops.  I was unable to disconnect the
> drive cable from the board; did not want to exert excess pressure.

I know. You need the right combo of determination and patience. And
then, you still sometimes break things :-)

I have little choice, because, among my friends, I'm the "computer guy"
and don't want to disappoint them. So I get some practice...

> So I
> was unable to get to the disk itself to remove/reinsert it.  But I was
> able to press hard on the cable connector, if that makes any difference
> to the existing connection.  A service visit is probably going to be
> needed for this.

Next time, get a refurbished ThinkPad. Extracting the hard disk takes
just one screw :-)

> I received a private email about using smartctl, but as mentioned in my
> original post, weekly long tests and drive error logs show no errors at all.

I see. Tom Dial has posted something to the same effect. His observation
that the errors seem to cluster around neighboring inodes hints at a faulty
(part of the) hard disk (unless it's just a statistical thing about your
operating system always hitting the same small set of files).

> Thanks for your help guys!

You're welcome!

Cheers
-- t


signature.asc
Description: Digital signature


Re: Buster System hangs, requires hard reboot

2020-04-22 Thread tomas
On Tue, Apr 21, 2020 at 05:58:41PM -0600, Tom Dial wrote:
> 
> 
> On 4/21/20 06:24, Ralph Katz wrote:
> > On 4/21/20 2:47 AM, Gene Heskett wrote:
> >> On Tuesday 21 April 2020 03:07:31 to...@tuxteam.de wrote:

[...]

> The fact that the inode numbers mentioned in the initial post all are
> near each other

D'oh!

>  hints, to me, that the storage device is the most likely
> source of difficulty. I would try

[...]

Good catch. See? Just being old means nothing, in itself ;-)

Cheers
-- t




Re: Buster System hangs, requires hard reboot

2020-04-21 Thread Ralph Katz
On 4/21/20 1:05 PM, deloptes wrote:
> to...@tuxteam.de wrote:
> 
>> No, but a connector. And depending on the mechanical environment,
>> those sometimes loosen too.
> 
> Yes, I have noticed on a few laptops that if the disk is not mounted
> properly it starts sliding out, which in theory could lead to such
> problems. So check the connector, then reinsert and mount the disk
> properly.
> 

Even with the service manual, I am totally intimidated by the fragile,
paper-thin components in modern laptops.  I was unable to disconnect the
drive cable from the board; did not want to exert excess pressure.  So I
was unable to get to the disk itself to remove/reinsert it.  But I was
able to press hard on the cable connector, if that makes any difference
to the existing connection.  A service visit is probably going to be
needed for this.

I received a private email about using smartctl, but as mentioned in my
original post, weekly long tests and drive error logs show no errors at all.

Thanks for your help guys!

Regards,
Ralph






Re: Buster System hangs, requires hard reboot

2020-04-21 Thread Tom Dial



On 4/21/20 06:24, Ralph Katz wrote:
> On 4/21/20 2:47 AM, Gene Heskett wrote:
>> On Tuesday 21 April 2020 03:07:31 to...@tuxteam.de wrote:
>>
>>> On Mon, Apr 20, 2020 at 10:59:49PM -0400, Gene Heskett wrote:
>>>> On Monday 20 April 2020 21:44:10 Ralph Katz wrote:
>>>>> Hi -- Please help me diagnose and fix this problem.
>>>>>
>>>>> My five month old Dell laptop with updated firmware and new
>>>>> up-to-date
>>>
>>>^^
>>>
>>>> How old, and what color are your sata cables? [...]
>>>
>>> I know you like the red ones :-)
>>>
>> Chuckle, I have known about that dye color since we started importing CB
>> radios from the J.A.Pan company in about 1973; the mike cable failure
>> rate at about a year's service was 100%. At one point in the leadup to our
>> centennial in '76, we had 20 cables on backorder from every supplier 
>> claiming to stock them. The shop area at Norfolk 2-way Radio was stacked 
>> up with radios that needed cables. When they finally started to show up, 
>> I had Vi return quite a pile of them because they were the exact copies 
>> of the high failure rate product, and we wound up paying 2x for a Belden 
>> cable that was way too strong a coil, but if it was tied down, it didn't 
>> fail. I was a couple months getting caught up with that fiasco.
>>
>> You folks all think I'm nuts over this, but you forget I have 1st phone
>> and C.E.T. cards in my card carrier and a 70+ year history of fixing
>> things electronic.  All on an 8th grade education; beyond that I am self
>> educated. I've often laughed over (google for it) the kid with the
>> Knack; that was me.  The Iowa test rated me at 147 back about '48, but as
>> Korea was getting hot, a 98 on the AFQT got me 4F'd for life cause they 
>> knew I'd not take orders at all well. Suffice to say, the next best 
>> score among 130 some boys that day was 36. But I can't do that today as 
>> I've stared down the guy with the scythe twice now, and the first time, 
>> a pulmonary embolism cost me some points. Survival rate for those is 
>> about 2% and I can testify that it's one hell of a scary way to die. But
>> I've also quite a list of BTDT's I can talk about.
>>
>>> But this is a laptop, so...
>>>
>>> That said, given the log errors, I'd try first to reseat, then to
>>> change the laptop's disk. This looks like flaky connector/hardware
>>
>> Absolutely.
>>
>>> Hoping it ain't the motherboard, though.
>>>
>>> And... backup. Backup :-)
>>>
>>> Cheers
>>> -- t
>>
>>
>> Cheers, Gene Heskett
>>
> 
> Ah, wisdom of the elders!  I usually consult the younger set for
> computer stuff...  but will take your advice.  Even though I opened and
> ran a computer store in 1980, I haven't opened a computer case to touch
> the hardware in 20 yrs or so at least.  If I don't break anything, I'll
> report back to the list.  Thanks much for the suggestions!

The fact that the inode numbers mentioned in the initial post all are
near each other hints, to me, that the storage device is the most likely
source of difficulty. I would try

 smartctl -t long /dev/sda
  as root, or
 sudo /usr/sbin/smartctl -t long /dev/sda
  as an ordinary user authorized to use privileged commands

(after installing the smartmontools package if it is not present;
/dev/sda is the disk named in your log messages).

You can do that without disrupting normal processing. It may take a long
time to complete (smartctl will tell you how long when you start it).
After it completes you can run

 smartctl -a /dev/sda (or sudo /usr/sbin/smartctl -a /dev/sda)

and see a report that will confirm (or not) a problem.
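
When the report comes back, a handful of rows in the vendor attribute
table tend to be the most telling for this kind of problem. The Python
sketch below picks their raw values out of text in the layout that
smartctl -A prints for ATA disks. The sample rows are invented for
illustration and are not from the drive in this thread; the helper name
watched_raw_values and the choice of attribute IDs are likewise only
assumptions. Attribute names, and even whether a given ID is present,
vary by drive model.

```python
import re

# Attribute IDs often checked first on ATA disks (an illustrative choice).
WATCHED = {
    5: "reallocated sectors",
    197: "sectors pending reallocation",
    198: "offline uncorrectable sectors",
    199: "interface (UDMA) CRC errors",
}

# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# INVENTED sample rows in the smartctl -A table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       7
"""


def watched_raw_values(report):
    """Return {attribute id: raw value} for the watched IDs found in report."""
    found = {}
    for line in report.splitlines():
        m = ROW_RE.match(line)
        if m and int(m.group(1)) in WATCHED:
            found[int(m.group(1))] = int(m.group(3))
    return found


for attr_id, raw in watched_raw_values(SAMPLE).items():
    print(f"{attr_id:3d} {WATCHED[attr_id]}: raw value {raw}")
```

For a real check, feed it the output of smartctl -A /dev/sda instead of
SAMPLE. A nonzero raw value for ID 199 usually points at the link between
disk and controller rather than at the platters, which bears on the
connector theory raised in this thread.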


Regards

Tom Dial


> 
> Regards,
> Ralph
> 



Re: Buster System hangs, requires hard reboot

2020-04-21 Thread deloptes
to...@tuxteam.de wrote:

> No, but a connector. And depending on the mechanical environment,
> those sometimes loosen too.

Yes, I have noticed on a few laptops that if the disk is not mounted
properly it starts sliding out, which in theory could lead to such
problems. So check the connector, then reinsert and mount the disk
properly.



Re: Buster System hangs, requires hard reboot

2020-04-21 Thread tomas
On Tue, Apr 21, 2020 at 08:33:18PM +0200, deloptes wrote:
> to...@tuxteam.de wrote:
> 
> >> How old, and what color are your sata cables? [...]
> > 
> > I know you like the red ones :-)
> > 
> > But this is a laptop, so...
> > 
> > That said, given the log errors, I'd try first to reseat, then to
> > change the laptop's disk. This looks like flaky connector/hardware
> 
> But this is a laptop - what kind of sata cable do you have there? None!

No, but a connector. And depending on the mechanical environment,
those sometimes loosen too.

> This notebook, if not under warranty, needs to be opened up and inspected
> in detail. Perhaps first try replacing the disk only.

Given a good backup (!) taking out & reseating the disk is halfway to
replacing, that's why I recommended that first. Of course, if the
laptop is so critical that you can't afford taking that chance...

> Such errors are common for spinning disks. Can't imagine newer notebooks
> have such anymore, but who knows.

I'd expect an error in the disk itself to be more persistent, but that
kind of errors can be idiosyncratic indeed. Some off-spec thing getting
triggered by low voltage; temperature; whatnot.

> The others (SSD) usually just die, but there could also be a bad capacitor
> or whatever.

Exactly: it's a whatever :)

Cheers
-- t




Re: Buster System hangs, requires hard reboot

2020-04-21 Thread deloptes
to...@tuxteam.de wrote:

>> How old, and what color are your sata cables? [...]
> 
> I know you like the red ones :-)
> 
> But this is a laptop, so...
> 
> That said, given the log errors, I'd try first to reseat, then to
> change the laptop's disk. This looks like flaky connector/hardware

But this is a laptop - what kind of sata cable do you have there? None!

This notebook, if not under warranty, needs to be opened up and inspected
in detail. Perhaps first try replacing the disk only.
Such errors are common for spinning disks. Can't imagine newer notebooks
have such anymore, but who knows.

The others (SSD) usually just die, but there could also be a bad capacitor
or whatever.





Re: Buster System hangs, requires hard reboot

2020-04-21 Thread tomas
On Tue, Apr 21, 2020 at 06:24:08AM -0600, Ralph Katz wrote:

> Ah, wisdom of the elders!  I usually consult the younger set for
> computer stuff...  but will take your advice.  Even though I opened and
> ran a computer store in 1980, I haven't opened a computer case to touch
> the hardware in 20 yrs or so at least.  If I don't break anything, I'll
> report back to the list.  Thanks much for the suggestions!

What tips me off is the error's seemingly sporadic nature.

Cheers
-- t




Re: Buster System hangs, requires hard reboot

2020-04-21 Thread Ralph Katz
On 4/21/20 2:47 AM, Gene Heskett wrote:
> On Tuesday 21 April 2020 03:07:31 to...@tuxteam.de wrote:
> 
>> On Mon, Apr 20, 2020 at 10:59:49PM -0400, Gene Heskett wrote:
>>> On Monday 20 April 2020 21:44:10 Ralph Katz wrote:
>>>> Hi -- Please help me diagnose and fix this problem.
>>>>
>>>> My five month old Dell laptop with updated firmware and new
>>>> up-to-date
>>
>>^^
>>
>>> How old, and what color are your sata cables? [...]
>>
>> I know you like the red ones :-)
>>
> Chuckle, I have known about that dye color since we started importing CB
> radios from the J.A.Pan company in about 1973; the mike cable failure
> rate at about a year's service was 100%. At one point in the leadup to our
> centennial in '76, we had 20 cables on backorder from every supplier 
> claiming to stock them. The shop area at Norfolk 2-way Radio was stacked 
> up with radios that needed cables. When they finally started to show up, 
> I had Vi return quite a pile of them because they were the exact copies 
> of the high failure rate product, and we wound up paying 2x for a Belden 
> cable that was way too strong a coil, but if it was tied down, it didn't 
> fail. I was a couple months getting caught up with that fiasco.
> 
> You folks all think I'm nuts over this, but you forget I have 1st phone
> and C.E.T. cards in my card carrier and a 70+ year history of fixing
> things electronic.  All on an 8th grade education; beyond that I am self
> educated. I've often laughed over (google for it) the kid with the
> Knack; that was me.  The Iowa test rated me at 147 back about '48, but as
> Korea was getting hot, a 98 on the AFQT got me 4F'd for life cause they 
> knew I'd not take orders at all well. Suffice to say, the next best 
> score among 130 some boys that day was 36. But I can't do that today as 
> I've stared down the guy with the scythe twice now, and the first time, 
> a pulmonary embolism cost me some points. Survival rate for those is 
> about 2% and I can testify that it's one hell of a scary way to die. But
> I've also quite a list of BTDT's I can talk about.
> 
>> But this is a laptop, so...
>>
>> That said, given the log errors, I'd try first to reseat, then to
>> change the laptop's disk. This looks like flaky connector/hardware
> 
> Absolutely.
> 
>> Hoping it ain't the motherboard, though.
>>
>> And... backup. Backup :-)
>>
>> Cheers
>> -- t
> 
> 
> Cheers, Gene Heskett
> 

Ah, wisdom of the elders!  I usually consult the younger set for
computer stuff...  but will take your advice.  Even though I opened and
ran a computer store in 1980, I haven't opened a computer case to touch
the hardware in 20 yrs or so at least.  If I don't break anything, I'll
report back to the list.  Thanks much for the suggestions!

Regards,
Ralph



Re: Buster System hangs, requires hard reboot

2020-04-21 Thread Gene Heskett
On Tuesday 21 April 2020 03:07:31 to...@tuxteam.de wrote:

> On Mon, Apr 20, 2020 at 10:59:49PM -0400, Gene Heskett wrote:
> > On Monday 20 April 2020 21:44:10 Ralph Katz wrote:
> > > Hi -- Please help me diagnose and fix this problem.
> > >
> > > My five month old Dell laptop with updated firmware and new
> > > up-to-date
>
>^^
>
> > How old, and what color are your sata cables? [...]
>
> I know you like the red ones :-)
>
Chuckle, I have known about that dye color since we started importing CB
radios from the J.A.Pan company in about 1973; the mike cable failure
rate at about a year's service was 100%. At one point in the leadup to our
centennial in '76, we had 20 cables on backorder from every supplier 
claiming to stock them. The shop area at Norfolk 2-way Radio was stacked 
up with radios that needed cables. When they finally started to show up, 
I had Vi return quite a pile of them because they were the exact copies 
of the high failure rate product, and we wound up paying 2x for a Belden 
cable that was way too strong a coil, but if it was tied down, it didn't 
fail. I was a couple months getting caught up with that fiasco.

You folks all think I'm nuts over this, but you forget I have 1st phone
and C.E.T. cards in my card carrier and a 70+ year history of fixing
things electronic.  All on an 8th grade education; beyond that I am self
educated. I've often laughed over (google for it) the kid with the
Knack; that was me.  The Iowa test rated me at 147 back about '48, but as
Korea was getting hot, a 98 on the AFQT got me 4F'd for life cause they 
knew I'd not take orders at all well. Suffice to say, the next best 
score among 130 some boys that day was 36. But I can't do that today as 
I've stared down the guy with the scythe twice now, and the first time, 
a pulmonary embolism cost me some points. Survival rate for those is 
about 2% and I can testify that it's one hell of a scary way to die. But
I've also quite a list of BTDT's I can talk about.

> But this is a laptop, so...
>
> That said, given the log errors, I'd try first to reseat, then to
> change the laptop's disk. This looks like flaky connector/hardware

Absolutely.

> Hoping it ain't the motherboard, though.
>
> And... backup. Backup :-)
>
> Cheers
> -- t


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Re: Buster System hangs, requires hard reboot

2020-04-21 Thread tomas
On Mon, Apr 20, 2020 at 10:59:49PM -0400, Gene Heskett wrote:
> On Monday 20 April 2020 21:44:10 Ralph Katz wrote:
> 
> > Hi -- Please help me diagnose and fix this problem.
> >
> > My five month old Dell laptop with updated firmware and new up-to-date
   ^^
> How old, and what color are your sata cables? [...]

I know you like the red ones :-)

But this is a laptop, so...

That said, given the log errors, I'd try first to reseat, then to
change the laptop's disk. This looks like flaky connector/hardware

Hoping it ain't the motherboard, though.

And... backup. Backup :-)

Cheers
-- t




Re: Buster System hangs, requires hard reboot

2020-04-20 Thread Gene Heskett
On Monday 20 April 2020 21:44:10 Ralph Katz wrote:

> Hi -- Please help me diagnose and fix this problem.
>
> My five month old Dell laptop with updated firmware and new up-to-date
> Buster completely hangs and requires a hard reboot after 7-40 days
> uptime.  While reading something onscreen or away from the laptop, the
> system hangs completely: screen freezes, keyboard is unresponsive, lid
> close fails to sleep, can't ssh in, pings fail.  Hard reboot is
> required.
>
> Actions taken:
>
> - re-installed Buster twice over several months
> - ran fsck.ext4 -y /dev/sda2  when fsck failed on boot
> - ran badblocks
> - run smartmontools long test weekly; no errors reported in logs.
>
> Sometimes there are errors in syslog like this before a crash:
> > Apr  2 06:04:18 spike3 kernel: [539637.882916] EXT4-fs error (device
> > sda2): ext4_lookup:1590: inode #55838123: comm updatedb.mlocat:
> > iget: checksum invalid
>
> Today there were no syslog errors for weeks before the system hung.
>
> After rebooting, errors like these appeared:
> > Apr 20 16:05:57 spike3 kernel: [  887.007328] EXT4-fs error (device
> > sda2): ext4_lookup:1590: inode #55842004: comm GMPThread: iget:
> > checksum invalid
> > Apr 20 16:08:53 spike3 kernel: [ 1062.821504] EXT4-fs error (device
> > sda2): ext4_lookup:1590: inode #55842002: comm DOM Worker: iget:
> > checksum invalid
>
> Any ideas?
>
> Thanks in advance!
> Ralph

How old, and what color are your SATA cables?  There is something in the
dye used for "hot red" cables that converts the copper of the conductors
into a brown rust-like powder, which of course is a very poor conductor.
Reboot, put a tail on the syslog, and touch the cable so it is moved a
bit. If it blows up the syslog with more such messages, replace the
cable, but replace it with any color but red.
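
A sketch of the "tail on the syslog" idea in Python, assuming the log
lives at /var/log/syslog (the Debian default with rsyslog) and that
matching on the literal text "EXT4-fs error" is enough. The function names
are made up for this example. It starts at the current end of the file, so
only messages logged after it is started are shown.

```python
import time


def is_ext4_error(line):
    """True for kernel log lines that report an ext4 error."""
    return "EXT4-fs error" in line


def follow(path, poll_seconds=1.0):
    """Yield lines appended to path after this starts, much like tail -f."""
    with open(path, "r", errors="replace") as f:
        f.seek(0, 2)  # jump to the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet; wait and retry


def watch(path="/var/log/syslog"):
    """Print each new ext4 error line as it is logged (runs until stopped)."""
    for line in follow(path):
        if is_ext4_error(line):
            print(line, end="", flush=True)
```

Call watch() as a user who can read the log (root, or a member of the adm
group on Debian), move the suspect cable or connector, and stop it with
Ctrl-C.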

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page 



Buster System hangs, requires hard reboot

2020-04-20 Thread Ralph Katz
Hi -- Please help me diagnose and fix this problem.

My five month old Dell laptop with updated firmware and new up-to-date
Buster completely hangs and requires a hard reboot after 7-40 days
uptime.  While reading something onscreen or away from the laptop, the
system hangs completely: screen freezes, keyboard is unresponsive, lid
close fails to sleep, can't ssh in, pings fail.  Hard reboot is required.

Actions taken:

- re-installed Buster twice over several months
- ran fsck.ext4 -y /dev/sda2  when fsck failed on boot
- ran badblocks
- run smartmontools long test weekly; no errors reported in logs.

Sometimes there are errors in syslog like this before a crash:

> Apr  2 06:04:18 spike3 kernel: [539637.882916] EXT4-fs error (device sda2): 
> ext4_lookup:1590: inode #55838123: comm updatedb.mlocat: iget: checksum 
> invalid

Today there were no syslog errors for weeks before the system hung.
After rebooting, errors like these appeared:

> Apr 20 16:05:57 spike3 kernel: [  887.007328] EXT4-fs error (device sda2): 
> ext4_lookup:1590: inode #55842004: comm GMPThread: iget: checksum invalid
> Apr 20 16:08:53 spike3 kernel: [ 1062.821504] EXT4-fs error (device sda2): 
> ext4_lookup:1590: inode #55842002: comm DOM Worker: iget: checksum invalid

Any ideas?

Thanks in advance!
Ralph