On 1/21/24 16:13, David Christensen wrote:
On 1/21/24 03:47, gene heskett wrote:
On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA errors. Please check if you see matching errors in dmesg(1).

There aren't any. Those hours correspond very closely to my rsync attempts, during which the OOM daemon killed the machine, which it did around 10 times.  So logging by then had been killed too. That, to me, is the smoking gun.


The kernel ring buffer is renewed with each boot, and newer messages overwrite older messages.  So, you will want to save or clear the ring buffer with dmesg(1), save a full SMART report, exercise the disk with dd(1) and/or a SMART self-test, save the ring buffer again, save another full SMART report, and analyze everything to see whether you have disk problems, SATA problems, and/or system problems.  Once everything passes without error, the disk is ready to be put into service.
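That procedure can be sketched as a script. The device name is a placeholder, and the commands are wrapped in a dry-run function so the sketch is safe to read and test; change the wrapper to actually execute (as root) when pointed at the real drive:

```shell
#!/bin/sh
# Sketch of the verification pass described above.
# /dev/sdX is a placeholder -- substitute the drive under test.
DEV=${DEV:-/dev/sdX}

# Dry-run wrapper: prints each command instead of executing it.
# Change to `run() { "$@"; }` to run the commands for real.
run() { echo "+ $*"; }

run dmesg --clear                      # start with an empty kernel ring buffer
run smartctl -x "$DEV"                 # save a baseline full SMART report
run dd if="$DEV" of=/dev/null bs=1M    # read-exercise the whole disk
run smartctl -t long "$DEV"            # and/or a SMART extended self-test
run dmesg                              # any new ATA/SATA errors logged?
run smartctl -x "$DEV"                 # compare SMART counters to the baseline
```

If dmesg stays clean and the second SMART report shows no new errors or counter growth after the full read, the disk is reasonable to put into service.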


2T is enough /home for the nonce, so I'll do the rsync thing going the other direction, using it as a backup of /home until I'm ready for trixie.

However, I am tempted to zero the drives and recreate the RAID without partitioning, since mdadm seems capable of using whole, unpartitioned drives for an array, giving me a backup that, size-wise, is about the same as the single 2T drive I have now.
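For what it's worth, mdadm doesn't put a filesystem on anything itself: it assembles the raw drives into an array, and the filesystem is then made on the md device as a separate step. A minimal sketch, with assumed device names and commands wrapped to print rather than execute, since the real thing is destructive:

```shell
#!/bin/sh
# Assumed device names -- double-check with lsblk before touching anything.
run() { echo "+ $*"; }   # dry-run wrapper; `run() { "$@"; }` to execute

# mdadm assembles whole, unpartitioned drives into an array...
run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# ...but the filesystem is a separate step, made on the md device:
run mkfs.ext4 /dev/md0
run mount /dev/md0 /mnt/backup

# Record the array so it assembles at boot
# (append this command's output to /etc/mdadm/mdadm.conf):
run mdadm --detail --scan
run update-initramfs -u
```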

And although my single experience with LVM, over a decade ago and built out of used spinning rust, was a total disaster, I may now see about assembling the other four 2T's into an 8T LVM volume for Amanda's vtapes, to back up the whole system. In addition to the 4 CNC'd machines, that system has, over the last 5 years, seen a train of 3-D printers go by. If all 3, currently a WIP, get rebuilt, the smallest is 305 mm on a side, the largest 400 mm, and all, I hope, will lay plastic at 200+ mm a second.  Normal consumer stuff is 40 to 60.
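Pooling the four 2T's into one ~8T volume is only a few commands in current LVM. A sketch with assumed device, volume group, and mount-point names, dry-run wrapped since the real commands wipe the drives; note a plain linear LV has no redundancy, so one dead member loses the whole volume:

```shell
#!/bin/sh
# Assumed devices /dev/sd[c-f] and names vg_vtapes/lv_vtapes -- adjust to suit.
run() { echo "+ $*"; }   # dry-run wrapper; `run() { "$@"; }` to execute

run pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf            # mark as PVs
run vgcreate vg_vtapes /dev/sdc /dev/sdd /dev/sde /dev/sdf  # pool them
run lvcreate -l 100%FREE -n lv_vtapes vg_vtapes             # one ~8T LV
run mkfs.ext4 /dev/vg_vtapes/lv_vtapes
run mount /dev/vg_vtapes/lv_vtapes /amanda/vtapes           # assumed mount
```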

Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast and git to it.


This and other threads have led me to the conclusion that consumer SSD's are meant for devices that are off most of the time -- e.g. notepad, laptop, desktop, and workstation computers.  If you put them into a NAS/file server and run them 24x7, they will die sometime after 2 years.

That has not been my experience at all, David. I bought a 4-pack of 120G SSD's when they were the biggest available and replaced 3 spinning-rust drives that had 50-70k hours on them. My CNC machines are all wired so that power for the mill/lathe/what-have-you is controlled entirely by the enable key, F2; if F2 is off, only the computer is running. That was at least 6 years ago. Then I installed a 240G as an extra drive on the rpi4 that runs my biggest lathe and made a buildbot out of it, to pull linuxcnc-master from github and build it, plus armhf kernels for LinuxCNC's realtime needs. The 120G disappeared in about a year; I replaced the adapter with a StarTech, and the drive was, and is, just fine. There are now at least 5 years on every one of those original 120's, with zero SSD problems in the whole lot.

So, I suggest:

1.  Build a storage server using NAS or enterprise HDD's.  Use an enterprise SSD or DOM for the OS.  Run it 24x7 or shut it down as you like.

2.  Use your Asus PRIME Z370-A II as a workstation.  Install the WD Black M.2 NVMe PCIe SSD.  Connect the optical drive to the first motherboard SATA port.  Install Debian onto the WD Black.  Then, connect the five Samsung EVO 870's to the remaining motherboard SATA ports.  Set them up as a 5-way mirror (RAID1).  Use the Samsung RAID as a scratch disk for your 3-D work.  As the Samsung's die off, replace them with the Gigastones.  Shut it down when you are not using it.
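The 5-way mirror, and the later member replacement as drives die off, would look roughly like this with mdadm (device names are assumptions, and the commands are wrapped to print rather than execute):

```shell
#!/bin/sh
run() { echo "+ $*"; }   # dry-run wrapper; `run() { "$@"; }` to execute

# Five-way RAID1 across the Samsungs (assumed here to be sdb..sdf):
run mdadm --create /dev/md0 --level=1 --raid-devices=5 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
run mkfs.ext4 /dev/md0

# Later, when a member dies, swap in a Gigastone (here replacing sdd):
run mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd
run mdadm /dev/md0 --add /dev/sdg    # new member resyncs automatically
```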

3.  For Amanda, either add more HDD's to the storage server or build another server.  If another server, shut it down when you are not using it.

Speaking as someone who has used Amanda for about 25 years:

People don't always understand that one of Amanda's prime directives is to balance the size of each night's run -- e.g. promoting the level 3 scheduled for tonight to a level 0 if tonight's run would otherwise be small. The only guarantee is that, with a 10-day schedule, every machine/DLE gets its level 0 backup not more than 10 days after the last one. You choose how many days long that cycle is; I adjust it so storage is around 75 to 80% used once the schedule has stabilized, which may take quite a few cycles. Designed to run every night when things are relatively quiet, this works well only if the other machines being backed up are available.
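The knobs behind that balancing live in amanda.conf; a minimal fragment, with illustrative values rather than a recommendation:

```
# amanda.conf fragment -- illustrative values only
dumpcycle 10 days    # every DLE gets a full (level 0) at least this often
runspercycle 10      # amdump runs per dumpcycle (one per night here)
tapecycle 15 tapes   # vtapes in rotation; must exceed runspercycle
```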

Machines missing at backup time can and will muck up this efficient scheduling. Corporate users of Amanda, used to doing it their way and backing up the week's business on Friday nights, often don't understand that the Amanda way -- 100% coverage by backing up only the differences from the previous run of each DLE, every night -- is far superior to a Friday-night run when most of the office's machines are turned off for the weekend. For those cases we recommend composing two or more DLE files and rigging cron to run them on more appropriate schedules. This can be fairly easily done, but MBA's watching every watt on the power bill just don't see it that way. They will get burnt when their place burns.
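Splitting the estate into separate Amanda configs and driving them from cron might look like this; the config names, schedule, and user are hypothetical:

```
# /etc/cron.d/amanda -- hypothetical split: servers nightly, desktops Saturdays
# m  h dom mon dow  user    command
15  1  *   *   *    backup  /usr/sbin/amdump servers
15  1  *   *   6    backup  /usr/sbin/amdump desktops
```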

Thanks David.

Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
