On 1/21/24 16:13, David Christensen wrote:
On 1/21/24 03:47, gene heskett wrote:
On 1/21/24 01:33, David Christensen wrote:
I am still uncertain if those are internal SSD errors or SATA errors.
Please check if you see matching errors in dmesg(1).
There aren't any. Those hours correspond very closely to my rsync
attempts, during which the OOM daemon killed the machine around 10
times. So logging had been killed by then. That, to me, is the
smoking gun.
The kernel ring buffer is renewed with each boot, and newer messages
overwrite older ones. So you will want to save or clear the ring
buffer with dmesg(1), save a full SMART report, exercise the disk with
dd(1) and/or a SMART test, save the ring buffer again, save another full
SMART report, and analyze everything to see whether you have disk problems,
SATA problems, and/or system problems. Once everything passes without error,
the disk is ready to be put into service.
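That sequence could be sketched as a checklist script. /dev/sdX is a placeholder for the disk under test, and the commands are printed rather than executed, so you can review them first and then run them one at a time as root:

```shell
#!/bin/sh
# Disk-qualification checklist, per the sequence described above.
# /dev/sdX is a placeholder -- substitute the real device. The steps
# are printed, not executed, so this is safe to run without root.
DISK=/dev/sdX

qualify_steps() {
    cat <<EOF
dmesg --clear                                    # start with an empty ring buffer
smartctl -x $DISK > smart-before.txt             # full SMART report, before
dd if=$DISK of=/dev/null bs=1M status=progress   # read-exercise the whole disk
smartctl -t long $DISK                           # then a long SMART self-test
dmesg > dmesg-after.txt                          # save any disk/SATA errors logged
smartctl -x $DISK > smart-after.txt              # full SMART report, after
EOF
}

qualify_steps
```

Comparing smart-before.txt with smart-after.txt (and scanning dmesg-after.txt for ATA errors) is what tells you whether the problem is the disk, the SATA link, or the system.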
2T is enough /home for the nonce, so I'll do the rsync thing going the
other direction, using it as a backup of /home until I'm ready for
trixie.
However, I am tempted to zero the drives and recreate the RAID without
partitioning, since mdadm seems capable of using whole, unpartitioned
drives as array members (the filesystem then goes on the assembled
array), giving me a backup that size-wise is about the same as the
single 2T drive has now.
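A rough sketch of that whole-drive approach, with two illustrative members (device names and RAID level are placeholders, not Gene's actual layout). Note that mdadm builds only the array; the filesystem is made on /dev/md0 afterwards. Commands are printed, not executed, for safety:

```shell
#!/bin/sh
# Sketch only: a RAID array built from whole, unpartitioned drives.
# /dev/sdb and /dev/sdc and the RAID level are illustrative.
raid_steps() {
    cat <<'EOF'
mdadm --zero-superblock /dev/sdb /dev/sdc              # wipe any old metadata
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext4 /dev/md0                                     # filesystem goes on the array
mount /dev/md0 /mnt/backup
EOF
}

raid_steps
```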
And although my single experience with LVM, over a decade ago and built
out of used spinning rust, was a total disaster, I may now see how the
other four 2T's do assembled as an 8T LVM volume for Amanda's vtapes,
to back up the whole system, which in addition to the 4 CNC'd machines
has over the last 5 years seen a train of 3D printers go by. If all 3,
currently a WIP, get rebuilt, the smallest is 305 by, the largest is
400 by. And all, I hope, will lay plastic at 200+ mm a second; normal
consumer stuff is 40 to 60.
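If the LVM route does get tried again, assembling four 2T drives into one ~8T volume for the vtapes could look roughly like this. All names here (vg_vtapes, lv_vtapes, /dev/sd[c-f], the mount point) are made up; commands are printed rather than executed:

```shell
#!/bin/sh
# Hedged sketch: four 2T drives pooled into one ~8T logical volume
# for Amanda vtapes. Every name below is illustrative.
lvm_steps() {
    cat <<'EOF'
pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf             # mark drives as PVs
vgcreate vg_vtapes /dev/sdc /dev/sdd /dev/sde /dev/sdf   # one ~8T pool
lvcreate -l 100%FREE -n lv_vtapes vg_vtapes              # one LV spanning it all
mkfs.ext4 /dev/vg_vtapes/lv_vtapes
mount /dev/vg_vtapes/lv_vtapes /amanda/vtapes
EOF
}

lvm_steps
```

Unlike RAID, this gives capacity only, no redundancy: lose one drive and the whole volume is gone, which may be acceptable for backup staging but is worth deciding deliberately.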
Obviously I have an eclectic choice of too many hobbies. ;o)>
Now if curiosity doesn't kill this cat, I need to find some breakfast
and git to it.
This and other threads have led me to the conclusion that consumer SSD's
are meant for devices that are off most of the time -- e.g. notepad,
laptop, desktop, and workstation computers. If you put them into a NAS/
file server and run them 24x7, they will die sometime after 2 years.
That has not been my experience at all, David. I bought a 4-pack of 120G
SSD's when they were the biggest available and replaced 3 spinning-rust
drives that had 50-70k hours on them. My CNC machines are all wired so
power for the mill/lathe/what-have-you is totally controlled by the
enable key, F2, so if F2 is off only the computer is running. That was
at least 6 years ago. Then I installed a 240G as an extra drive on the
rpi4 that runs my biggest lathe and made a buildbot out of it to pull
linuxcnc-master from GitHub and build it, along with armhf kernels for
LinuxCNC's realtime needs. The 120G disappeared in about a year; I
replaced the adapter with a StarTech one, and the drive was and is just
fine. There are now at least 5 years on every one of those original
120's, with zero SSD problems in the whole lot.
So, I suggest:
1. Build a storage server using NAS or enterprise HDD's. Use an
enterprise SSD or DOM for the OS. Run it 24x7 or shut it down as you like.
2. Use your Asus PRIME Z370-A II as a workstation. Install the WD
Black M.2 NVMe PCIe SSD. Connect the optical drive to the first
motherboard SATA port. Install Debian onto the WD Black. Then, connect
the five Samsung EVO 870's to the remaining motherboard SATA ports. Set
them up as a 5-way mirror (RAID1). Use the Samsung RAID as a scratch
disk for your 3-D work. As the Samsungs die off, replace them with the
Gigastones. Shut it down when you are not using it.
3. For Amanda, either add more HDD's to the storage server or build
another server. If another server, shut it down when you are not using it.
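The five-way mirror in suggestion 2 could look roughly like this, with /dev/sd[b-f] standing in for the five EVO 870's on the motherboard SATA ports (printed rather than executed, for safety):

```shell
#!/bin/sh
# Sketch of a 5-way RAID1 mirror; device names are placeholders.
mirror_steps() {
    cat <<'EOF'
mdadm --create /dev/md0 --level=1 --raid-devices=5 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
mkfs.ext4 /dev/md0
mount /dev/md0 /scratch      # scratch space for the 3-D work
EOF
}

mirror_steps
```

A 5-way mirror trades four drives' worth of capacity for the ability to lose any four members, which suits the stated plan of letting the Samsungs die off one by one.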
Speaking as someone who has used Amanda for about 25 years:

People don't always understand that one of Amanda's prime directives is
to balance the size of each backup run: if tonight's run would otherwise
be small, a level 3 scheduled for tonight gets promoted to a level 0.
The only guarantee is that with a 10-day schedule, every machine/DLE
gets a level 0 backup no more than 10 days after its last one. You
choose how many days long that cycle is; I adjust it so the storage is
around 75 to 80% used after the schedule has stabilized, which may take
quite a few cycles.

Designed to run every night when things are relatively quiet, this works
well only if the machines being backed up are available. Machines
missing at backup time can and will muck up this efficient scheduling.
Corporate users of Amanda, used to doing it their way and backing up the
week's business on Friday nights, just don't understand that the Amanda
way, getting 100% coverage by backing up only the differences from the
previous run of each DLE every night, is far superior to their Friday
night, when most of the office's machines are turned off for the
weekend. For those cases we recommend composing two or more DLE files
and rigging cron to run them on a more appropriate schedule. This can be
fairly easily done, but MBA's watching every watt on the power bill just
don't see it that way. They will get burnt when their place burns.
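The balancing behaviour described above is driven by a few amanda.conf knobs; an illustrative fragment (the values are examples, not Gene's configuration):

```
dumpcycle 10 days      # every DLE gets a level 0 within this window
runspercycle 10        # backup runs per dumpcycle (one per night here)
tapecycle 15 tapes     # keep more (v)tapes than runs, so old fulls survive
```

With these, Amanda spreads the level 0's across the ten nightly runs and promotes or delays levels per DLE to keep each night's total roughly even.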
Thanks David.
Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis