On 3/17/20 10:14 AM, Rich Freeman wrote:
On Tue, Mar 17, 2020 at 1:59 AM <tu...@posteo.de> wrote:

Finally, ALL DRIVES FAIL.  It doesn't matter what the underlying
storage technology is.  I've seen hard drives fail in less than a
year, with the warranty replacement drive failing less than a year
after that.  I think next warranty replacement (still in the original
warranty period) lasted 5+ years of near-continuous use.  The typical
failure modes of hard drives and solid state storage are different,
but they all fail.  You can't perfectly predict WHEN they will fail
either.  Most drives have SMART and sometimes it can detect failure
conditions before failure, but not always.


Hello Rich, et al.

I have trimmed most of the quoted text because I agree with the details in this thread: you get what you pay for, but overpaying is rarely rewarded.


HEAT is the enemy of all electronics and mechanical things, and computer drives and memory are no exception. Modern motherboards offer a myriad of interfaces and codes that track heat, and quite a few legacy motherboards do as well. Some are not very accurate, but most are reasonable.

Hopefully, you kept your mobo book. A section somewhere in it covers the temperature sensors. If the CPU is loaded, the drives are most likely getting hot. If the fans are running at a relatively high speed, the system is generating tons of heat. If the GPU(s) are running hot, the drives are hot. Tools that scan the hardware for sensors are great, so use them!
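As a rough sketch of what such a sensor scan boils down to: on Linux the kernel publishes temperature readings under /sys/class/hwmon, the same data that lm-sensors reports. The function name list_hwmon_temps is made up for this example, and which chips and labels show up depends entirely on your board and loaded drivers.

```shell
#!/bin/sh
# Print every temperature sensor found in a hwmon tree.
# list_hwmon_temps is a hypothetical helper name, not a standard tool.
# The optional argument overrides the default sysfs location.
list_hwmon_temps() {
    base="${1:-/sys/class/hwmon}"
    for input in "$base"/hwmon*/temp*_input; do
        [ -r "$input" ] || continue            # glob matched nothing, or not readable
        dir=${input%/*}
        chip=$(cat "$dir/name" 2>/dev/null || echo unknown)
        label_file=${input%_input}_label
        if [ -r "$label_file" ]; then
            label=$(cat "$label_file")
        else
            label=${input##*/}                 # fall back to the file name, e.g. temp1
            label=${label%_input}
        fi
        raw=$(cat "$input" 2>/dev/null) || continue
        [ -n "$raw" ] || continue
        # temp*_input holds millidegrees Celsius
        printf '%s %s %d.%d C\n' "$chip" "$label" $((raw / 1000)) $(( (raw % 1000) / 100 ))
    done
}
```

Calling list_hwmon_temps with no argument scans the live system; wrapping the call in watch or a sleep loop gives a crude temperature log.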


I now install 'water coolers' from Thermaltake on all my chassis-based systems. New or large video cards have tons of processing going on inside the GPUs, which makes them a large source of heat. Systems with lots of GPU cards are like ovens. All of this heat, regardless of source, KILLS all forms of memory, especially 'drives'. Keep everything monitored and well vented, in a room kept as cool as possible. Many server farm rooms run below 50 degrees F to extend the performance and life of electronics, particularly HDDs and other forms of memory. Many chipsets automatically scale down as heat increases.


Another (indirect) way to monitor heat is to monitor the power consumption of a component. A (relatively) large power draw goes hand in hand with heat production. Heat kills drives and memory... no exceptions!
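One possible way to get at power draw from the shell, assuming an Intel CPU with the powercap/RAPL driver loaded: the kernel exposes a cumulative energy counter in microjoules, and two samples taken a few seconds apart give average watts. The path below is an assumption (it may not exist on your hardware, and reading it often requires root), and both function names are invented for this sketch.

```shell
#!/bin/sh
# Convert two cumulative energy readings (microjoules), taken SECONDS apart,
# into average power in watts. Hypothetical helper name.
uj_to_watts() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b - a) / s / 1000000 }'
}

# Sample an energy counter twice and report the average draw.
# ASSUMPTION: the default path is the usual Intel RAPL package counter;
# it is absent on many systems and is often readable by root only.
rapl_watts() {
    counter="${1:-/sys/class/powercap/intel-rapl:0/energy_uj}"
    interval="${2:-5}"
    [ -r "$counter" ] || { echo "cannot read $counter" >&2; return 1; }
    start=$(cat "$counter")
    sleep "$interval"
    end=$(cat "$counter")
    # The counter wraps eventually; a negative result means it wrapped
    # during the sample, so just take another reading.
    uj_to_watts "$start" "$end" "$interval"
}
```

Running rapl_watts while a heavy job is active, and again when the box is idle, gives a feel for how much extra heat the load is putting into the case.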


Here are a few one-liners I use to monitor
(use/load == heat):

watch -n 12 sensors -f

dstat -tcndylp --top-cpu 10

htop
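For the drives themselves, a hedged sketch along the same lines: many ATA drives report their temperature as SMART attribute 194, which smartctl -A (from smartmontools) prints with the raw value in the tenth column. Not every drive reports attribute 194, NVMe devices use a different output format, and the name smart_temp is made up for this example.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of
# attribute 194 (Temperature_Celsius) in degrees C.
# smart_temp is a hypothetical helper name; it exits non-zero when the
# attribute is missing, as it is on some drives.
smart_temp() {
    awk '$1 == 194 { print $10; found = 1; exit }
         END       { exit !found }'
}
```

Usage would look like `smartctl -A /dev/sda | smart_temp`, run as root, where /dev/sda is only a placeholder for whichever drive you want to check.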

It would be great if folks would list what they use to monitor the workload (and therefore, indirectly, the heat) or the actual temperatures of given chipsets and "smart drives". Perhaps we can then cull the responses and update the Gentoo help pages online with more detailed examples, scripts and tools to better track heat, current and other related performance parameters.


hth,
James
