On 07/25/2015 05:12 AM, lee wrote:
> Daniel Frey <djqf...@gmail.com> writes:
> 
>> Well, I sure haven't had much luck with SSDs. This will be the third one
>> I've lost.
> 
> + Buy good hardware.
> + Never store anything on only a single disk (with very few exceptions).
> + Do not put swap partitions on single disks, either.
> + Disks always come in pairs at least.
> 
> SSDs, I'd currently buy Samsung 850 pro or evo, depending on how they
> are going to be used.

Samsung firmware is glitchy. When I bought the replacement I read that
the firmware in the 840/850 may have a bug that causes random blocks of
valid data to be erased during TRIM operations. (Reported June 2015.)

The store I went to had Samsungs in stock, so I was going to get one
until I read about this firmware glitch. I had originally wanted an
Intel SSD, but nobody here had any in stock.

> 
>> yesterday bought a new SSD, this time a SanDisk model. It was cheap and
>> I hope I don't regret this in the future.
> 
> Well, you get what you pay for.  To me, all the hassle a failed disk (or
> other hardware) will give me isn't worth saving a bit of money on it.

I maintain full backup images (or stage4, is that what they call them?)
so I just unpack, reinstall grub, and reboot. This is for a remote
mythtv frontend so there are no other files on it. The databases and
everything else it needs run on a server with RAID, and the database on
that server is backed up to a NAS nightly.

> 
> Since you got it cheap, why not buy another one and use RAID-1 (and/or
> zfs)?  When one fails, shutdown, replace the failed disk, restart --- no
> hassle involved.
> 
>> That aside, the drive that failed is a Crucial m4. I have done some
>> searching as how to run diagnostics on an SSD.
> 
> When you get sector errors reported in the log file when accessing the
> disk, the disk has failed (provided that the cabling and power supply
> are ok).  This goes for hard disks --- are SSDs any different in that?

I wasn't getting sector errors from the disk itself. I was experiencing
random kernel panics and apparently scrambled data. From reviews,
Crucial seems to make decent SSDs. I've lost Crucial and Kingston SSDs
in the past, all in this machine, plus one in the server. I now find
them too unreliable for that use, so I don't use them in my server
anymore.

> 
> Other than that, I don't need any more diagnostics.  It would only tell
> me what I already know.

Well, after a lot of head scratching, I decided to:

-run smartctl tests (no real info)
-use shred on it, then fstrim, and check the results
-update the firmware, then do another shred and fstrim

I guess for SSDs there's really no actual check. The best you can do is
run shred and fstrim, and use smartctl to check the drive stats before
and after the operations so you can compare them. My SSD with the new
firmware didn't trip any more errors.
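
For anyone who wants to script that before/after comparison, here is a
rough sketch in Python. It assumes the usual ATA attribute table that
"smartctl -A" prints (numeric ID first, raw value in the tenth column);
the function names are made up for illustration, and rows whose raw
value isn't a plain number are simply skipped.

```python
import subprocess


def parse_smart_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: attribute rows start with a numeric ID and have at
        # least 10 columns, with RAW_VALUE in the tenth.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def diff_attributes(before, after):
    """Return {name: (old, new)} for attributes whose raw value changed."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


def read_smart(device):
    """Run `smartctl -A` on a device (needs root and smartmontools)."""
    # smartctl uses its exit status as a bit mask, so a non-zero status
    # is not treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)


# Typical use (the device name is a placeholder):
#   before = read_smart("/dev/sdX")
#   ... run shred and fstrim ...
#   after = read_smart("/dev/sdX")
#   print(diff_attributes(before, after))
```

Only the raw values are compared, since those are the counters that
actually move; the normalized VALUE/WORST columns are ignored.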

> 
>> I usually send them back for warranty, but this time I'm curious.
> 
> Without physically destroying it, I won't give any disk out of hand
> which has had my data on it.  Unfortunately, that probably means that
> there is no warranty on disks.  I only take the duration of the warranty
> as some indicator of what the manufacturer entrusts the disk with, as in
> "5 years may be better than 3".

There's no data on it that I care about, as I said, it's a remote mythtv
frontend. For drives in my main workstation I destroy them.

> 
> In practice, hard disks either fail not long after new, or after about 3
> years, or virtually never because they are replaced for other reasons
> before they fail.  SSDs might be different; I don't have much experience
> with them yet.
> 

In my experience SSDs just randomly fail with no warning whatsoever.
Random issues/crashes/segfaults, kernel panics. smartctl reports nothing.

I'd actually forgotten I'd posted this. What happened in the end is that
I called the manufacturer and they asked me to leave the SSD powered but
without a SATA cable attached so it could do manual garbage collection.
That got me thinking, so I checked the machine and found the discard
option wasn't set in fstab. I do recall reading that newer kernels are
supposed to use TRIM automatically, so I didn't explicitly set it, but
maybe that's only with other distros.
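
A quick way to spot this on other machines is to scan fstab for mounts
that lack the discard option. The following is only a sketch in Python:
the function name is made up, and checking only ext4 entries is an
assumption based on my setup, so adjust fs_types as needed.

```python
def mounts_without_discard(fstab_text, fs_types=("ext4",)):
    """Return mount points of the given filesystem types lacking 'discard'."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        # Assumption: only the filesystem types in fs_types are checked.
        if fstype in fs_types and "discard" not in options.split(","):
            missing.append(mountpoint)
    return missing


# Typical use:
#   with open("/etc/fstab") as f:
#       print(mounts_without_discard(f.read()))
```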

I suspect it was a configuration error on my part. I have added the
discard option and am going to convert /boot to ext4 so I can use the
discard option there too, and install grub2 to take care of the booting.
I also set up anacron to run fstrim weekly, as I found recommended
elsewhere, to try to resolve the problem. I'm also going to replace
vixie-cron and anacron with cronie built with the anacron USE flag, so
I can set the MAILFROM variable in crontab. At the moment, when
machines email me I can't figure out which machine the mail came from.

I don't have another machine to try this Crucial drive in yet, but I'll
find something. It'll probably be fine now.
