So, last night, the resilvering was still running on the only (original)
drive in the zpool. When I checked in at 5:00 AM, it was finished.
(There were "too many errors" on the disk, and zpool status said to run
a zpool clear. I'm thinking now that I should have done a dd right away,
but foggy with sleep, I started the zpool clear instead.) This is still
running, and the HDD activity blinkenlight is on steady. I will take a
look when it finishes (an ls in another terminal window doesn't return).
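For anyone following along, the recovery sequence in question looks roughly like this; a minimal sketch, assuming the pool is named "data" (the pool name that shows up in the iostat output further down):

```shell
# Inspect the pool's health and per-device error counters first.
zpool status -v data

# "zpool clear" resets the error counters and lets ZFS retry the
# device; it does not repair data by itself.
zpool clear data

# A scrub then re-reads and verifies every block, repairing what it
# can from redundancy.
zpool scrub data

# Progress, and any remaining errors, show up here.
zpool status data
```

Whether to clear and scrub immediately, or to image the failing disk with dd first, is exactly the judgment call described above.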
Fingers crossed, sacrifices to the computer gods, and several "Hail
Cthulhus."
Rainer
On 28/10/2015 9:55 PM, Rainer Heilke wrote:
On 28/10/2015 1:47 PM, jason matthews wrote:
Let me apologize in advance for inter-mixing comments.
Ditto.
I am not trying to be a dick (it happens naturally), but if you can't
afford to back up terabytes of data, then you can't afford to have
terabytes of data.
That is a meaningless statement that reflects nothing in real-world
terms.
The true cost of a byte of data that you care about is the money you pay
for the initial storage, and then the money you pay to back it up. For
work, my front line databases have 64TB of mirrored net storage.
When you said "you," it implied (to me, at least) a home system, since
we're talking about a home system from the start. Certainly, if it is a
system that a company uses for its data, all of what you say is correct.
But a company, regardless of size, can write these expenses off.
Individuals cannot do that with their home systems. For them, this
paradigm is much vaguer, if it exists at all.
So, while I was talking apples, you were talking parsnips. My apologies
for not making that clearer. (All of that said, the DVD drive has been
acting up. Perhaps a writable Blu-Ray is in the wind. Since the price of
them has dropped further than the price of oil, that may make backups of
the more important data possible.)
The
costs don't stop there. There is another 200TB of net storage dedicated
to holding enough log data to rebuild the last 18 months from scratch. I
also have two sets of slaves that snapshot themselves frequently. One
set is a single disk, the other is raidz. These are not just backups.
One set runs batch jobs, one runs the front end portal, and the masters
are in charge of data ingestions.
Don't forget the costs added on by off-site storage, etc. I don't care
how many times the data is backed up; if it's all in the same building
that just burned to the ground... That is, unless your zfs sends are
going to a different site...
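An off-site send along those lines is just a snapshot piped over ssh; a minimal sketch, where the dataset data/home, the snapshot name, the host backup.example.com, and the remote pool "tank" are all hypothetical:

```shell
# Take a named, point-in-time snapshot of the dataset to protect.
zfs snapshot data/home@offsite-2015-10-29

# Stream it to a machine in another building; -u on the receiving
# side keeps the replicated copy unmounted.
zfs send data/home@offsite-2015-10-29 | \
    ssh backup.example.com zfs receive -u tank/home-copy
```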
If you don't back up, you set yourself up for unrecoverable problems.
I believe this may be the first time (for me) that simply replacing a
failed drive resulted in data corruption in a zpool. I've certainly
never seen this level of mess before.
That said, instead of running mirrors, run loose disks and back up to
the second pool at a frequency you are comfortable with. You need to
prioritize your resources against your risk tolerance. It is tempting to
do mirrors because it is sexy, but that might not be the best strategy.
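The loose-disk approach could be driven by a small periodic script; a sketch, assuming pools named "data" (live) and "backup" (second pool), and noting that the very first run needs a full send rather than the incremental shown:

```shell
#!/bin/sh
# Sketch: periodic snapshot-and-replicate from a live pool to a
# second, non-mirrored pool. Pool and snapshot names are illustrative.
PREV="data@backup-prev"
SNAP="data@backup-$(date +%Y%m%d%H%M)"

# Cheap point-in-time snapshot of the live pool (recursive).
zfs snapshot -r "$SNAP"

# Send only the blocks changed since the previous snapshot.
zfs send -R -i "$PREV" "$SNAP" | zfs receive -Fdu backup

# Roll the marker snapshot forward for the next run.
zfs destroy -r "$PREV"
zfs rename -r "$SNAP" "$PREV"
```

Run it from cron at whatever frequency matches your risk tolerance.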
That is something for me to think about. (I don't do *anything* on
computers because it's "sexy." I did mirrors for security; remember,
they hadn't failed for me at such a monumental level previously.)
That's an arrogant statement, presuming that if a person doesn't have
gobs of money, they shouldn't bother with computers at all.
I didn't write anything like that. What I am saying is that you need to
get more creative about how to protect your data. Yes, money makes it
easier, but you have options.
My apologies; on its own, it came across that way.
I am not complaining about the time it takes; I know full well how
long it can take. I am complaining that the "resilvering" stops dead.
(More on this below.)
When the scrub is stopped dead, what does "iostat -nMxC 1" look like?
Are there drives indicating 100% busy? high wait or asvc_t times?
sudo iostat -nMxC 1
                       extended device statistics
    r/s    w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c4
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c4t0d0
   23.3   55.2    0.6    0.3   0.2   0.3     2.1     4.5   5  27  c3d0
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     0.2   0   0  c3d1
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     5.5   0   0  c6d1
  360.1   13.1   29.0    0.1   1.3   1.5     3.4     4.0  48  82  c6d0
    9.7  330.9    0.0   29.1   0.1   0.6     0.3     1.6   9  52  c7d1
  359.9  354.6   28.3   28.5  30.2   3.4    42.2     4.7  85  85  data
   23.2   34.9    0.6    0.3   6.2   0.3   106.9     5.6   6  12  rpool
                       extended device statistics
    r/s    w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c4
    0.0    0.0    0.0    0.0   0.0   0.0     0.0     0.0   0   0  c4t0d0
    0.0  112.1    0.0    0.3   0.0   0.4     0.0     4.0   0  45  c3d0