I know there are lots of threads on this issue, but I figured I'd start a new one 
to share my three-day horror story.  My hope is that someone won't repeat my 
mistakes and, if they do, that they might find some way to determine how long 
it will take to get their pool back.

I began my adventure with the EON storage package (great package, I highly 
recommend it) several months ago.  I ran a lot of tests with the hope of 
understanding the storage system pretty well by the time we started putting it 
into production.  This, with some bumps, went pretty well.

During the testing, I ran a dataset with deduplication to see how well it 
would do.  We do a lot of backups of the same kind of data, and dedup turned out 
to be brilliant for it.  In the long run, though, the math didn't work out -- hard 
drives were a lot cheaper than the system RAM needed to hold the dedup table.  
Between that and the relative newness of dedup, we decided that, while it was 
pretty slick, we'd be old school and just drop in more drives.  Compression, by 
the way, was also doing wonders, and we decided to stick with that.
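
(If you want to do that math yourself: a rough sketch, using a hypothetical pool 
name of "tank", is to let zdb report the dedup table so you can see how many 
unique blocks you're carrying.  Each DDT entry costs somewhere in the range of a 
few hundred bytes of RAM per unique block -- I'm going from memory, so check the 
docs for your release:

  zdb -DD tank    # dedup table statistics for a pool already running dedup
  zdb -S tank     # simulate what dedup would buy you on a pool without it

Multiply the entry count by the per-entry size and compare that against the 
price of just buying more disk.)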

So we put the system into limited production *without* deleting the 
deduplicated dataset.  It seemed harmless enough, and I didn't drop in here to 
see whether there was any discussion of "dedup zfs destroy".  Had I done that, 
I might not have this story to share.

On Sunday night (2011.03.06), I decided to do a final clean-up, since we were 
going to take the box from limited production to full production this week.  I 
decided to run "zfs destroy mydedupdataset", and, well, that was obviously a bad 
idea.

The first thing that happened was my NFS daemon became unresponsive.  This took 
the limited production of the unit offline and sent a bunch of servers into 
alert mode.  After staring at the non-decreasing dataset size for about 30 
minutes, we decided to swap production over to our secondary storage (we 
had a replica -- I love ZFS send).
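
(A replica like that is cheap to keep with zfs send.  A minimal sketch of the 
usual snapshot-plus-incremental-send approach, with made-up pool, dataset, and 
host names:

  zfs snapshot tank/backups@2011.03.06
  zfs send -i tank/backups@2011.03.05 tank/backups@2011.03.06 | \
      ssh backupbox zfs recv -F tank/backups

The first pass is a full send; everything after that is incremental.)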

I then committed my second error: I rebooted the storage.  The system came 
back and would not mount the pool.  Worse than that, any zpool command you 
ran simply hung (no ctrl-c, no ctrl-z, not even kill -9 would clear it).  So now, 
instead of just having no NFS, I had no primary storage and seemingly no way of 
mounting the pool anymore.

I bugged my trusty fallback info source, Andre at EON (who has been 
instrumental in our success with the ZFS storage), and he told me to be very, 
very patient.  This was also the consensus of the many horror stories about doing a 
zfs destroy on dedup'd data.  I was fortunate not to be suffering from one of 
the common symptoms, a total system lock-up; I was just unable to mount the 
zpool or run any zpool commands.  The bad news was that the time to 
recover a pool of my size (1.27TB of dedup'd data, 10TB of total pool size) was 
looking to be anywhere from 4 hours to a week!

This news sucked.  I was tempted to do all sorts of crazy things at this point 
because I could find no way of seeing if there was actually progress being made 
or not.  Fine, I can wait a week if I'm actually going to get my data back, but 
waiting a week to find out that it hasn't done anything isn't at all 
acceptable.  I went against all my internal desires and decided to stick it out.

During all this, I think I found one way of telling whether the destroy is actually making progress:

iostat -x 5

That shows the drives were chugging away.  On Sunday night, the drives 
were running at about 50% busy on average.  On Monday night, they were running 
at 35% busy, on Tuesday night 25% busy, and this morning, Wednesday 2011.03.09, almost 
three days later, the pool was recovered and mounted.  I rebooted for good measure, 
and it imported the zpool much faster than it had previously (no more 
dedup data to load up, for a while at least).
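
(If you'd rather have a record than stare at the screen, the same iostat can be 
left logging to a file so the downward trend is easy to see later; the interval 
and log path here are just what I'd pick:

  iostat -x 300 | tee /var/tmp/destroy-progress.log

The %b (percent busy) column for each drive is the number to watch.)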

So, maybe the %b column in iostat will give someone else the patience to see the 
wait through to the end.  If you give it a day (yeah, that is a ridiculous 
statement), you can even get a guesstimate of how much longer it is going to 
take you!
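
(To put numbers on that from my own case: roughly 50% busy Sunday night, 35% 
Monday, 25% Tuesday -- falling 10-15 points a day -- so a straight-line guess on 
Tuesday night said another day or two, which is about how it played out.)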

I also strongly encourage people to just stay away from dedup -- it seems awesome 
in a lot of respects, but days' worth of zfs destroy, etc. are absolutely insane 
even if you don't make the blunders that I did.  Apparently, if you read enough 
of the shared horror stories, you should delete your dedup'd data before doing 
the zfs destroy.  I'm not going to test that for you, but if you're running 
pre-production and you've got a super compelling reason to dedup, then be sure 
to try that out while you still have the option of just nuking your pool instead 
of waiting forever.
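
(For completeness, the untested recipe people describe would look something like 
this -- hypothetical dataset name, and I repeat that I have not tried it myself:

  zfs set dedup=off tank/mydedupdataset
  rm -rf /tank/mydedupdataset/*    # delete the contents first, in batches if need be
  zfs destroy tank/mydedupdataset

The idea being that the destroy at the end then has little or no dedup'd data 
left to unreference in one giant operation.)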


pace