Re: Progress of device deletion?

2013-10-01 Thread Duncan
Chris Murphy posted on Mon, 30 Sep 2013 23:26:16 -0600 as excerpted:

 The other thing, clearly the OP is surprised it's taking anywhere near
 this long. Had he known in advance, he probably would have made a
 different choice.

I had a longer version that I wrote first, but decided it was /too/ long 
to post as it was.  In it I compared the seconds to a couple of minutes 
a full btrfs balance takes here against the days I'm seeing reported 
on-list.  AFAIK that's down to two reasons: I'm running SSDs, and I 
partition things up, so that even for my backups on spinning rust I'm 
looking at perhaps a couple hours, not days, for a full balance or, 
pre-btrfs, an mdraid-1 recovery from degraded mode.

The point there was that when a balance or raid rebuild takes seconds, 
minutes or hours, it's feasible, and likely, to do one as a test or as 
part of routine maintenance, before things get as bad as terabytes of 
over-allocation.  As a result, I actually know what my normal balance 
times look like, since I do them reasonably often, something that 
someone on a system where it's going to take days isn't likely to have 
the option or luxury of doing.
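
(If you want to establish a similar baseline on your own filesystem, 
something like the following should do it; the mountpoint here is made 
up, so substitute your own:)

  # time a full balance, to learn what normal looks like here
  time btrfs balance start /mnt/backups

  # from another terminal, check how far along it is
  btrfs balance status /mnt/backups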

So there's a benefit, even if SSDs aren't a current option, in 
partitioning where possible, or otherwise managing the data scale, so 
that maintenance times are at very worst on the scale of hours, not 
days, and preferably at the low end of that range.  (Based on my mdraid 
days, the practical limit for me is a couple hours; beyond that, it's 
simply too long to do routinely enough to stay familiar with the normal 
time scales, etc.)

Of course that can't be done for all use cases, but if it's at all 
possible, simply managing the scale helps a *LOT*.  As I said, the 
practical wall, beyond which things fall off the routinely-manageable 
end for me, is about two hours.  A pre-deployment test can give one an 
idea of the time scale, and from there... at least I'd have some idea 
whether an operation should take a day or longer, and if it did, that 
something was definitely wrong.

Of course if this /is/ part of that pre-deployment testing, or even 
btrfs development testing, as is appropriate at this stage of btrfs 
development, then points to him for doing that testing and for finding 
out, before actual deployment, that either something is badly wrong or 
he's simply off the practical end of the size scale. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Progress of device deletion?

2013-09-30 Thread Chris Murphy

On Sep 29, 2013, at 1:13 AM, Fredrik Tolf fred...@dolda2000.com wrote:
 
 Is there any way I can find out what's going on?

For whatever reason, it started out with every drive practically full, in terms 
of chunk allocation.

e.g.  devid 5 size 2.73TB used 2.71TB path /dev/sdh1

I don't know if the code works this way, but it needs to do a balance (or 
partial one) to make room before it can start migrating actual complete 
chunks from 4 disks to 2 disks. And my guess is that it's still not done 
balancing sdg1.
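
(Something like the following shows how close each device's chunk 
allocation is to its size, and how much of the allocated space actually 
holds data; the mountpoint is illustrative:)

  btrfs filesystem show         # per-device: size vs. allocated (used)
  btrfs filesystem df /mnt      # per-chunk-type: total allocated vs. used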

Post kernel version, btrfs-progs version, and dmesg from the time the delete 
command was initiated.
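
That is, something like:

  uname -r                # kernel version
  btrfs --version         # btrfs-progs version
  dmesg | grep -i btrfs   # kernel messages around the delete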


Chris Murphy


Re: Progress of device deletion?

2013-09-30 Thread Chris Murphy

On Sep 30, 2013, at 8:27 AM, Chris Murphy li...@colorremedies.com wrote:

 
 On Sep 29, 2013, at 1:13 AM, Fredrik Tolf fred...@dolda2000.com wrote:
 
 Is there any way I can find out what's going on?
 
 For whatever reason, it started out with every drive practically full, in 
 terms of chunk allocation.
 
 e.g.  devid 5 size 2.73TB used 2.71TB path /dev/sdh1
 
 I don't know if the code works this way, but it needs to do a balance (or 
 partial one) to make room before it can start migrating actual complete 
 chunks from 4 disks to 2 disks. And my guess is that it's still not done 
 balancing sdg1.
 
 Post kernel version, btrfs-progs version, and dmesg from the time the delete 
 command was initiated.

Without knowing more about how it's expected to behave (now and in the 
near future), I think if I were you I would have added a couple of drives 
to the volume first, so that it had more maneuvering room. It probably 
seems weird to add drives in order to remove drives, but sometimes 
(always?) Btrfs gets a bit piggish, allocating a lot more chunks than 
there is data; or maybe it's just not deallocating space as aggressively 
as it could. So it can get to a point where, even though there isn't that 
much data in the volume (in your case about 1.5x the drive size, across 4 
drives), effectively all of the space is allocated. Backing out of that 
takes free space. Then, once the chunks are better allocated, you'd have 
been able to remove the drives. It's an open question whether I would 
have removed 2, then another 2, or removed all 4 at once.
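
(Roughly this sequence; the device names for the added drives and the 
mountpoint are made up, and this is untested:)

  # add temporary drives so the allocator has room to maneuver
  btrfs device add /dev/sdi /dev/sdj /mnt

  # compact the mostly-empty chunks into full ones
  btrfs balance start /mnt

  # with unallocated space available, the deletes can migrate chunks
  btrfs device delete /dev/sdg1 /mnt
  btrfs device delete /dev/sdh1 /mnt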

And for all I know adding one drive might be enough even though it's raid0. 
*shrug* New territory.



Chris Murphy


Re: Progress of device deletion?

2013-09-30 Thread Duncan
Chris Murphy posted on Mon, 30 Sep 2013 19:05:36 -0600 as excerpted:

 It probably seems weird to add drives in order to remove drives, but
 sometimes (always?) Btrfs gets a bit piggish, allocating a lot more
 chunks than there is data; or maybe it's just not deallocating space as
 aggressively as it could. So it can get to a point where, even though
 there isn't that much data in the volume (in your case about 1.5x the
 drive size, across 4 drives), effectively all of the space is
 allocated. Backing out of that takes free space. Then, once the chunks
 are better allocated, you'd have been able to remove the drives.

As I understand things, and from what I've actually observed here, btrfs 
only allocates chunks on-demand, but doesn't normally DEallocate them at 
all, except during a balance etc., when it rewrites all the (meta)data 
matching the filters, filling chunks as it rewrites and thus compacting 
away all those holes that were opened up thru deletion.

So allocated chunks should effectively always be the high-water-mark of 
usage (rounded up to the nearest chunk size) since the last balance 
compacted chunk usage, because chunk allocation is automatic but chunk 
deallocation requires a balance.
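
(That's also why a usage-filtered balance is the cheap way to reclaim 
mostly-empty chunks: it only rewrites chunks below the given fill level. 
The threshold and mountpoint here are just illustrative:)

  # rewrite only data chunks at most 5% full, freeing the rest
  btrfs balance start -dusage=5 /mnt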

This is actually a fairly reasonable approach in the normal case: it's 
reasonable to assume that even if the data has shrunk substantially, if 
it once reached a particular size it's likely to reach it again, and 
deallocation carries a serious time cost, since the remaining active 
data must be rewritten to other chunks. So it's best to just let the 
allocation be, unless an admin decides it's worth eating that cost to 
get the lower chunk allocation, and invokes a balance to effect that.


So as you were saying, if chunk allocation is almost maxed out and well 
above the actual (meta)data size, the most efficient way to delete a 
device could be to add one first, then do a balance to rewrite all those 
nearly empty chunks into nearly full ones, shrinking the number of 
allocated chunks to something reasonable, and only THEN, once there's a 
reasonable amount of unallocated space available, attempt the device 
delete.


Meanwhile, I really do have to question the use case where the risk of a 
single dead device killing a raid0 (or for that matter, of running 
still-experimental btrfs) is fine, yet spending days doing data 
maintenance on data not valuable enough to put on anything but 
experimental btrfs raid0 is warranted, rather than simply blowing the 
data away and starting with brand new mkfs-ed filesystems.  That's a 
strong hint to me that either the raid0 use case is wrong, or the days 
of data move and reshape instead of blowing it away and recreating a 
brand new filesystem are wrong, and that one or the other should be 
reevaluated.  However, I'm sure there must be use cases for which it's 
appropriate and I simply don't have a sufficiently creative imagination, 
so I'll admit I could be wildly wrong on that.  If a sysadmin is sure 
he's on solid ground with his use case, for him, he very well could 
be.  =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Progress of device deletion?

2013-09-30 Thread Chris Murphy

On Sep 30, 2013, at 10:43 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Meanwhile, I really do have to question the use case where the risk of a 
 single dead device killing a raid0 (or for that matter, of running 
 still-experimental btrfs) is fine, yet spending days doing data 
 maintenance on data not valuable enough to put on anything but 
 experimental btrfs raid0 is warranted, rather than simply blowing the 
 data away and starting with brand new mkfs-ed filesystems.

Yes, of course. It must be a test case, and for non-experimental, stable 
Btrfs I think it's reasonable to expect device delete to work reliably 
regardless of the raid level, simply because it's offered. And after 
all, maybe the use case involves enterprise SSDs, each of which should 
have a less than 1% chance of failing during its service life. 
(Naturally, that's also going to go a lot faster than days.)


 That's a strong hint to me that either the raid0 
 use case is wrong, or the days of data move and reshape instead of 
 blowing it away and recreating a brand new filesystem are wrong, and 
 that one or the other should be reevaluated.

I think it's the wrong use case today, except for testing it. It's legit 
to try to blow things up, simply because the functionality is offered, so 
long as the idea is "I really would like this workflow to actually work 
in 2-5 years." Otherwise it's sort of a rat hole.

The other thing, clearly the OP is surprised it's taking anywhere near this 
long. Had he known in advance, he probably would have made a different choice.

Chris Murphy
