> Replace doesn't need to do a balance, it's largely just a block level copy of 
> the device being replaced, but with some special handling so that the 
> filesystem is consistent throughout the whole operation.  This is most of why 
> it's so much more efficient than add/delete.

Thanks for this correction. In the meantime I've experienced for myself that 
replace is pretty fast…

Last time I wrote, I thought the initial 4-day "remove missing" had completed 
successfully, but as it turned out that device was still missing. Maybe the 
Ctrl+C I tried after a few days did work after all. I only checked/noticed 
this after the 8 TB drive had been zeroed and encrypted.
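For reference, whether the array still reports a missing device can be checked 
with "btrfs filesystem show", and the interrupted removal can simply be started 
again (a sketch; the mount point is hypothetical):

```shell
# Check whether the filesystem still lists a missing device
btrfs filesystem show /mnt/array

# If so, restart the removal of the missing device; it must run to
# completion, rebuilding the missing data onto the remaining drives
btrfs device remove missing /mnt/array
```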

Luckily, most of the "missing" data was already rebuilt onto the remaining 2 
drives, and only 1.27 TiB were still "missing".

In hindsight I should probably have repeated "remove missing" here, and let it 
run to completion. What I did instead was a "replace -r" onto the 8 TB drive. 
This successfully rebuilt the missing 1.27 TiB of data onto the 8 TB drive, at 
a speedy ~144 MiB/s no less!
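For anyone following along, the command I mean is roughly this (the devid, 
device path and mount point are hypothetical; -r restricts reads to the other 
mirrors, which is what you want when the source device is missing or flaky):

```shell
# Replace the missing device (devid 4 here, as reported by
# `btrfs filesystem show`) with the new 8 TB drive; -r avoids
# reading from the device being replaced
btrfs replace start -r 4 /dev/sdd /mnt/array

# Check on the running replace
btrfs replace status /mnt/array
```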

So I was back to a 4-drive raid1, with 3x 6 TB drives and 1x 8 TB drive 
(though that 8 TB drive had very little data on it). Then I tried to "remove" 
(without "-r" this time) the 6 TB drive with the least amount of data on it 
(one had 4.0 TiB, where the other two had 5.45 TiB each). This failed after a 
few minutes with "no space left on device".

Austin's mail reminded me to resize the filesystem on the larger disk, which I 
then did, but that device still couldn't be removed; same error message.
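The resize step, for reference (devid 4 standing in for the 8 TB drive here; 
look the actual devid up with "btrfs filesystem show" first):

```shell
# Grow the filesystem on the new, larger device to the device's full size
# (the devid is hypothetical -- check `btrfs filesystem show`)
btrfs filesystem resize 4:max /mnt/array
```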
I then consulted the wiki, which mentions that the space allocated for 
metadata might be rather full (here: 11.91 GiB used of 12.66 GiB total), and 
suggests trying a "balance" with a low "dusage" in such cases.
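If it comes to that, the filtered balance the wiki suggests looks something 
like this (the threshold value is just a starting point):

```shell
# Compact data chunks that are at most 10% full, freeing up
# unallocated space; raise dusage gradually (10, 25, 50, ...)
# if the "no space left on device" error persists
btrfs balance start -dusage=10 /mnt/array
```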

For now I sidestepped that by removing one of the other two (rather full) 6 TB 
drives at random, and that has been running for the last 20 hours or so. 
Thanks to running it in a screen session I can check the progress this time 
around, and it's doing its thing at ~41 MiB/s, or ~7 hours per TiB, on average.

Maybe the "no space left on device" issue will sort itself out during this 
"remove"'s implicit balance; otherwise I'll run the balance manually later.

> The most efficient way of converting the array online without adding any more 
> disks than you have to begin with is:
> 1. Delete one device from the array with device delete.
> 2. Physically switch the now unused device with one of the new devices.
> 3. Use btrfs replace to replace one of the devices in the array with the 
> newly connected device (and make sure to resize to the full size of the new 
> device).
> 4. Repeat from step 2 until you aren't using any of the old devices in the 
> array.
> 5. You should have one old device left unused, physically switch it for a new 
> device.
> 6. Use btrfs device add to add the new device to the array, then run a full 
> balance.
> 
> This will result in only two balances being needed (one implicit in the 
> device delete, and the explicit final one to restripe across the full array), 
> and will result in the absolute minimum possible data transfer.
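The steps above can be sketched as a shell session (all device paths, devids 
and the mount point are hypothetical; use whatever "btrfs filesystem show" 
reports for your array):

```shell
# 1. Delete one old device (the implicit balance rebuilds its data
#    onto the remaining drives)
btrfs device delete /dev/sda /mnt/array

# 2./3. After physically swapping in a new drive, replace the next old
#       device with it, then grow the filesystem to the new drive's size
#       (the replacement inherits the devid of the replaced device)
btrfs replace start /dev/sdb /dev/sde /mnt/array
btrfs filesystem resize 2:max /mnt/array

# 4. Repeat the replace + resize pair until no old devices remain in use.

# 5./6. Physically swap the last unused old drive for the final new one,
#       add it to the array, then restripe across all devices
btrfs device add /dev/sdf /mnt/array
btrfs balance start --full-balance /mnt/array
```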

Thank you for these very explicit/succinct instructions! Also thanks to Henk 
and Duncan! I will definitely do a full balance when all disks are replaced.
