> Replace doesn't need to do a balance, it's largely just a block level copy
> of the device being replaced, but with some special handling so that the
> filesystem is consistent throughout the whole operation. This is most of
> why it's so much more efficient than add/delete.
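For reference, a replace of a missing device followed by a resize looks roughly like the sketch below. The device path /dev/sdd, devid 4, and mount point /mnt/array are placeholders for this example, not values from the thread; check `btrfs filesystem show` for the actual devid.

```shell
# Replace the missing device (devid 4 in this sketch) with the new disk.
# -r tells btrfs to read from the remaining mirrors rather than the
# source device, which is what you want when the source is missing.
btrfs replace start -r 4 /dev/sdd /mnt/array

# replace runs in the background; check on it with:
btrfs replace status /mnt/array

# The replacement inherits the old device's size, so when the new disk
# is larger, grow it to full capacity afterwards:
btrfs filesystem resize 4:max /mnt/array
```

The resize step is easy to forget (as the rest of this mail shows): without it, btrfs keeps treating the 8 TB drive as if it were the size of the 6 TB drive it replaced.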
Thanks for this correction. In the meantime I experienced myself that
replace is pretty fast…

Last time I wrote, I thought the initial 4-day "remove missing" was
successful/complete, but as it turned out that device was still missing.
Maybe the Ctrl+C I tried after a few days did work after all. I only
noticed this after the 8 TB drive had been zeroed and encrypted. Luckily,
most of the "missing" data had already been rebuilt onto the remaining
2 drives, and only 1.27 TiB was still "missing". In hindsight I should
probably have repeated "remove missing" here and let it run to
completion. What I did instead was a "replace -r" onto the 8 TB drive.
This successfully rebuilt the missing 1.27 TiB of data onto the 8 TB
drive, at a speedy ~144 MiB/s no less! So I was back to a 4-drive raid1,
with 3x 6 TB drives and 1x 8 TB drive (though that 8 TB drive had very
little data on it).

Then I tried to "remove" (without "-r" this time) the 6 TB drive with
the least amount of data on it (one had 4.0 TiB, whereas the other two
had 5.45 TiB each). This failed after a few minutes with "no space left
on device". Austin's mail reminded me to resize due to the larger disk,
which I then did, but that device still couldn't be removed, same error
message. I then consulted the wiki, which mentions that the space for
metadata might be rather full (11.91 of 12.66 GiB used here) and
suggests trying a "balance" with a low "dusage" in such cases. For now
I avoided that by removing one of the other two (rather full) 6 TB
drives instead, and this has been going for the last 20 hours or so.
Thanks to running it in a screen I can check the progress this time
around, and it's doing its thing at ~41 MiB/s, or ~7 hours per TiB, on
average. Maybe the "no space left on device" will sort itself out during
this "remove"'s balance; otherwise I'll do the balance manually later.

> The most efficient way of converting the array online without adding
> any more disks than you have to begin with is:
> 1. Delete one device from the array with device delete.
> 2. Physically switch the now unused device with one of the new
>    devices.
> 3. Use btrfs replace to replace one of the devices in the array with
>    the newly connected device (and make sure to resize to the full
>    size of the new device).
> 4. Repeat from step 2 until you aren't using any of the old devices
>    in the array.
> 5. You should have one old device left unused, physically switch it
>    for a new device.
> 6. Use btrfs device add to add the new device to the array, then run
>    a full balance.
>
> This will result in only two balances being needed (one implicit in
> the device delete, and the explicit final one to restripe across the
> full array), and will result in the absolute minimum possible data
> transfer.

Thank you for these very explicit/succinct instructions! Also thanks to
Henk and Duncan! I will definitely do a full balance when all disks are
replaced.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html