On 2018-08-23 10:04, Stefan Malte Schumacher wrote:
Hello,

I originally had a RAID array with six 4TB drives, which was more than
80 percent full. I bought a 10TB drive, added it to the array, and
gave the command to remove the oldest drive in the array:

  btrfs device delete /dev/sda /mnt/btrfs-raid

I kept a terminal with "watch btrfs fi show" open, and it showed that
the size of /dev/sda had been set to zero and that data was being
redistributed to the other drives. All seemed well, but now the
process has stalled with 8GiB left on /dev/sda. It also seems that the
size of the drive has been reset to its original value of 3.64TiB:

Label: none  uuid: 1609e4e1-4037-4d31-bf12-f84a691db5d8
         Total devices 7 FS bytes used 8.07TiB
         devid    1 size 3.64TiB used 8.00GiB path /dev/sda
         devid    2 size 3.64TiB used 2.73TiB path /dev/sdc
         devid    3 size 3.64TiB used 2.73TiB path /dev/sdd
         devid    4 size 3.64TiB used 2.73TiB path /dev/sde
         devid    5 size 3.64TiB used 2.73TiB path /dev/sdf
         devid    6 size 3.64TiB used 2.73TiB path /dev/sdg
         devid    7 size 9.10TiB used 2.50TiB path /dev/sdb

I see no more btrfs worker processes and no more activity in iotop.
I am running a current Debian Stretch, with kernel 4.9.0-8 and
btrfs-progs 4.7.3-1.

How should I proceed? I have a backup, but would prefer an easier and
less time-consuming way out of this mess.

Not exactly what you asked for, but I do have some advice on how to avoid this situation in the future:

If at all possible, use `btrfs device replace` instead of an add/delete cycle. The replace operation requires two things. First, you have to be able to connect the new device to the system while all the old ones except the device you are removing are present. Second, the new device has to be at least as big as the old one. Assuming both conditions are met and you can use replace, it's generally much faster and is a lot more reliable than an add/delete cycle (especially when the array is near full). This is because replace just copies the data that's on the old device directly (or rebuilds it directly if it's not present anymore or corrupted), whereas the add/delete method implicitly re-balances the entire array (which takes a long time and may fail if the array is mostly full).
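For example, assuming the device names from your `btrfs fi show` output (/dev/sda as the old drive, /dev/sdb as the new 10TB one) and your mount point of /mnt/btrfs-raid, a replace would look roughly like this (a sketch, not something to run on the current half-deleted array):

```shell
# Replace the old drive with the new one in a single operation,
# instead of an add + delete cycle. Device names are assumptions
# taken from the original post.
btrfs replace start /dev/sda /dev/sdb /mnt/btrfs-raid

# Check progress; prints percentage complete.
btrfs replace status /mnt/btrfs-raid

# Replace only copies up to the old device's size, so afterwards
# grow the filesystem on the new device to use its full capacity.
# The new device inherits the old device's devid (1 in this case).
btrfs filesystem resize 1:max /mnt/btrfs-raid
```

Note the final resize step: without it the new 10TB drive is still treated as a 4TB one.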


Now, as far as what's actually going on here, I'm unfortunately not quite sure, and therefore I'm really not the best person to be giving advice on how to fix it. I will comment that having info on the allocations for all the devices (not just /dev/sda) would be useful in debugging, but even with that I don't know that I personally can help.
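The per-device allocation info I mean can be gathered with (path assumed from the original post):

```shell
# Overall and per-device breakdown of data/metadata/system
# allocations for the whole array.
btrfs filesystem usage /mnt/btrfs-raid

# Per-device view of chunk allocations.
btrfs device usage /mnt/btrfs-raid
```

Posting that output along with the `fi show` listing would give whoever picks this up a much better picture of where the remaining 8GiB on /dev/sda is allocated.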
