On Fri, Jul 5, 2019 at 6:05 PM Vladimir Panteleev <thecybersha...@gmail.com> wrote:
> On 05/07/2019 21.43, Chris Murphy wrote:
> > But I can't tell from the
> > above exactly when each drive was disconnected. In this scenario you
> > need to convert to raid1 first, wait for that to complete successfully
> > before you can do a device remove. That's clear. Also clear is you
> > must use 'btrfs device remove' and it must complete before that device
> > is disconnected.
>
> Unfortunately as mentioned before that wasn't an option. I was
> performing this operation on a DM snapshot target backed by a file that
> certainly could not fit the result of a RAID10-to-RAID1 rebalance.

Then the total operation isn't possible. Maybe you could have made the
volume a seed, then created a single-device sprout on a new single
target, and later converted that sprout to raid1. But I'm not sure of
the state of multiple-device seeds.

> > What I've never tried, but the man page implies, is you can specify
> > two devices at one time for 'btrfs device remove' if the profile and
> > the number of devices permits it.
>
> What I found surprising, was that "btrfs device delete missing" deletes
> exactly one device, instead of all missing devices. But, that might be
> simply because a device with RAID10 blocks should not have been
> mountable rw with two missing drives in the first place.

It's a really good question for the developers whether there is a good
reason to permit rw mounting of a volume that's missing two or more
devices for raid1, raid10, or raid5, and missing three or more for
raid6. I cannot think of a good reason to allow degraded,rw mounts for
a raid10 missing two devices.

> > This is actually worse, potentially because it means there's only one
> > copy of the system chunk on sdd1. It has not been replicated to sdf1,
> > but is on the missing device.
>
> I'm sorry, but that's not right. As I mentioned in my second email, if I
> use btrfs device replace, then it successfully rebuilds all missing
> data. So, there is no lost data with no remaining copies; btrfs is
> simply having some trouble moving it off of that device.
>
> Here is the filesystem info with a loop device replacing the missing drive:
>
> https://dump.thecybershadow.net/9a0c88c3720c55bcf7fee98630c2a8e1/00%3A02%3A17-upload.txt

Wow, that's really interesting. So you did 'btrfs replace start' for one
of the missing drive devids, with a loop device as the replacement, and
that worked and finished?! Does this three-device volume mount rw and
not degraded? I guess it must have, because 'btrfs fi us' worked on it.

    devid 1 size 7.28TiB used 2.71TiB path /dev/sdd1
    devid 2 size 7.28TiB used 22.01GiB path /dev/loop0
    devid 3 size 7.28TiB used 2.69TiB path /dev/sdf1

OK, so what happens now if you try 'btrfs device remove /dev/loop0'?

> > Depending on degraded operation for this task is the wrong strategy.
> > You needed to 'btrfs device delete/remove' before physically
> > disconnecting these drives.
> >
> > OK you definitely did this incorrectly if you're expecting to
> > disconnect two devices at the same time, and then "btrfs device delete
> > missing" instead of explicitly deleting drives by ID before you
> > physically disconnect them.
>
> I don't disagree in general, however, I did make sure that all data was
> accessible with two devices before proceeding with this endeavor.

Well, there's definitely something screwy if Btrfs needs something on a
missing drive, which is indicated by its refusal to remove it from the
volume, and yet at the same time it's possible to e.g. rsync every file
to /dev/null without any errors. That's a bug somewhere.
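Just to be concrete about the kind of read-back check I mean, something
along these lines is what I'd try. It's only a sketch, and /mnt stands
in for wherever the degraded volume is mounted:

    # checksum-verify everything on the devices that are still present
    btrfs scrub start -Bd /mnt

    # and force an actual read of every file, discarding the data; this
    # should error out if any extent needed for file data is unreachable
    find /mnt -xdev -type f -exec cat {} + > /dev/null

If that comes back clean while 'btrfs device remove' still refuses to
drop the device, that makes the remove failure look even more like a
bug.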
> >> OK so what did you do, in order, each command, interleaving the
> >> physical device removals.
>
> Well, at this point, I'm still quite confident that the BTRFS kernel bug
> is unrelated to this entire RAID10 thing, but I'll do so if you like.
> Unfortunately I do not have an exact record of this, but I can do my
> best to reconstruct it from memory.

I'm not a developer, but a dev very well might need a simple reproducer
for this in order to locate the problem. But the call trace might tell
them what they need to know. I'm not sure.

> The reason I'm doing this in the first place is that I'm trying to split
> a 4-drive RAID10 array that was getting full. The goal was to move some
> data off of it to a new array, then delete it from its original
> location. I couldn't use rsync because most of the data was in
> snapshots, and I couldn't use btrfs send/receive because it bugs out
> with the old "chown oXXX-XXXXXXX-0 failed: No such file or directory"
> bug. So, my idea was:

I'm not familiar with that bug. That sounds like a receive-side bug, not
a send-side bug. I wonder if receive will continue if you use the -E 0
option, and the result will just be the wrong owner on a few files (see
the sketch in [1] below).

> 1. Use device mapper to create a COW copy of all four devices, and
> operate on those (make the SATA devices read-only to ensure they're not
> touched)
> 2. Use btrfs-tune to change the UUID of the new filesystem
> 3. Delete 75%-ish of data off of the COW copy
> 4. Somehow convert the 4-disk RAID10 to 2-disk RAID1 without incurring a
> ton of writes to the COW copies
> 5. dd the contents of the COW copies to two new real disks
> 6. After ensuring the remaining data is safe on the new disks, delete it
> from the original array.
>
> For steps 2 and 3, I needed to specify the exact devices to work with.
> It's possible to specify the device list when mounting with -o device=,
> but for btrfstune, I had to bind-mount a fake partitions file over
> /proc/partitions. I can share the scripts I used for all this if you like.

No, it's fine.

> Have you had a chance to look at the kernel stack trace yet? It looks
> like it's running out of temporary space to perform a relocation. I
> think that is where we should be concentrating on.

I've looked at it, but I can't really follow it. The comments in the
code don't really tell me much either, other than that Btrfs is
confused, and so you're seeing the warning and then error -28. It may
really be running out of global reserve for this operation; I can't
really tell. Qu will understand this better.

--
Chris Murphy
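[1] Something like this, for the record; it's untested against that
particular chown failure, and the snapshot and destination paths are
only placeholders:

    # tell receive not to stop at the first error (-E 0 / --max-errors 0
    # means no error limit), so a failed chown should, I think, just
    # leave the wrong owner behind rather than aborting the transfer
    btrfs send /mnt/old/.snapshots/data-2019-07-05 | btrfs receive -E 0 /mnt/new/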