On 2017-10-12 12:57, Chris Murphy wrote:
On Sun, Oct 8, 2017 at 10:58 AM, Kai Hendry <hen...@iki.fi> wrote:
Hi there,

My /mnt/raid1 suddenly became full, somewhat expectedly, so I bought 2
new 4TB USB hard drives (one WD, one Seagate) to upgrade to.

After adding sde and sdd [1] I started to see errors in dmesg [2].
[1] https://s.natalian.org/2017-10-07/raid1-newdisks.txt
[2] https://s.natalian.org/2017-10-07/btrfs-errors.txt

I'm not sure what the call traces mean exactly but they seem
non-fatal. The entire dmesg might be useful to see if there are device
or bus related errors.

I have a similar NUC model and I can tell you for sure it does not
provide enough USB bus power for 2.5" laptop drives. They must be
externally powered, or you need a really good USB hub with an even
better power supply that can handle e.g. 4 drives at the same time to
bus power them. I had lots of problems before I fixed this, but Btrfs
managed to recover gracefully once I solved the power issue.
Same here on a pair of 3-year-old NUCs. Based on the traces and the other information, I'd be willing to bet this is the root cause of the issues.
I assumed it perhaps had to do with the USB bus on my NUC5CPYB being
maxed out, and to expedite the sync, I tried to remove one of the
older 2TB drives, sdc1. However, the load went crazy and my system
became completely unstable. I shut down the machine and, after an
hour, hard powered it off since it seemed to hang (it's headless).

I've noticed recent kernels hanging under trivial scrub and balance
with hard drives. It does complete, but they are really laggy and
sometimes unresponsive to anything else unless the operation is
cancelled. I haven't had time to do regression testing. My notes
about this, including the versions I think it started with, are in
the archives.

Sidenote: I've since learnt that removing a drive actually deletes the
contents of the drive? I don't want that. I was hoping to put that drive
into cold storage. How do I remove a drive without losing data from a
RAID1 configuration?

I'm pretty sure, but not certain, of the following: device
delete/remove replicates chunk by chunk, CoW style. The entire
operation is not atomic, but the chunk operations themselves are. I
expect that metadata is updated as each chunk is properly replicated,
so I don't think what you want is possible.
This is correct. Deleting a device first marks that device as zero size so nothing tries to allocate data there, and then runs a balance operation to force chunks onto other devices (I'm not sure if it only moves chunks that are on the device being removed though). This results in two particularly important differences from most other RAID systems:

1. The device being removed is functionally wiped (it will appear to be empty), but not physically wiped (most of the data is still there, you just can't get to it through BTRFS).
2. The process as a whole is not atomic, but as a result of how it works, it is generally possible to restart it if it got stopped part way through (and you won't usually lose much progress).

That said, even if it was technically possible to remove the drive without messing things up, it would be of limited utility. You couldn't later reconnect it and expect things to just work (you would have generation mismatches, which would hopefully cause the old disk to effectively be updated to match the new one, _IF_ the old disk even registered properly as part of the filesystem), and it would be non-trivial to get data off of it safely too (you would have to connect it to a different system, and hope that BTRFS doesn't choke on half a filesystem).
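
For reference, the removal path looks roughly like this in practice (a sketch only; the device path and mount point below are placeholders, not the actual layout in this thread):

    # Removing a device triggers an implicit balance that migrates its
    # chunks onto the remaining devices; the command blocks until done.
    btrfs device remove /dev/sdc1 /mnt/raid1

    # Afterwards the removed disk no longer shows up as part of the
    # volume, even though most of its old data is still physically there.
    btrfs filesystem show /mnt/raid1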

Again, pretty sure about this too, but not certain: device replace is
an atomic operation, the whole thing succeeds or fails, and at the end
merely the Btrfs signature is wiped from the replaced device(s). So
you could restore that signature and the device would be valid again;
HOWEVER, it's going to have the same volume UUID as the new devices.
Even though the device UUIDs are unique and should prevent confusion,
confusion may still be possible.
Also correct. This is part of why it's preferred to use the replace command instead of deleting and then adding a device to replace it (the other reason being that it's significantly more efficient, especially if the filesystem isn't full).
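
For anyone following along, the replace flow is much shorter (again a sketch; device names and mount point are placeholders):

    # Copy data from the outgoing device (or rebuild it from the other
    # mirror) straight onto the new device; when it finishes, the btrfs
    # signature on the old device is wiped.
    btrfs replace start /dev/sdb1 /dev/sdd1 /mnt/raid1
    btrfs replace status /mnt/raid1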

A better way, which currently doesn't exist, is to make the raid1 a
seed device, then add two new devices and remove the seed. That way
you get the replication you want: the instant the sprout is mounted
rw, it can be used in production (all changes go to the sprout) while
the chunks from the seed are replicated. The reason this isn't viable
right now is that the tools aren't mature enough to handle multiple
devices yet. Otherwise, with a single-device seed and a single
sprout, this works and would be the way to do what you want.
Indeed, although it's worth noting that even with a single seed and single sprout, things aren't as well tested as most of the rest of BTRFS.
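
For context, the single-seed/single-sprout flow that does work today looks roughly like this (a sketch with placeholder device names and mount point; the multi-device raid1 variant Chris describes is the part that's missing):

    # On the unmounted filesystem, mark the old device as a seed.
    btrfstune -S 1 /dev/sdb1
    # Seeds mount read-only; add the new device (the sprout), then
    # remount rw so all new writes land on the sprout.
    mount /dev/sdb1 /mnt/seed
    btrfs device add /dev/sdd1 /mnt/seed
    mount -o remount,rw /mnt/seed
    # Optionally migrate everything off the seed and drop it.
    btrfs device remove /dev/sdb1 /mnt/seed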

A better way that does exist is to set up an overlay for the two
original devices: mount the overlay devices, add the new devices,
delete the overlays. That way, the writes that would invalidate the
original devices go to the overlays instead; the originals aren't
really touched. There's a way to do this with dmsetup, like how live
boot media work, and there's another way I haven't used before that's
described here:

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
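
Roughly, the dmsetup variant looks like this for one of the original devices (a sketch; paths and sizes are placeholders, and the same has to be done for the second original device):

    # Sparse file to absorb all writes; the real disk is never written to.
    dd if=/dev/zero of=/tmp/overlay-sdb1 bs=1M count=0 seek=4096
    loop=$(losetup -f --show /tmp/overlay-sdb1)
    size=$(blockdev --getsz /dev/sdb1)
    # dm snapshot target: writes go to the overlay, reads fall through.
    dmsetup create overlay-sdb1 --table "0 $size snapshot /dev/sdb1 $loop P 8"
    # Then mount /dev/mapper/overlay-sdb1 (and its sibling) instead of
    # the real devices, add the new drives, and delete the overlays.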
Using block-level overlays with BTRFS is probably a bad idea for the same reasons that block-level copies are a bad idea, even with the dmsetup methods (also, most live boot media do this at the filesystem level, not the block level; it's safer and more efficient that way). Your safest bet is probably seed devices, though that of course is not very well documented.

After a reboot it failed, namely because "nofail" wasn't in my fstab and
systemd is pedantic by default. After managing to get it booting into my
system without /mnt/raid1 I faced these "open ctree failed" issues.
After running btrfs check on all the drives and getting nowhere, I
decided to unplug the new drives, and I discovered that when I took
out the new 4TB WD drive, I could mount the volume with -o degraded.
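
For reference, the fstab line that avoids that boot hang would look something like the following; the UUID is a placeholder for the volume's own:

    UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/raid1  btrfs  defaults,nofail,x-systemd.device-timeout=30  0  0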

dmesg errors with the WD ("My Passport") include "Wrong diagnostic
page; asked for 1 got 8" and "Failed to get diagnostic page
0xffffffea", which raised my suspicions. The model number, btw, is
WDBYFT0040BYI-WESN.

Anyway, I'm back up and running with 2x2TB (one of them didn't
finish being removed, I don't know which) & 1x4TB.


Be aware that you are likely in a very precarious position now.
Anytime raid1 volumes are mounted rw,degraded, one or more of the
devices will end up with new empty single chunks (there is a patch to
prevent this, I'm not sure if it's in 4.13). The consequence of these
new empty single chunks is that they will prevent any subsequent
degraded rw mount. You get one degraded,rw mount; any subsequent
attempt will require ro,degraded to get it to mount. If you end up
snared in this, there are patches in the archives to inhibit the
kernel's protection and allow mounting of such volumes. Super
annoying. You'll have to build a custom kernel.
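
Concretely, once you're snared, the only mount that still works without those patches is read-only (the device path here is just an example):

    mount -o degraded,ro /dev/sdb1 /mnt/raid1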

My opinion is you should update backups before you do anything else,
just in case.
Next, you have to figure out a way to get all the devices used in
this volume healthy. Tricky, as you technically have a 4-device raid1
with a missing device. I propose first checking whether you have
single chunks with either 'btrfs fi us' or 'btrfs fi df', and if so,
getting rid of them with a filtered balance: 'btrfs balance start
-mconvert=raid1,soft -dconvert=raid1,soft'. Then, in theory, you
should be able to do 'btrfs device delete missing' to end up with a
valid three-device btrfs raid1, which you can use until you get your
USB power supply issues sorted.
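
Spelled out with a mount point (assuming the volume is at /mnt/raid1), that sequence would be roughly:

    # Look for any chunks listed as "single" rather than RAID1.
    btrfs filesystem usage /mnt/raid1
    # Convert them back to raid1; "soft" skips chunks that already
    # have the right profile.
    btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft /mnt/raid1
    # Drop the absent fourth device, leaving a three-device raid1.
    btrfs device delete missing /mnt/raid1
    btrfs filesystem show /mnt/raid1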
I absolutely concur with Chris here: get your backups updated, and then worry about repairing the filesystem. Or, alternatively, get your backups updated and then nuke the filesystem and rebuild it from scratch (this may be more work, but it's guaranteed to work).