> However the raid1 term only describes replication. It doesn't describe
> any policy.

Yep, you're right, but most sysadmins expect some 'policies'.
If I use RAID1 I expect that if one drive fails, I can still boot
_without_ boot issues, just some warnings etc., because I use raid1 to
have simple one-device fault tolerance in case one fails (which can
happen). I can check/monitor the BTRFS RAID status with 'btrfs fi sh'
(or with 'btrfs dev stat'); a rough example of such a check is sketched
at the end of this mail. I also expect that if a device comes back it
will sync automatically, and that if I replace a device it will
automatically rebalance the raid1 (which btrfs does, so far). I think a
lot of sysadmins feel the same way.

On Thursday, February 7, 2019 3:19:01 PM CET Chris Murphy wrote:
> On Thu, Feb 7, 2019 at 10:37 AM Martin Steigerwald <mar...@lichtvoll.de>
> wrote:
> >
> > Chris Murphy - 07.02.19, 18:15:
> > > > So please change the normal behavior
> > >
> > > In the case of no device loss, but device delay, with 'degraded' set
> > > in fstab you risk a non-deterministic degraded mount. And there is no
> > > automatic balance (sync) after recovering from a degraded mount. And
> > > as far as I know there's no automatic transition from degraded to
> > > normal operation upon later discovery of a previously missing device.
> > > It's just begging for data loss. That's why it's not the default.
> > > That's why it's not recommended.
> >
> > Still the current behavior is not really user-friendly. And does not
> > meet expectations that users usually have about how RAID 1 works. I know
> > BTRFS RAID 1 is no RAID 1, although it is called like this.
>
> I mentioned the user experience is not good, in both my Feb 2 and Feb
> 5 responses, compared to mdadm and lvm raid1 in the same situation.
>
> However the raid1 term only describes replication. It doesn't describe
> any policy. And whether to fail to mount or mount degraded by default
> is a policy. Whether and how to transition from degraded to normal
> operation when a formerly missing device reappears is a policy. And
> whether, and how, and when to rebuild data after resuming normal
> operation is a policy. A big part of why these policies are MIA is
> because they require features that just don't exist yet. And perhaps
> don't even belong in btrfs kernel code or user space tools, but rather
> in a system service or daemon that manages such policies. However, none
> of that means Btrfs raid1 is not raid1. There's a wrong assumption
> being made about policies and features in mdadm and LVM, that they are
> somehow attached to the definition of raid1, but they aren't.
>
> > I also somewhat get that with the current state of BTRFS the current
> > behavior of not allowing a degraded mount may be better… however… I see
> > clearly room for improvement here. And there very likely will be
> > discussions like this on this list… until BTRFS acts in a more user
> > friendly way here.
>
> And it's completely appropriate if someone wants to update the Btrfs
> status page to make more clear what features/behaviors/policies apply
> to Btrfs raid of all types, or to have a page that summarizes their
> differences among mdadm and/or LVM raid levels, so users can better
> assess their risk taking, and choose the best Linux storage technology
> for their use case.
>
> But at least developers know this is the case.
>
> And actually, you could mitigate some decent amount of Btrfs missing
> features with server monitoring tools, including parsing kernel
> messages. Because right now you aren't even informed of read or write
> errors, device or csum mismatches or fixups, unless you're checking
> kernel messages. Where mdadm has the option for emailing notifications
> to an admin for such things, and lvm has a monitor that I guess does
> something, but I haven't used it. Literally Btrfs will only complain
> about failed writes that would cause immediate ejection of the device
> by md.
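
To make the 'check/monitor' part above a bit more concrete, here is a
minimal sketch of the kind of check I mean -- only my own rough
assumption of how one could do it, with /mnt/data as a stand-in mount
point:

    #!/bin/sh
    # Rough btrfs RAID health check (sketch, not a finished tool).
    MNT=/mnt/data    # stand-in, adjust to your mount point

    # 'btrfs filesystem show' prints a note when devices are missing.
    if btrfs filesystem show "$MNT" | grep -qi 'missing'; then
        echo "WARNING: $MNT reports missing devices"
    fi

    # 'btrfs device stats' prints per-device error counters
    # (write/read/flush/corruption/generation); any non-zero value in
    # the second column deserves a look.
    btrfs device stats "$MNT" | awk '$2 != 0 { print "WARNING: " $0 }'

The device stats counters are persistent, so this also catches most of
the read/write or corruption errors that otherwise only show up in
kernel messages.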
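
And on the mdadm mail notifications Chris mentions: until btrfs (or
some daemon around it) grows something similar, a cron job around such
a check can approximate it. Again just a sketch, assuming the script
above is saved as /usr/local/sbin/btrfs-health-check (made-up path),
plus a working local MTA and the mailx 'mail' command:

    #!/bin/sh
    # Mail any warnings from the check script to the admin, roughly
    # like mdadm's MAILADDR does for md arrays.
    # root@example.com is a placeholder address.
    out=$(/usr/local/sbin/btrfs-health-check 2>&1)
    if [ -n "$out" ]; then
        echo "$out" | mail -s "btrfs warning on $(hostname)" root@example.com
    fi

    # run it e.g. every 10 minutes from root's crontab:
    # */10 * * * * /usr/local/sbin/btrfs-health-mail

It is no replacement for proper policies in btrfs itself, but it at
least gets the existing error counters in front of an admin.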