> However the raid1 term only describes replication. It doesn't describe
> any policy.
Yep, you're right, but most sysadmins expect some 'policies'.

If I use RAID1 I expect that if one drive fails, I can still boot
_without_ boot issues, just some warnings etc., because I use raid1 to
have simple one-device tolerance if a drive fails (which can happen). I
can check/monitor the BTRFS RAID status with 'btrfs fi sh' or
'btrfs dev stat'. I also expect that if a device comes back it will
sync automatically, and that if I replace a device it will
automatically rebalance the raid1 (which btrfs does, so far). I think a
lot of sysadmins feel the same way.
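
For reference, what I do by hand today looks roughly like this (device
names, devid and mount point are only examples):

    # what is there, and have any devices logged errors?
    btrfs filesystem show /data
    btrfs device stats /data

    # one device is gone: mount the survivor by hand
    mount -o degraded /dev/sda2 /data

    # put in a replacement and let btrfs copy everything onto it
    # (devid 2 is the missing device in this example)
    btrfs replace start 2 /dev/sdb2 /data

    # anything written while degraded lands in 'single' chunks,
    # so convert those back to raid1 afterwards
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /data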


On Thursday, February 7, 2019 3:19:01 PM CET Chris Murphy wrote:
> On Thu, Feb 7, 2019 at 10:37 AM Martin Steigerwald <mar...@lichtvoll.de> 
> wrote:
> >
> > Chris Murphy - 07.02.19, 18:15:
> > > > So please change the normal behavior
> > >
> > > In the case of no device loss, but device delay, with 'degraded' set
> > > in fstab you risk a non-deterministic degraded mount. And there is no
> > > automatic balance (sync) after recovering from a degraded mount. And
> > > as far as I know there's no automatic transition from degraded to
> > > normal operation upon later discovery of a previously missing device.
> > > It's just begging for data loss. That's why it's not the default.
> > > That's why it's not recommended.
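
Just for context, 'degraded' set in fstab means an options field like
this (the UUID is of course made up):

    UUID=f0e1d2c3-1111-2222-3333-444455556666  /data  btrfs  defaults,degraded  0  0
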
> >
> > Still, the current behavior is not really user-friendly. And it does
> > not meet the expectations that users usually have about how RAID 1
> > works. I know BTRFS RAID 1 is not a classic RAID 1, although it is
> > called that.
> 
> In both my Feb 2 and Feb 5 responses I mentioned that the user
> experience is not good compared to mdadm and lvm raid1 in the same
> situation.
> 
> However the raid1 term only describes replication. It doesn't describe
> any policy. And whether to fail to mount or mount degraded by default,
> is a policy. Whether and how to transition from degraded to normal
> operation when a formerly missing device reappears, is a policy. And
> whether, and how, and when to rebuild data after resuming normal
> operation is a policy. A big part of why these policies are MIA is
> that they require features that just don't exist yet. And perhaps they
> don't even belong in btrfs kernel code or user space tools, but rather
> in a system service or daemon that manages such policies. However, none
> of that means Btrfs raid1 is not raid1. There's a wrong assumption
> being made about policies and features in mdadm and LVM, that they are
> somehow attached to the definition of raid1, but they aren't.
> 
> 
> > I also somewhat get that with the current state of BTRFS the current
> > behavior of not allowing a degraded mount may be better… however… I
> > clearly see room for improvement here. And there very likely will be
> > discussions like this on this list… until BTRFS acts in a more
> > user-friendly way here.
> 
> And it's completely appropriate if someone wants to update the Btrfs
> status page to make clearer what features/behaviors/policies apply to
> Btrfs raid of all types, or to have a page that summarizes the
> differences between them and mdadm and/or LVM raid levels, so users
> can better assess their risk taking and choose the best Linux storage
> technology for their use case.
> 
> But at least developers know this is the case.
> 
> And actually, you could mitigate a decent amount of Btrfs's missing
> features with server monitoring tools, including parsing kernel
> messages. Because right now you aren't even informed of read or write
> errors on a device, or csum mismatches or fixups, unless you're
> checking kernel messages. Whereas mdadm has the option of emailing
> notifications to an admin for such things, and lvm has a monitor that
> I guess does something, but I haven't used it. Literally, Btrfs will
> only complain about failed writes that would cause immediate ejection
> of the device by md.
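
FWIW, until something nicer exists, a crude cron check along those
lines is not much work. A rough sketch (mount point is just an example,
and it assumes btrfs-progs new enough to have 'device stats --check'):

    #!/bin/sh
    # crude cron check, roughly what mdadm's MAILADDR gives you for free
    MNT=/data
    if ! OUT=$(btrfs device stats --check "$MNT" 2>&1); then
        # --check returns non-zero if any per-device counter
        # (write/read/flush/corruption/generation errors) is non-zero
        echo "$OUT" | mail -s "btrfs device errors on $(hostname)" root
    fi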
