On Fri, Feb 8, 2019 at 12:33 AM Stefan K <shado...@gmx.net> wrote:
>
> > However the raid1 term only describes replication. It doesn't describe
> > any policy.
> yep you're right, but the most sysadmin expect some 'policies'.

A sysadmin expecting policies is fine, but assuming they exist makes
them a questionable sysadmin.

>> If I use RAID1 I expect that if one drive failed, I can still boot
>> _without_ boot issues, just some warnings etc, because I use raid1 to have
>> simple 1device tolerance if one fails (which can happen).

OK, and we've already explained that btrfs doesn't work that way yet,
which is why it has the defaults it has, but then you go on to assert
that Btrfs should have the defaults YOU want based on YOUR
assumptions. It's absurd.

>I can check/monitor the BTRFS RAID status by 'btrfs fi sh' or '(or by 'btrfs
>dev stat'). I also expect that if a device came back it will sync
>automatically and if I replace a device it will automatically rebalance the
>raid1 (which btrfs does, so far). I think a lot of sysadmins feel the same way.

OK, what you just wrote there is sufficiently incomplete that it's
wrong. I and others have already described part of this behavior, so
if you were really comprehending what people are saying, you wouldn't
have just written the above paragraph.

If a missing device reappears, it is not synced automatically.

If you have a two device raid1 with a missing device, and mounted
degraded, data is highly likely to get written to the single remaining
drive as single profile chunks, which means when you do either 'btrfs
replace' or 'btrfs device add' followed by 'btrfs device remove' the
data in those single chunks will *not* be replicated automatically to
the replacement drive. You will have to do a manual balance and
explicitly convert single chunks to raid1.

If it's 3+ drives, a device replacement (by either method) should
cause data to be replicated.

I see a lot of sysadmins make the wrong assumptions on the linux-raid
list and on the LVM list, and I often read about data loss when they
do that. What matters is how things actually work.

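As a rough illustration of the manual conversion step described above
(a sketch, not an official procedure): 'btrfs filesystem df
<mountpoint>' lists each chunk type with its profile, so single chunks
left over from a degraded mount show up next to the raid1 ones. The
Python sketch below parses that kind of output and builds a 'btrfs
balance start' command using the convert=raid1,soft filter, which
skips chunks that already have the target profile. The sample text,
the function names and the /mnt mount point are made up for
illustration, and the sketch assumes a plain two-copy raid1 target. It
only builds the command; it does not run it.

```python
import re

# Matches lines like "Data, RAID1: total=1.00GiB, used=512.00MiB",
# the per-profile lines printed by `btrfs filesystem df <mountpoint>`.
LINE_RE = re.compile(r"^(?P<kind>\w+),\s*(?P<profile>\w+):")

# Illustrative sample only (not captured from a real system): a raid1
# filesystem that was written to while mounted degraded.
SAMPLE = """\
Data, RAID1: total=10.00GiB, used=8.50GiB
Data, single: total=1.00GiB, used=512.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=300.00MiB
Metadata, single: total=256.00MiB, used=10.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
"""


def non_raid1_chunks(df_output):
    """Return (type, profile) pairs whose profile is not RAID1."""
    found = []
    for line in df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        kind, profile = match.group("kind"), match.group("profile")
        # GlobalReserve is reported as "single" and is not a chunk type
        # that balance converts, so it is ignored here.
        if kind == "GlobalReserve":
            continue
        if profile.upper() != "RAID1":
            found.append((kind, profile))
    return found


def convert_command(df_output, mountpoint):
    """Build a balance command converting leftover chunks to raid1.

    Returns None when every chunk type is already raid1.
    """
    kinds = {kind for kind, _ in non_raid1_chunks(df_output)}
    if not kinds:
        return None
    cmd = ["btrfs", "balance", "start"]
    if "Data" in kinds:
        cmd.append("-dconvert=raid1,soft")
    # btrfs-progs applies the -m filters to system chunks as well.
    if kinds & {"Metadata", "System"}:
        cmd.append("-mconvert=raid1,soft")
    cmd.append(mountpoint)
    return cmd


if __name__ == "__main__":
    print(non_raid1_chunks(SAMPLE))
    command = convert_command(SAMPLE, "/mnt")
    print(" ".join(command) if command else "nothing to convert")
```

Running 'btrfs filesystem df' again after the balance finishes is the
way to confirm that no single chunks remain.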
When you make assumptions about how things work, you're unwittingly
begging for user-induced data loss, and all the complaining about
missing features won't help get the data back. Over and over again,
people have to be told: you didn't understand how it worked, you
didn't understand what you were doing, and yeah, sorry, the data is
just gone.

It's your responsibility to understand how things really work and
fail. It isn't possible for the code to understand your expectations
and act accordingly.

At least you're discovering the limitations before you end up in
trouble. The job of a sysadmin is to find out the difference between
expectations and the actual feature set, because maybe the technology
being evaluated isn't a good match for the use case.

--
Chris Murphy