On Fri, Feb 8, 2019 at 12:33 AM Stefan K <shado...@gmx.net> wrote:
>
> > However the raid1 term only describes replication. It doesn't describe
> > any policy.
> yep you're right, but the most sysadmin expect some 'policies'.

A sysadmin expecting policies is fine, but assuming they exist makes
them a questionable sysadmin.

>> If I use RAID1 I expect that if one drive failed, I can still  boot 
>> _without_ boot issues, just some warnings etc, because I use raid1 to have 
>> simple 1device tolerance if one fails (which can happen).

OK, and we've already explained that Btrfs doesn't work that way yet,
which is why it has the defaults it has. But then you go on to assert
that Btrfs should have the defaults YOU want, based on YOUR
assumptions. It's absurd.


>I can check/monitor the BTRFS RAID status by 'btrfs fi sh' (or by 'btrfs
>dev stat'). I also expect that if a device came back it will sync
>automatically and if I replace a device it will automatically rebalance the
>raid1 (which btrfs does, so far). I think a lot of sysadmins feel the same way.

OK, what you just wrote there is incomplete enough to be wrong. I and
others have already described part of this behavior, so if you had
really taken in what people are saying, you wouldn't have written the
above paragraph.

If a missing device reappears, it is not synced automatically.

If you have a two device raid1 with a missing device, and it's mounted
degraded, data is highly likely to get written to the single remaining
drive as single profile chunks. That means when you do either 'btrfs
replace', or 'btrfs device add' followed by 'btrfs device remove', the
data in those single chunks will *not* be replicated automatically to
the replacement drive. You will have to do a manual balance and
explicitly convert the single chunks to raid1. If it's 3+ drives, a
device replacement (by either method) should cause data to be
replicated.
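
To make the two device case concrete, here is a rough sketch of how you
might check for leftover single chunks and build the balance command to
convert them. It assumes the usual "Type, PROFILE: total=..., used=..."
line layout printed by 'btrfs filesystem df'. The sample text, the
function names and the /mnt mount point are made up for illustration,
so compare against the real output on your own system before relying on
it.

```python
import re

# Chunk types whose profile matters for redundancy. GlobalReserve is
# always reported as "single" and is not a chunk allocation, so it is
# deliberately ignored.
CHUNK_TYPES = {"Data", "Metadata", "System"}

# Assumed line layout of 'btrfs filesystem df', for example:
#   Data, RAID1: total=2.00GiB, used=1.10GiB
LINE_RE = re.compile(r"^(?P<type>[A-Za-z]+), (?P<profile>[A-Za-z0-9]+): ")

# Illustrative sample only, not captured from a real filesystem.
SAMPLE_DF = """\
Data, RAID1: total=2.00GiB, used=1.10GiB
Data, single: total=1.00GiB, used=300.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=120.00MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
"""


def non_raid1_chunks(df_output):
    """Return sorted (type, profile) pairs whose profile is not raid1."""
    found = set()
    for line in df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("type") not in CHUNK_TYPES:
            continue
        profile = match.group("profile").lower()
        if profile != "raid1":
            found.add((match.group("type"), profile))
    return sorted(found)


def convert_command(mountpoint, leftovers):
    """Build an argv for 'btrfs balance start', or None if nothing to do.

    The 'soft' filter skips chunks that already have the target profile.
    System chunks are assumed to be handled together with metadata when
    -m is given, so no separate -s filter is emitted.
    """
    types = {chunk_type for chunk_type, _ in leftovers}
    filters = []
    if "Data" in types:
        filters.append("-dconvert=raid1,soft")
    if types & {"Metadata", "System"}:
        filters.append("-mconvert=raid1,soft")
    if not filters:
        return None
    return ["btrfs", "balance", "start", *filters, mountpoint]


if __name__ == "__main__":
    leftovers = non_raid1_chunks(SAMPLE_DF)
    print(leftovers)
    print(convert_command("/mnt", leftovers))
```

The sketch only builds the command and does not run it. Whether to
start the balance is still a decision for the admin, made after the
replacement device is in place.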

I see a lot of sysadmins make the wrong assumptions on the linux-raid
list and on the LVM list, and I often read about data loss when they do
that. What matters is how things actually work. When you make
assumptions about how they work, you're unwittingly begging for user
induced data loss, and all the complaining about missing features
won't help get the data back. Over and over again, people have to be
told: you didn't understand how it worked, you didn't understand what
you were doing, and yeah, sorry, the data is just gone. It's your
responsibility to understand how things really work and fail. It isn't
possible for the code to understand your expectations and act
accordingly.

At least you're discovering the limitations before you end up in
trouble. The job of a sysadmin is to find out the difference between
expectations and the actual feature set, because maybe the technology
being evaluated isn't a good match for the use case.


-- 
Chris Murphy
