On Fri, Oct 24, 2014 at 06:58:25AM -0400, Rich Freeman wrote:
> On Thu, Oct 23, 2014 at 10:35 PM, Zygo Blaxell
> <ce3g8...@umail.furryterror.org> wrote:
> >
> >                 - single profile: we can tolerate zero missing disks,
> >                 so we don't allow rw mounts even if degraded.
> 
> That seems like the wrong logic here.  By all means mount read-only by
> default for safety, but there should be a way to force a read-write
> mount on any filesystem, precisely because the RAID modes can be mixed
> and even if you lose two devices on a RAID1 system not ALL the data is
> lost if you have more than two drives.

I agree, but https://bugzilla.kernel.org/show_bug.cgi?id=60594 does not:

        Stefan Behrens 2013-08-23 13:42:16 UTC
        The way out is to mount read-only, copy the data aside and be
        happy that no data was lost.

        The #1 goal (IMO) is to avoid data loss. Therefore the filesystem
        goes read-only if less devices are functional for writing than
        required by the selected RAID levels. And in order to avoid
        the surprise of a filesystem going read-only 30 seconds after
        mounting it, this is also enforced at mount time. [...]

        We could also leave this as an option to the user "mount -o
        degraded-and-I-want-to-lose-my-data", but in my opinion the use
        case is very, very exceptional.

IMHO the use case is common: it arises any time restoring the entire
filesystem from backups is inconvenient, and that covers a *lot* of
users.  I never devote more than 50% of a machine's raw disk space to
btrfs, because I need the spare raw space to mkfs a fresh filesystem
and rsync data out of broken read-only btrfs filesystems.
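In concrete terms, that copy-aside workflow looks something like the
sketch below.  The device names and mount points are placeholders, and
the `-o ro,degraded` mount is exactly the mode the quoted bugzilla
comment recommends:

```shell
# Damaged btrfs array on /dev/sdb1; spare raw space on /dev/sdc1.
# (Both device names are hypothetical.)
mount -o ro,degraded /dev/sdb1 /mnt/broken   # read-only degraded mount
mkfs.btrfs /dev/sdc1                         # fresh filesystem on spare space
mount /dev/sdc1 /mnt/new
rsync -aHAX /mnt/broken/ /mnt/new/           # copy aside what survives
```

Note that this only works if you kept enough spare raw space around in
advance, which is the whole complaint.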

Somewhere in the future for btrfs is online fsck; however, we're not there
yet.  The kernel still blows up over relatively minor structural errors.

FWIW I'd like to be able to mount a broken btrfs read-write, add more
storage (either grow existing disks or add new ones), and then use the
new storage as temporary space to build a cleaned copy of the old
metadata with unreachable or broken objects dropped (preferably leaving
behind a placeholder object that returns EIO on read, but can still be
written or deleted).  Once there is clean metadata, we can rebuild the
free space maps (possibly collecting allocated orphan extents into
lost+found), and then the surviving data can be rebalanced or moved
fairly easily.  The grown/added disks can be shrunk/removed at the end.
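In terms of today's btrfs commands, the workflow I'm imagining would
look roughly like this.  The device names are placeholders, and the
first two steps are exactly the parts that don't exist yet: a rw
degraded mount on a damaged single-profile filesystem, and an online
metadata repair pass:

```shell
# Hypothetical sketch -- assumes a future btrfs that permits this mount
# and provides an online repair pass; neither exists today.
mount -o rw,degraded /dev/sda2 /mnt/fs       # currently refused for single profile
btrfs device add /dev/sdd1 /mnt/fs           # temporary space for rebuilt metadata
# ...online fsck rebuilds clean metadata and free space maps here...
btrfs balance start /mnt/fs                  # move surviving data onto good devices
btrfs device delete /dev/sdd1 /mnt/fs        # remove the temporary device at the end
```

The `device add`/`balance`/`device delete` steps all exist already; only
the repair machinery in the middle is missing.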

> By all means return an error when reading a file that is completely
> missing.  By all means have an extra fsck mode that goes ahead and
> deletes all the missing files (assuming it has metadata) or perhaps
> moves them all to a new "lost+notfound" subvolume or something.
> 
> Indeed, if the lost device just happens to not actually contain any
> data you might be lucky and not lose any data at all when losing a
> single device in a filesystem that entirely uses the single profile.
> That would be a bit of an edge case though, but one that is
> automatically handled if you give the admin the ability to force
> read-write/etc.
> 
> --
> Rich
