Am Tue, 2 May 2017 05:01:02 +0000 (UTC)
schrieb Duncan <1i5t5.dun...@cox.net>:

> Of course on-list I'm somewhat known for my arguments propounding the 
> notion that any filesystem that's too big to be practically
> maintained (including time necessary to restore from backups, should
> that be necessary for whatever reason) is... too big... and should
> ideally be broken along logical and functional boundaries into a
> number of individual smaller filesystems until such point as each one
> is found to be practically maintainable within a reasonably practical
> time frame. Don't put all the eggs in one basket, and when the bottom
> of one of those baskets inevitably falls out, most of your eggs will
> be safe in other baskets. =:^)

Hehe... Yes, you're a fan of small filesystems. I'm more from the
opposite camp, preferring one big filesystem so I don't have to mess
around with the size constraints of several small filesystems fighting
over the same volume space. It also gives the filesystem better chances
at data locality: data that would otherwise be scattered across
completely different partitions stays within one allocation space,
which can reduce head movement. Of course, much of this no longer holds
if you use a separate device per filesystem, or SSDs, or a SAN where
you have no real control over the physical placement of the stripes
anyway. But well...

In an ideal world, btrfs subvolumes would be totally independent of
each other, sharing only the same volume and dynamically allocating
chunks of space from it. If one broke, it alone would become unusable
and could simply be destroyed. A garbage collector would then grab the
leftover chunks of that subvolume and free them, and you could recreate
the subvolume from backup. In reality, shared extents cross subvolume
borders, so this is probably not how things could work anytime in the
near or far future.

The idea is similar to thinly provisioned LVM volumes, which allocate
space as the filesystems on top need it, much like thinly provisioned
images on a VM host. The problem there is that, unlike with subvolumes,
those chunks of space can never be given back to the host, because the
host doesn't know whether they are still in use. Of course, there are
implementations that allow thinning the images by passing TRIM through
from the guest to the host (or via other communication channels between
host and guest), but that usually doesn't perform well, if it is
supported at all.

I once tried to exploit this in VirtualBox, hoping it would translate
guest discards into hole-punching requests on the host, and it's even
documented to work that way... But (a) it was horribly slow, and (b) it
was incredibly unstable, to the point of being useless. OTOH, it's not
announced as a stable feature and has to be enabled by manually editing
the XML config files.
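The host-side half of that pipeline is just fallocate(2) with
FALLOC_FL_PUNCH_HOLE: the blocks backing a region of the image file are
returned to the host filesystem while the file keeps its logical size.
A minimal Linux-only sketch (file name is made up; assumes the
underlying filesystem supports hole punching, which ext4, xfs, btrfs
and tmpfs all do):

```python
import ctypes
import os

# Flags from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

libc = ctypes.CDLL("libc.so.6", use_errno=True)
# int fallocate(int fd, int mode, off_t offset, off_t len)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_longlong, ctypes.c_longlong]

path = "thin_demo.img"   # stand-in for a VM disk image
size = 1 << 20           # 1 MiB

with open(path, "wb") as f:
    f.write(b"\xff" * size)  # allocate real data blocks

before = os.stat(path).st_blocks

fd = os.open(path, os.O_RDWR)
# Punch a hole over the whole file while keeping its logical size --
# this is what the host does when a guest discard gets through.
ret = libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     0, size)
os.close(fd)

after = os.stat(path).st_blocks
print("512-byte blocks before:", before, "after:", after)
os.remove(path)
```

The logical file size is unchanged afterwards, but st_blocks drops,
which is exactly the "thinning" effect the guest-to-host TRIM path is
supposed to achieve.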

But I still like the idea: Would it be possible to keep btrfs working
if one subvolume gets corrupted? Of course it would need a way of
telling the user which other subvolumes are interconnected with it
through shared extents, so those could also be discarded during
corruption cleanup - at least where the shared extents can no longer be
made sense of. Since corruption mostly affects subvolumes that are
being written to, snapshots should be mostly safe.
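How easily extents end up shared across subvolume borders is simple to
demonstrate with a reflink copy (mount point and file names here are
hypothetical; needs root and an existing btrfs mount):

```shell
# /mnt is a btrfs mount
btrfs subvolume create /mnt/a
btrfs subvolume create /mnt/b
dd if=/dev/urandom of=/mnt/a/big bs=1M count=64
# reflink copy: the new file shares its extents with the original,
# even though it lives in a different subvolume
cp --reflink=always /mnt/a/big /mnt/b/big
sync
# show the shared vs. exclusive split per subvolume
btrfs filesystem du -s /mnt/a /mnt/b
```

The du output shows nearly all of the data as "shared" for both
subvolumes, so losing the extents of one would take data of the other
with it - which is exactly why per-subvolume destruction isn't trivial.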

Such a feature would also only make sense if btrfs had an online repair
tool. BTW, are there plans for an online repair tool in the future?
Maybe one that scans and fixes only part of the filesystem (for obvious
performance reasons, wrt Duncan's idea of keeping filesystems
manageable), i.e. those parts the kernel discovered to be corrupted? If
I could then just delete and restore the affected files, that would be
even better than having independent subvolumes as described above.
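A partial form of this already exists today: scrub runs against the
mounted filesystem, and on profiles with redundancy it rewrites bad
blocks from the good copy (mount point below is hypothetical):

```shell
# scrub the filesystem online, in the background
btrfs scrub start /mnt
# check progress and error counts
btrfs scrub status /mnt
# With a redundant profile (dup/raid1/raid10), detected bad blocks are
# rewritten from the intact copy; without redundancy they are only
# reported, and the kernel log names the affected files.
```

What's missing relative to the idea above is targeting: scrub walks the
whole filesystem rather than just the regions the kernel already
flagged as corrupt.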

-- 
Regards,
Kai

Replies to list-only preferred.
