Re: [PATCH] btrfs: Fix no space bug caused by removing bg

Austin S Hemmelgarn Tue, 22 Sep 2015 10:35:24 -0700

On 2015-09-22 11:39, Hugo Mills wrote:

On Tue, Sep 22, 2015 at 10:54:45AM -0400, Austin S Hemmelgarn wrote:

On 2015-09-22 10:36, Hugo Mills wrote:

On Tue, Sep 22, 2015 at 04:23:33PM +0200, David Sterba wrote:

On Tue, Sep 22, 2015 at 01:41:31PM +0000, Hugo Mills wrote:

On Tue, Sep 22, 2015 at 03:36:43PM +0200, Holger Hoffstätte wrote:

On 09/22/15 14:59, Jeff Mahoney wrote:
(snip)

So if the way we want to prevent the loss of raid type info is by
maintaining the last block group allocated with that raid type, fine,
but that's a separate discussion.  Personally, I think keeping 1GB


At this point I'm much more surprised to learn that the RAID type can
apparently get "lost" in the first place, and is not persisted
separately. I mean..wat?


    It's always been like that, unfortunately.

    The code tries to use the RAID type that's already present to work
out what the next allocation should be. If there aren't any chunks in
the FS, the configuration is lost, because it's not stored anywhere
else. It's one of the things that tripped me up badly when I was
failing to rewrite the chunk allocator last year.
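
A minimal sketch of the behaviour described above, assuming a toy model in
which each block group is represented only by its profile name. The names
next_profile and FALLBACK_PROFILE are hypothetical and are not taken from the
btrfs source; the point is only that a profile inferred from existing block
groups cannot survive the removal of the last one.

```python
from typing import List

# Assumed fallback for illustration only; not the kernel's actual choice.
FALLBACK_PROFILE = "single"


def next_profile(block_groups: List[str]) -> str:
    """Choose the profile for the next allocation from what already exists."""
    if block_groups:
        # Reuse the profile of a block group that is already present.
        return block_groups[-1]
    # Nothing left to infer from: the configured profile is gone.
    return FALLBACK_PROFILE


data_bgs = ["raid1", "raid1"]
before = next_profile(data_bgs)   # inferred from the existing block groups

data_bgs.clear()                  # every block group of this type is removed
after = next_profile(data_bgs)    # nothing persistent to consult any more

print(before, after)
```

In this model, keeping one block group of each type around is the only way to
preserve the setting, which is the workaround mentioned earlier in the thread.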


Yeah, right now there's no persistent default for the allocator. I'm
still hoping that the object properties will magically solve that.


    There's no obvious place that filesystem-wide properties can be
stored, though. There's a userspace tool to manipulate the few current
FS-wide properties, but that's all special-cased to use the
"historical" ioctls for those properties, with no generalisation of a
property store, or even (IIRC) any external API for them.

    We're nominally using xattrs in the btrfs: namespace on directories
and files, and presumably on the top directory of a subvolume for
subvol-wide properties, but it's not clear where the FS-wide values
should go: in the top directory of subvolid=5 would be confusing,
because then you couldn't separate the properties for *that subvol*
from the ones for the whole FS (say, the default replication policy,
where you might want the top subvol to have different properties from
everything else).

Possibly do special names for the defaults and store them there?  In
general, I personally see little value in having some special
'default' properties however.


    That would work.

The way I would expect things to work is that a new subvolume
inherits its properties from its parent (if it's a snapshot),


    Definitely this.

or
from the next higher subvolume it's nested in.


    I don't think I like this. I'm not quite sure why, though, at the
moment.

    It definitely makes the process at the start of allocating a new
block group much more complex: you have to walk back up through an
arbitrary depth of nested subvols to find the one that's actually got
a replication policy record in it. (Because after this feature is
brought in, there will be lots of filesystems without per-subvol
replication policies in them, and we have to have some way of dealing
with those as well).

ro-compat flag perhaps?


    With an FS default policy, you only need check the current subvol,
and then fall back to the FS default if that's not found.
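
The two lookup schemes being compared can be sketched as follows, assuming a
toy Subvol record with an optional policy field. Subvol, policy_by_walking_up
and policy_with_fs_default are hypothetical names for illustration, not btrfs
structures or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subvol:
    """Hypothetical subvolume record; policy is None when none is set."""
    name: str
    parent: Optional["Subvol"] = None
    policy: Optional[str] = None


def policy_by_walking_up(subvol: Subvol, last_resort: str) -> str:
    """Inherit from the nearest enclosing subvolume that has a policy."""
    node: Optional[Subvol] = subvol
    while node is not None:          # cost grows with nesting depth
        if node.policy is not None:
            return node.policy
        node = node.parent
    return last_resort               # old filesystems with no records at all


def policy_with_fs_default(subvol: Subvol, fs_default: str) -> str:
    """Check only the current subvolume, then the FS-wide default."""
    return subvol.policy if subvol.policy is not None else fs_default


top = Subvol("top", policy="raid1")
mid = Subvol("mid", parent=top)
leaf = Subvol("leaf", parent=mid)

print(policy_by_walking_up(leaf, "single"))
print(policy_with_fs_default(leaf, "single"))
```

The second form does a fixed amount of work per allocation, but it needs the
FS-wide default to be stored somewhere, which is the open question above.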

    These things are, I think, likely to be lightly used: I would be
reasonably surprised to find more than two or possibly three storage
policies in use on any given system with a sane sysadmin.

    I'm actually not sure what the interactions of multiple storage
policies are going to be like. It's entirely possible, particularly
with some of the more exotic (but useful) suggestions I've thought of,
that the behaviour of the FS is dependent on the order in which the
block groups are allocated. (i.e. "20 GiB to subvol-A, then 20 GiB to
subvol-B" results in different behaviour than "1 GiB to subvol-A then
1 GiB to subvol-B and repeat"). I tried some simple Monte-Carlo
simulations, but I didn't get any concrete results out of it before
the end of the train journey. :)

Yeah, I could easily see that getting complicated when you add in the (hopefully soon) possibility of n-copy replication.

  This would obviate
the need for some special 'default' properties, and would be
relatively intuitive behavior for a significant majority of people.


    Of course, you shouldn't be nesting subvolumes anyway. It makes
it much harder to manage them.

That depends, though. I only ever do single nesting (i.e., a subvolume in a subvolume), and I use it to exclude stuff from getting saved in snapshots (mostly stuff like clones of public git trees, or other stuff that's easy to reproduce without a backup). Beyond that though, there are other inherent issues of course.
