I don't disagree with the _ideal_ of your patch. I just think that it's impossible to implement it without lying to the user or making things just as bad in a different way. I would _like_ you to be right. But my thing is finding and quantifying failure cases, and this entire question is full of fail.

This is not an attack on you personally. It's a mismatch between the storage and file-system paradigms, and we are seeing it first because BTRFS is the first to really blend the two.

Here is a completely legal BTRFS working set. (It's a little extreme.)


/dev/sda :: |Sf|Sf|Sp|0f|1f|0p|0p|Mf|Mf|Mp|1p|1.25GiB-unallocated|
/dev/sdb :: |0f|1f|0p|0p|Mp|1p|4.75GiB-unallocated               |


Legend
p == partial, about half full.
f == full, or full enough to treat as full.
S == Single allocated chunk
0 == RAID-0 allocated chunk
1 == RAID-1 allocated chunk
M == metadata chunk

History: This filesystem started out on a single drive, then it has been bounced between RAID-0 and RAID-1 at least twice. The owner has _never_ let a conversion finish; he has just changed modes a couple of times.

The current filesystem flag says RAID-1.

But right now we have 0.5GiB of "single" slack, 2GiB of RAID-0 slack, 1GiB of RAID-1 slack, 2GiB of unallocated space in which a total of 1GiB more of RAID-1 extents can be created, and 3GiB of space on /dev/sdb that _can_ _not_ be allocated at all. We have room for one more metadata extent on each drive, but if we allocate two more metadata extents on each drive we burn /dev/sda's unallocated 1.25GiB down to 0.75GiB.
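If you want the arithmetic, here is the tally in Python. Hedged: the chunk sizes are my assumption (1GiB data chunks, 0.25GiB metadata chunks), "partial" is treated as exactly half full, and the stranded figure comes out approximate.

# Back-of-the-envelope tally of the layout above.
# Assumptions: 1GiB data chunks, 0.25GiB metadata chunks, partial = half full.

DATA, META = 1.0, 0.25                 # chunk sizes, GiB

single_slack = 1 * (DATA / 2)          # the one Sp chunk:        0.5GiB
raid0_slack  = 4 * (DATA / 2)          # four 0p stripe members:  2.0GiB raw
raid1_slack  = 2 * (DATA / 2)          # the mirrored 1p pair:    1.0GiB raw

unallocated = {"sda": 1.25, "sdb": 4.75}

# RAID-1 needs equal space on two devices. Reserve one more metadata
# chunk per drive, then pair up what is left:
pairable = min(v - META for v in unallocated.values())   # 1.0GiB per drive
new_raid1_space  = 2 * pairable        # 2GiB of space...
new_raid1_usable = pairable            # ...holding 1GiB of RAID-1 extents

# Whatever /dev/sdb cannot pair off is stranded. This lands near the
# 3GiB above; the exact figure depends on the metadata reservations.
stranded = unallocated["sdb"] - META - pairable          # 3.5GiB by this count

print(single_slack, raid0_slack, raid1_slack, new_raid1_usable, stranded)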

First, a question.

Will a BTRFS in RAID-1 mode add file data to extents that are in other modes? That is, will the filesystem _use_ the 2.5GiB of available "single" and "RAID-0" space? If no, then that's 2.5GiB of "phantom consumption": space that isn't "used" but also isn't usable.

The raw size of the store is 20GiB; the default you propose would report the 2x10GiB as 10GiB. But how do you identify the 3GiB that has gone "missing" because of the lopsided allocation history?

Seem unlikely? The rotten cod example I've given is unlikely.

But a more even case is downright common and likely. Say you run a nice old-fashioned Mutt mail spool. "Most" of your files are small enough to live in metadata. You start with one drive and allocate 2 single-data chunks and 10 metadata chunks (5x DUP). Then you add a second drive of equal size (the metadata just switched to DUP-as-RAID-1-alike mode), and then you do a -dconvert=raid0.

That uneven allocation of metadata will be a 2GiB difference between the two drives forever.

So do you shave 2GiB off of your @size?
Do you shave 2GiB off of your @available?
Do you over-report your @available by 2GiB and end up _still_ having things "available" when you get your ENOSPC?
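To make the three-way trap concrete, a toy sketch (Python; the 20GiB size is from above, the 8GiB "used" figure is invented for the demo, and none of these names are real btrfs code):

# Three candidate reporting policies for the stranded 2GiB.
# Toy numbers: 20GiB raw size, 8GiB genuinely used (invented), 2GiB stranded.

SIZE, USED, STRANDED = 20.0, 8.0, 2.0

def shave_size():        # pretend the stranded space never existed
    return SIZE - STRANDED, (SIZE - STRANDED) - USED

def shave_available():   # size stays honest, @available quietly shrinks
    return SIZE, (SIZE - USED) - STRANDED

def report_raw():        # raw truth, and ENOSPC arrives at "2GiB available"
    return SIZE, SIZE - USED

for policy in (shave_size, shave_available, report_raw):
    size, avail = policy()
    print(f"{policy.__name__:16} @size={size:4.1f}GiB @available={avail:4.1f}GiB")

# shave_size hides capacity a balance could recover; shave_available
# makes used + available fall short of size; report_raw ENOSPCs early.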

How about this ::

/dev/sda == |Sf|Sf|Mf|Mf|Mf|Mf|Sf|Sf|Sp|Mp|Mp| .5GiB free|
/dev/sdb == |10 GiB free                                 |

The operator fills his drive, then adds a second one, then _foolishly_ tries to convert it to RAID-0, and the power fails mid-conversion. To check the FS he boots with skip_balance. Then his maintenance window closes and he has to go back into production, at which point he forgets (or isn't allowed) to finish the balance. The flags are set but now no more extents can be allocated.

Size is 20GiB, slack is 10.5GiB. The operator is about to get ENOSPC.
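Run the numbers (a sketch, assuming new RAID-0 chunks need equal stripe members on both drives, so /dev/sda caps every stripe):

# The flags say RAID-0, so every new chunk wants a stripe member on
# each device, and the stripe is capped by the fuller drive's leftover.

free = {"sda": 0.5, "sdb": 10.0}       # GiB unallocated per device

naive_slack    = sum(free.values())    # 10.5GiB, what the numbers suggest
stripe         = min(free.values())    # sda caps every stripe at 0.5GiB
raid0_writable = 2 * stripe            # 1.0GiB of new chunks, then ENOSPC
                                       # (or zero, if the allocator insists
                                       # on full-size stripe members)

print(f"looks free: {naive_slack}GiB, writable before ENOSPC: {raid0_writable}GiB")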


Yes, a balance would fix it, but that's not the question.

In the meantime what does your patch report?

Or...

/dev/sda == |Sf|Sf|Mf|Mf|Mf|Mf|Sf|Sf|Sp|Mp|Mp| .5GiB free|
/dev/sdb == |10 GiB free                                 |
/dev/sdc == |10 GiB free                                 |

He does a -dconvert=raid5 and immediately gets ENOSPC for all the new blocks. According to the flags we've got 10GiB free...
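Same sketch with parity, assuming the new RAID-5 chunks stripe across all three devices (again /dev/sda caps the stripe):

# RAID-5 over n stripe members stores n-1 members' worth of data per
# stripe, and the stripe is capped by the least-empty device, sda.

free = {"sda": 0.5, "sdb": 10.0, "sdc": 10.0}   # GiB unallocated

n = len(free)
stripe = min(free.values())            # 0.5GiB per device
raid5_writable = (n - 1) * stripe      # 1.0GiB of data, then ENOSPC

print(f"raw unallocated: {sum(free.values())}GiB, "
      f"writable as RAID-5: {raid5_writable}GiB")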

Or we end up with an egregious metadata history from lots of small files: we've got a perfectly fine RAID-1 with several GiB of slack, but none of that slack is 1GiB contiguous. All the slack has come from reclaiming metadata.

/dev/sda == |Sf|Sf|Mp|Mp|Rx|Rx|Mp|Mp|Rx|Rx|Mp|Mp| N-free slack|

(R == reclaimed, i.e. available to extent-tree.c for allocation)

We have 1.5GiB of "poisoned" space here; it can hold metadata but not data. So is that 1.5GiB in your @available calculation? How do you mark it as used?
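Sketching the mechanism (hypothetical numbers matching the diagram): free space is typed by the block group it sits in, and a data write can only use free space in data block groups, or force a new chunk from unallocated device space, of which there is none here.

# Free space is typed by the block group holding it. A data write can
# use free space in DATA block groups, or a brand new chunk carved
# from unallocated device space. Neither exists in this layout.

free_by_type = {
    "METADATA": 1.5,   # the reclaimed R and partial M space, per above
    "DATA":     0.0,   # every data chunk is full
}
unallocated = 0.0      # and no room to allocate fresh chunks

data_writable  = free_by_type["DATA"] + unallocated   # 0.0GiB, ENOSPC
reported_slack = sum(free_by_type.values())           # 1.5GiB "free"

print(f"slack a naive df sees: {reported_slack}GiB, "
      f"writable as data: {data_writable}GiB")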

...

And I've been ignoring the Mp(s) completely. What if I've got a good 2GiB of partial space in the metadata chunks, but that's all I've got? Write a file of any real size and you'll get ENOSPC even though you've got those 2GiB. Were they in @size? Are they in @avail?

...

See, you keep giving me examples where the history of the filesystem is uniform: it was made a certain way and stayed that way. But in real life this sort of thing is going to happen, and your patch simply reports a _different_ _wrong_ number. A _friendlier_ wrong number, I'll grant you that, but still wrong.
