On 2017-09-22 11:07, Qu Wenruo wrote:
On 2017-09-22 21:33, Austin S. Hemmelgarn wrote:
On 2017-09-22 08:32, Qu Wenruo wrote:
On 2017-09-22 19:38, Austin S. Hemmelgarn wrote:
On 2017-09-22 06:39, Qu Wenruo wrote:
As I already stated in another thread, if you want to shrink, do
it in another command-line tool.
Do one thing and do it simply. (Although Btrfs itself already
departs from the UNIX way.)
Unless I'm reading the code wrong, the shrinking isn't happening in
a second pass, so this _is_ doing one thing, and it appears to be
doing it as simply as possible (although arguably not correctly
because of the 1MB reserved area being used).
If you're referring to my V1 implementation of shrink, that is doing
*one* thing.
But the original shrinking code? Nope, or we wouldn't have the custom
chunk allocator at all.
What I really mean is, if one wants to shrink, at least don't couple
the shrink code into "mkfs.btrfs".
Do the shrink in its own tool/subcommand, not in a completely
unrelated tool.
There are two cases for shrinking a filesystem:
1. You're resizing it to move to a smaller disk (or speed up copying
to another disk).
2. You're generating a filesystem image that needs to be as small as
possible.
I would argue that there is no meaningful definition of the *smallest*
image. (Of course such an image exists.)
There is an exact meaning given the on-disk layout. It's an image whose
size is equal to the sum of:
1. 1MB (for the reserved space at the beginning).
2. However many superblocks it should have given the size.
3. The total amount of file data and extended attribute data to be
included, rounding up for block size.
4. The exact amount of metadata space needed to represent the tree from
3, also rounding up for block size.
5. The exact amount of system chunk space needed to handle 3 and 4, plus
enough room to allocate at least one more chunk of each type (to
ultimately allow for resizing the filesystem if desired).
6. Exactly enough reserved metadata space to resize the FS.
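In back-of-envelope shell form, purely as an illustration (the path is
a placeholder, du ignores xattrs, and the metadata/system/reserve terms
are stubs that only mkfs itself can compute exactly):

    reserved=$((1024 * 1024))                  # 1: reserved area (holds the primary SB)
    mirrors=4096                               # 2: assuming one extra SB copy
    data=$(du -sb /path/to/rootdir | cut -f1)  # 3: approximate; du skips xattrs
    meta=0                                     # 4: exact value known only to mkfs
    sys=0                                      # 5: likewise
    reserve=0                                  # 6: likewise
    echo $((reserved + mirrors + data + meta + sys + reserve))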
We could write tons of code to implement it, and more (or fewer) test
cases to verify it.
But the demand doesn't justify the effort.
And how much effort has been put into ripping this out completely
together with the other fixes? How much more would it have been to just
move it to another option and fix the reserved area usage?
My points for this patchset are all clear:
I know I removed one function, and my reasons are:
1) Little or no usage
And its behavior is counter-intuitive.
So split it out into a separate tool (mkimage, maybe?), and fix mkfs to
behave sensibly. I absolutely agree on the fact that it's non-intuitive.
It should either be its own option (with a dependency on -r being
passed, of course), or a separate tool if you're so worried about mkfs
being too complex.
As to usage, given the current data, there is no proof that I'm the only
one using it, but there is also no proof that anybody other than me is
using it, which means that you can't reasonably base an argument on
actual usage of this option, since you can't prove anything about usage.
All you know is that you have one person who uses it, and one who was
confused by it (but appears to want to use it in a different way). It's
a niche use case though, and when dealing with something like this,
there is a threshold of usage below which you won't see much in the way
of discussion of the option on the list, since only a reasonably small
percentage of BTRFS users are actually subscribed.
2) Dead code (neither tested nor well documented)
<rant>It _IS NOT_ dead code. It is absolutely reachable from code
external to itself. It's potentially unused code, but that is not the
same thing as dead code.</rant>
That aside, I can fix up the documentation, and I've actually tested it
reasonably thoroughly (I use it every month or so when I update stuff I
have using seed devices, and it also gets used by my testing
infrastructure when generating images pre-loaded with files for tests to
save time). I'll agree it hasn't been rigorously tested, but it does
appear to work as (not) advertised, even when used in odd ways.
3) Possible workaround
There are three options for workarounds, and all of them are sub-par
compared to this, even aside from the reduced simplicity they offer to
userspace:
1. Resize after mkfs (sketched after this list). This is impractical
both because there is no offline resize (having to mount the FS RW
prior to use as a seed device means that you don't have a guaranteed
reproducible image, which is a pretty common request for container
usage these days), and because it ends up with wasted space (the
smallest possible filesystem created through a resize is consistently
larger, by more than 1MB, than what the -r option to mkfs generates).
2. Use a binary search to determine the smallest size to within a
reasonable margin (also sketched below). This is impractical simply
because it takes too long, and again it can't reliably get the smallest
possible image.
3. Attempt to compute the smallest possible image without using a binary
search, pre-create the file, and then call mkfs. This is non-trivial
without knowledge of the internal workings of mkfs, and is liable to
break when something changes in mkfs (unless you want to consider the
block-level layout generated by the --rootdir option to be part of the
ABI and something that shouldn't change, but that is something you would
need to discuss with the other developers).
IOW, this is like saying that duct tape is a workaround for not having
super glue. It will technically work, but not anywhere near as well.
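To make workarounds 1 and 2 concrete, here are rough sketches; the
paths, sizes, and search bounds are all made up, and they assume the
old (pre-4.14) -r semantics:

    # Workaround 1: oversize, mkfs, mount, online-resize, trim.
    truncate -s 512M image.btrfs               # starting size is a guess
    mkfs.btrfs -r /path/to/rootdir image.btrfs
    mount -o loop image.btrfs /mnt             # no offline resize: must mount RW
    btrfs filesystem resize 320M /mnt          # target is a guess; retry until it fails
    umount /mnt
    truncate -s 320M image.btrfs               # trim the backing file to match

    # Workaround 2: binary-search the smallest size mkfs.btrfs accepts.
    lo=$((16 * 1024 * 1024)); hi=$((1024 * 1024 * 1024))   # assumed bounds
    while [ $((hi - lo)) -gt $((1024 * 1024)) ]; do        # stop at a 1MiB margin
        mid=$(((lo + hi) / 2))
        truncate -s "$mid" image.btrfs
        if mkfs.btrfs -f -r /path/to/rootdir image.btrfs >/dev/null 2>&1; then
            hi=$mid
        else
            lo=$mid
        fi
    done
    truncate -s "$hi" image.btrfs
    mkfs.btrfs -f -r /path/to/rootdir image.btrfs          # one full mkfs per probe: slow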
I could add several extra reasons, as I stated before, but the number
of reasons won't prove anything anyway.
Building software is always trading one thing for another.
I understand there may be some need for this function, but it doesn't
justify the cost.
And I think the fact that no mail reported on the shrinking behavior
until recently already backs up my point.
The only information it gives is that, until now, nobody who tried that
option cared enough to complain about it or needed it to behave any
other way.
IOW, as stated above, given the current data, there is no proof that I'm
the only one using it, but there is also no proof that anybody other
than me is using it, which means that you can't reasonably base your
argument on actual usage of this option.
Thanks,
Qu
Case 1 is obviously unrelated to creating a filesystem. Case 2
however is kind of integral to the creation of the filesystem image
itself by definition, especially for a CoW filesystem because it's not
possible to shrink to the absolute smallest size due to the
GlobalReserve and other things.
Similarly, there are two primary use cases for pre-loading the
filesystem with data:
1. Avoiding a copy when reprovisioning storage on a system. For
example, when splitting a directory out to a new filesystem, you could
use the -r option to avoid having to copy the data after mounting the
filesystem.
2. Creating base images for systems.
The first case shouldn't need the shrinking functionality, but the
second is very commonly paired with the second use case for shrinking a
filesystem.
That may be handled by offline shrink/balance, but let's not further
complicate the --rootdir option now.
And you also said that the shrink feature is not a popular feature
*NOW*; then I don't think it's worth implementing *NOW* either.
Please implement future features in the future.
I'm not sure about you, but I could have sworn that he meant seed
devices weren't a popular feature right now,
Oh, sorry for my misunderstanding.
not that the shrinking is. As a general rule, the whole option of
pre-loading a filesystem with data as you're creating it is not a
popular feature, because most sysadmins are much more willing to
trust adding data after the filesystem is created.
Personally, given the existence of seed devices, I would absolutely
expect there to be a quick and easy way to generate a minimalistic
image using a single command (because realistic usage of seed
devices implies minimalistic images). I agree that it shouldn't be
the default behavior, but I don't think it needs to be removed
completely.
Just like I said in the cover letter, even for ext*, this is provided
by genext2fs, not mke2fs.
Then maybe this should get split out into a separate tool instead of
just removing it completely? There is obviously at least some
interest in this functionality.
I totally understand that end users really want a do-it-all solution.
But from a developer's view, the old UNIX way is better for keeping
the code clean and easy to read.
What the code is doing should have near zero impact on readability.
If it did, then the BTRFS code in general is already way beyond most
people.
In fact, you can even create your own script to implement the old
behavior, if you don't care that the result may not fully use the
space, just by:
1) Calculate the size of the whole directory
The "du" command can do it easily, and it has done this better than
us for years!
Um, no, it actually doesn't do things better in all cases. It doesn't
account for extended attributes, or metadata usage, or any number of
other things that factor into how much space a file or directory will
take up on BTRFS. It's good enough for finding what's using most of
your space, but it's not reliable for determining how much space you
need to store that data (especially once you throw in in-line
compression).
2) Multiply the value according to the metadata/data profile
Take care of small files, which will be inlined,
and don't forget space for data checksums.
(BTW, unlike btrfs-convert, there is no way to change the behavior of
inlined data and data checksums at mkfs time.)
This is where the issue lies. It's not possible for a person to
calculate this with reasonable accuracy, and you arguably can't even
do it for certain programmatically without some serious work.
3) Create a file with the size calculated in step 2)
4) Execute "mkfs.btrfs -r <dir> <created file>" (a sketch of the whole
recipe follows below)
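Putting those four steps together, a rough sketch in shell (the
directory path is a placeholder, and the factor of 2 is the rough
multiplier suggested later in this thread, not an exact figure):

    #!/bin/sh
    dir=/path/to/rootdir
    # 1) Size of the whole directory (no xattr/inline/csum accounting!).
    size=$(du -sb "$dir" | cut -f1)
    # 2) Scale for metadata, checksums and profile overhead -- a pure guess.
    size=$((size * 2))
    # 3) Create a file of the calculated size.
    truncate -s "$size" image.btrfs
    # 4) mkfs with the directory as the initial content.
    mkfs.btrfs -r "$dir" image.btrfs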
The main issues here are that it wasn't documented well (like many
other things in BTRFS), and it didn't generate a filesystem that was
properly compliant with the on-disk format (because it used space in
the 1MB reserved area at the beginning of the FS). Fixing those
issues in no way requires removing the feature.
Yes, the 1MB can be fixed easily (although not properly). But the whole
customized chunk allocator is the real problem.
Almost-dead code is always bug-prone. Replacing it with the updated
generic chunk allocator is the way to avoid playing whack-a-mole later,
and it should be done ASAP.
Agreed, but that doesn't preclude having the option to keep the
generated image to the minimum size.
And furthermore, even following the existing shrink behavior, you
still need to truncate the file all by yourself.
Which is no better than creating a correctly sized file and then
running mkfs on it.
Only if you pre-create the file. If the file doesn't exist, it gets
created at the appropriate size. That's part of why the chunk
allocations are screwed up and stuff gets put in the first 1MB: it
generates the FS on-the-fly and writes it out as it's generating it.
Nope, even if you create the file in advance, it will still occupy the
first 1MB.
Because it doesn't assume that the file is there to begin with. It's
not trying O_CREAT and falling back to some different code if that
fails. The code assumes that the file won't be there, and handles
things accordingly albeit incorrectly (it should seek past the first
1MB, write the initial SB, and then start chunk allocation). IOW, the
code takes a shortcut in that it doesn't check for the file, and the
rest is written to account for that by assuming there wasn't a file.
The lack of truncation just means it doesn't try to trim things down
by itself if the file is already there (it assumes that you knew what
you were doing).
Put differently, I'm fairly certain that the current -r option removes
the total size check unless the target is a device (although it may
remove the check there too and just fail when it tries to write past
the end of the device), and will thus extend existing files to the
required size to hold the data.
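A quick (hypothetical) way to check that claim with a pre-4.14
mkfs.btrfs, using placeholder paths:

    truncate -s 16M image.btrfs                   # deliberately too small
    mkfs.btrfs -f -r /path/to/rootdir image.btrfs
    ls -l image.btrfs                             # should have grown past 16M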
BTW, you can bring back the size calculation for shrink, but you will
soon find that it's just the start of a new nightmare,
because there is no easy way to calculate the real metadata usage.
The result (like the old calculator) will be no better than a guess.
(Well, just multiplying the dir size by 2 will never go wrong.)
No, it can go wrong depending on what you count as part of the size.
Thanks,
Qu
Sent: Friday, September 22, 2017 at 5:24 PM
From: "Anand Jain" <anand.j...@oracle.com>
To: "Qu Wenruo" <quwenruo.bt...@gmx.com>, linux-btrfs@vger.kernel.org
Cc: dste...@suse.cz
Subject: Re: [PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
+WARNING: Before btrfs-progs v4.14, *--rootdir* will shrink the filesystem,
+preventing the user from making use of the remaining space.
+In btrfs-progs v4.14 this behavior is changed, and it will not shrink the fs.
+The result should be the same as `mkfs`, `mount` and then `cp -r`.
+
Hmm, well. Shrinking to fit exactly the size of the given files and
directories is indeed a nice feature, which would help to create a
golden-image btrfs seed device. It's not popular as of now, but at some
point it may be, in cloud environments.
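For reference, the golden-image seed workflow looks roughly like this
(device names and paths are placeholders):

    mkfs.btrfs -r /path/to/golden-root seed.img   # build the golden image
    btrfstune -S 1 seed.img                       # mark it as a seed device
    losetup /dev/loop0 seed.img
    mount /dev/loop0 /mnt                         # a seed device mounts read-only
    btrfs device add /dev/sdb /mnt                # attach a writable device
    mount -o remount,rw /mnt                      # now usable read-write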
Removing this feature instead of moving it to a new option is not a
good idea indeed. Did I miss something?
Thanks, Anand
+Also, if the destination file/block device does not exist, *--rootdir* will not
+create the image file, to make it follow the normal mkfs behavior.