Re: Feature requests: online backup - defrag - change RAID level

Austin S. Hemmelgarn Tue, 10 Sep 2019 12:22:13 -0700

On 2019-09-09 15:26, webmas...@zedlx.com wrote:

This post is a reply to Remi Gauvin's post, but the email got lost so I can't reply to him directly.


Remi Gauvin wrote on 2019-09-09 17:24 :


On 2019-09-09 11:29 a.m., Graham Cobb wrote:

 and does anyone really care about
defrag any more?).



Err, yes, yes absolutely.

I don't have any issues with the current btrfs defrag implementation, but
it's *vital* for btrfs. (which works just as the OP requested, as far as
I can tell, recursively for a subvolume)

Just booting Windows on a BTRFS virtual image, for example, will create
almost 20,000 file fragments.  Even on SSDs, you run into problems
trying to work with files that have over 200,000 fragments.

Another huge problem is rsync --inplace, which is a perfect backup
solution for taking advantage of BTRFS snapshots, but fragments large
files into tiny pieces (and subsequently creates files that are very
slow to read). For some reason, autodefrag doesn't catch that one either.


But the wiki could do a better job of trying to explain that the snapshot
duplication of defrag only affects the fragmented portions.  As I
understand, it's really only a problem when using defrag to change
compression.



Ok, a few things.

First, my defrag suggestion doesn't EVER unshare extents. The defrag should never unshare, not even a single extent. Why? Because it violates the expectation that defrag would not decrease free space.

No, it should by default not unshare, but still allow the possibility of unsharing extents. Sometimes completely removing all fragmentation is more important than space usage.

Defrag may break up extents. Defrag may fuse extents. But it shouldn't ever unshare extents.

Actually, splitting or merging extents will unshare them in a large majority of cases.

Therefore, I doubt that the current defrag does "just as the OP requested". Nonsense. The current implementation does the unsharing all the time.

Second, I never used btrfs defrag in my life, despite managing at least 10 btrfs filesystems. I can't. Because all my btrfs volumes have a lot of subvolumes, I'm afraid that defrag will unshare much more than I can tolerate. In my subvolumes, over 90% of data is shared. If all subvolumes were to be unshared, the disk usage would likely increase tenfold, and that I cannot afford.

I agree that btrfs defrag is vital. But currently, it's unusable for many use cases.

Also, I don't quite understand what the poster means by "the snapshot duplication of defrag only affects the fragmented portions". Possibly it means approximately: if a file wasn't modified in the current (latest) subvolume, it doesn't need to be unshared. But that would still unshare all the log files, for example, even all files that have been appended, etc... that's quite bad. Even if just one byte was appended to a log file, then defrag will unshare the entire file (I suppose).

What it means is that defrag will only ever touch a file if that file has extents that require defragmentation, and will then only touch extents that are smaller than the target extent size (32M by default, configurable at run-time with the `-t` option for the defrag command) and possibly those directly adjacent to such extents (because it might merge the small extents into larger neighbors, which will in turn rewrite the larger extent too).


This, in turn, leads to a couple of interesting behaviors:

* If you have a subvolume with snapshots, it may or may not break reflinks between that subvolume and its snapshots, but will not break any of the reflinks between the snapshots themselves.
* When dealing with append-only files that are significantly larger than the target extent size which are defragmented regularly, only extents near the end of the file are likely to be unshared by the operation.
* If you fully defragment a subvolume, then snapshot it, then defrag it again, the second defrag will not unshare anything unless you were writing to the subvolume or snapshot while the second defrag was running.
* There's almost no net benefit to not defragmenting when dealing with very large files that mostly see internal rewrites (VM disk images, large databases, etc) because every internal rewrite will implicitly unshare extents anyway.
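
To make the selection rule above concrete, here is a rough toy model in Python. It is only a sketch of the behavior as described in this message, not the actual kernel logic: the `Extent` class, the function names, and the `include_neighbors` switch are all made up for illustration, and treating every neighbor of a small extent as rewritten is a worst-case assumption (the text only says neighbors are "possibly" touched).

```python
from dataclasses import dataclass

MIB = 1024 * 1024
DEFAULT_TARGET = 32 * MIB  # default target extent size mentioned above


@dataclass
class Extent:
    size: int     # extent length in bytes
    shared: bool  # True if a snapshot also references this extent


def extents_to_rewrite(extents, target=DEFAULT_TARGET, include_neighbors=True):
    """Return sorted indices of extents this toy model treats as rewritten.

    An extent is selected if it is smaller than `target`. With
    `include_neighbors`, the extents directly before and after each small
    extent are selected too (worst case for merging into larger neighbors).
    """
    small = {i for i, e in enumerate(extents) if e.size < target}
    touched = set(small)
    if include_neighbors:
        for i in small:
            for j in (i - 1, i + 1):
                if 0 <= j < len(extents):
                    touched.add(j)
    return sorted(touched)


def unshared_bytes(extents, touched):
    """Bytes that stop being shared: rewritten extents that were shared."""
    return sum(extents[i].size for i in touched if extents[i].shared)


if __name__ == "__main__":
    # Append-only log: four large shared extents, then three small tail extents.
    log = [Extent(128 * MIB, True)] * 4 + [Extent(4 * MIB, True)] * 3
    touched = extents_to_rewrite(log)
    print(touched)                                # [3, 4, 5, 6]
    print(unshared_bytes(log, touched) // MIB)    # 140 (of 524 MiB total)

    # Already defragmented file: nothing is below the target, nothing is touched.
    print(extents_to_rewrite([Extent(64 * MIB, True)] * 3))  # []
```

In this model the append-only file only loses sharing near its tail, and a file whose extents are all at or above the target size is left alone, which is the second and third bullet above.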
