On 2017-12-11 07:44, Qu Wenruo wrote:
>
>
> On 2017-12-10 19:27, Tomasz Pala wrote:
>> On Mon, Dec 04, 2017 at 08:34:28 +0800, Qu Wenruo wrote:
>>
>>>> 1. is there any switch resulting in 'defrag only exclusive data'?
>>>
>>> IIRC, no.
>>
>> I have found a directory - pam_abl databases - which occupies 10 MB (yes,
>> TEN MEGAbytes) and released ...8.7 GB (almost NINE GIGAbytes) after
>> defrag. After defragging, the files were not snapshotted again and I've
>> lost 3.6 GB again, so this is fully reproducible for me.
>> There are 7 files, one of which is 99% of the space (10 MB). None of
>> them has nocow set, so they're riding all-btrfs.
>>
>> I could debug something before I clean this up, is there anything you
>> want me to check/know about the files?
>
> The fiemap result along with the btrfs dump-tree -t2 result.
>
> Neither output has anything related to file names/dir names, only some
> "meaningless" bytenrs, so it should be completely OK to share them.
>
>>
>> The fragmentation impact is HUGE here, a ~1000x ratio is almost a DoS
>> condition which could be triggered by a malicious user within a few hours
>> or faster
>
> You won't want to hear this:
> the biggest ratio in theory is 128M / 4K = 32768.
>
>> - I've lost 3.6 GB during the night with a reasonably small
>> amount of writes, I guess it might be possible to trash the entire
>> filesystem within 10 minutes if doing this on purpose.
>
> That's a little complex.
> To get into such a situation, snapshots must be used and one must know
> which file extents are shared and how they are shared.
>
> But yes, it's possible.
>
> On the other hand, XFS, which also supports reflink, handles this
> quite well, so I'm wondering if it's possible for btrfs to follow its
> behavior.
>
>>
>>>> 3. I guess there aren't, so how could I accomplish my target, i.e.
>>>> reclaiming space that was lost due to fragmentation, without breaking
>>>> snapshotted CoW where it would be not only pointless, but actually
>>>> harmful?
>>>
>>> What about using an old kernel, like v4.13?
>>
>> Unfortunately (I guess you had 3.13 in mind), I need the new ones and
>> will be pushing towards 4.14.
>
> No, I really mean v4.13.
My fault, it is v3.13. What a stupid error...

> From btrfs(5):
> ---
> Warning
>     Defragmenting with Linux kernel versions < 3.9 or ≥ 3.14-rc2 as
>     well as with Linux stable kernel versions ≥ 3.10.31, ≥ 3.12.12
>     or ≥ 3.13.4 will break up the ref-links of CoW data (for
>     example files copied with cp --reflink, snapshots or
>     de-duplicated data). This may cause considerable increase of
>     space usage depending on the broken up ref-links.
> ---
>
>>
>>>> 4. How can I prevent this from happening again? All the files that are
>>>> written constantly (stats collector here, PostgreSQL database and
>>>> logs on other machines) are marked with nocow (+C); maybe some new
>>>> attribute to mark a file as autodefrag? +t?
>>>
>>> Unfortunately, nocow only works if there is no other subvolume/inode
>>> referring to it.
>>
>> This shouldn't be my case anymore after defrag (== breaking links).
>> I guess there is no easy way to check refcounts of the blocks?
>
> No easy way unfortunately.
> It's either time consuming (as used by qgroup) or complex (manual tree
> search and doing the backref walk yourself).
>
>>
>>> But in my understanding, btrfs is not suitable for such a conflicting
>>> situation, where you want snapshots of frequent partial updates.
>>>
>>> IIRC, btrfs is better for use cases where either updates are less
>>> frequent, or an update replaces the whole file, not just part of it.
>>>
>>> So btrfs is good for a root filesystem like /etc /usr (and /bin /lib
>>> which point to /usr/bin and /usr/lib), but not for /var or /run.
>>
>> That is coherent with my conclusions after 2 years on btrfs, however I
>> didn't expect a single file to eat 1000 times more space than it
>> should...
>>
>>
>> I wonder how many other filesystems were trashed like this - I'm short
>> of ~10 GB on another system, many other users might be affected by that
>> (telling the Internet stories about btrfs running out of space).
>
> Firstly, no other filesystem supports snapshots.
> So it's pretty hard to get a baseline.
>
> But as I mentioned, XFS supports reflink, which means a file extent can
> be shared between several inodes.
>
> From the message I got from the XFS guys, they free any unused space of
> a file extent, so it should handle this quite well.
>
> But that's quite hard to achieve in btrfs, needing years of development
> at least.
>
>>
>> It is not a problem that I need to defrag a file, the problem is I don't
>> know:
>> 1. whether I need to defrag,
>> 2. *what* I should defrag,
>> nor do I have a tool that would defrag smartly - only the exclusive data
>> or, in general, the blocks that are worth defragging, i.e. where the
>> space released from extents is greater than the space lost on
>> inter-snapshot duplication.
>>
>> I can't just defrag the entire filesystem since it breaks links with
>> snapshots. This change was a real deal-breaker here...
>
> IIRC it would be better to add an option to make defrag snapshot-aware.
> (Don't break snapshot sharing, but only defrag exclusive data.)
>
> Thanks,
> Qu
>
>>
>> Any way to feed the deduplication code with snapshots maybe? There are
>> directories and files in the same layout, this could be fast-tracked to
>> check and deduplicate.
>>
>
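For reference, the fiemap and extent-tree data requested above can be
gathered with stock tools. This is only a rough sketch - the file path and
device below are placeholders, not the real names:

    # FIEMAP mapping of one of the affected files (filefrag uses the FIEMAP ioctl)
    filefrag -v /path/to/affected/file

    # Dump the extent tree (tree id 2), ideally with the filesystem unmounted
    # or idle; newer btrfs-progs call this "inspect-internal dump-tree",
    # older releases ship it as btrfs-debug-tree
    btrfs inspect-internal dump-tree -t 2 /dev/sdX

filefrag -v prints each extent's logical/physical offset and length, which
is enough to see how badly the file is fragmented.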
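And regarding points 1 and 2 above (whether and what to defrag): until
defrag learns to be snapshot-aware, one possible stopgap might be btrfs
filesystem du, which reports exclusive vs. shared bytes per file, and then
defragmenting only the files that are mostly exclusive, where there is
little sharing left to break. Again just a sketch - the pam_abl path is an
example and the exclusive/shared accounting is only an estimate:

    # Lists Total / Exclusive / Set shared for every file under the directory
    btrfs filesystem du /var/lib/pam_abl/

    # Then defragment only the chosen files instead of the whole filesystem
    btrfs filesystem defragment -v /var/lib/pam_abl/<file>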