Oliver Freyermuth posted on Sun, 28 Aug 2016 05:38:00 +0200 as excerpted:

> Dear btrfs experts,
> 
> I just tried to make use of btrfs send / receive for incremental backups
> (using btrbk to simplify the process).
> It seems that on my two machines, btrfs send gets stuck after
> transferring some GiB - it's not fully halted, but instead of making
> full use of the available I/O, I get something < 500 kiB on average,
> which are just some "full speed spikes" with many seconds / minutes of
> no I/O in between.
> 
> During this "halting", btrfs send eats one full CPU core.
> A "perf top" shows this is spent in "find_parent_nodes" and
> "__merge_refs" inside the kernel.
> I am using btrfs-progs 4.7 and kernel 4.7.0.
> 
> I googled a bit and found related patchwork
> (https://patchwork.kernel.org/patch/9238987/) which seems to work around
> high load in this area and mentions a real solution is proposed but not
> yet there.
> 
> Since this affects two machines of mine, and backing up my root volume
> would take about 80 hours if I can extrapolate from the average rate,
> btrfs send is unusable for me.
> 
> Can I assume this is a common issue which will be fixed in a later
> kernel release (4.8, 4.9), or can I do something to my FS's to work around
> this issue?
> 
> One FS is only two weeks old, the other one now about 1 year old. I did
> some balancing at some points in time to have more unallocated space for
> trimming, and used duperemove regularly to free space. One FS has skinny
> extents, the other has not.

The problem is, as the patch says, that multiple references per extent 
increase processing time geometrically.

And duperemove works by doing just that, pointing multiple duplicate 
copies at the same extents and increasing the reference count per extent, 
thereby exacerbating the problem on your system, if duperemove is actually 
finding a reasonable number of duplicates to reflink to the same extents.
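To illustrate why piling more reflinks onto the same extent hurts so much, 
here's a toy model.  It is NOT the kernel's backref code: it assumes a 
naive nested-loop merge (naive_merge_comparisons is a made-up name for 
illustration) in which each incoming reference is compared against every 
reference already kept, so the work per extent grows roughly with the 
square of its reference count.

```python
def naive_merge_comparisons(refs):
    """Toy model of a naive duplicate-merging pass over one extent's refs.

    Each incoming ref is compared against every ref already kept, the way a
    nested loop over a plain list would do it.  Returns the merged list and
    the number of comparisons performed.  Illustration only; this is not
    the kernel's backref code.
    """
    merged = []
    comparisons = 0
    for ref in refs:
        for kept in merged:
            comparisons += 1
            if kept == ref:
                break  # duplicate found; fold it into the existing entry
        else:
            merged.append(ref)  # no match found; keep it as a new entry
    return merged, comparisons


# With all-distinct refs the cost is n * (n - 1) / 2 comparisons, so ten
# times as many reflinks to one extent costs roughly a hundred times as much.
for n in (10, 100, 1000):
    _, cost = naive_merge_comparisons(list(range(n)))
    print(f"{n:5d} refs -> {cost:7d} comparisons")
```

The numbers are only illustrative; the real cost depends on the kernel 
version and on how many snapshots and deduped copies share each extent.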

The other common multi-reflink usage is snapshots, since each snapshot 
creates another reflink to each extent it snapshots.  However, being just 
a list regular and btrfs user, not a dev, and using neither dedupe nor 
snapshots nor send/receive in my own use-case, I'm not absolutely sure 
whether other snapshot references affect send/receive or whether it's 
only multiple reflinks per sent snapshot.  Either way, over a few hundred 
snapshots per subvolume or a couple thousand snapshots per filesystem, 
they do seriously affect scaling of balance and fsck, even if they don't 
actually affect send/receive so badly.

So a workaround would be reducing your duperemove usage and possibly 
rewriting (for instance via defrag) the deduped files to kill the 
multiple reflinks.  Or simply delete the additional reflinked copies, if 
your use-case allows it.

And thin down your snapshot retention if you have many snapshots per 
subvolume.  With the geometric scaling issues, thinning to under 300 per 
subvolume should be quite reasonable in nearly all circumstances, and 
thinning to under 100 per subvolume may be possible and should result in 
dramatically reduced scaling issues.
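A retention rule that keeps only the newest N snapshots per subvolume can 
be sketched as below.  This is a hypothetical helper: it assumes snapshots 
are tracked as (name, creation time) pairs and uses a made-up 
root.YYYYMMDD naming scheme, neither of which comes from btrfs or btrbk.  
Actually deleting the selected snapshots would still be done with btrfs 
subvolume delete, or by the backup tool's own retention settings.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, max_keep=100):
    """Pick the oldest snapshots so that at most `max_keep` remain.

    `snapshots` is a list of (name, created) tuples; this layout is a
    hypothetical convention for the sketch, not a btrfs or btrbk format.
    """
    if max_keep < 0:
        raise ValueError("max_keep must be non-negative")
    ordered = sorted(snapshots, key=lambda snap: snap[1])  # oldest first
    excess = max(len(ordered) - max_keep, 0)
    return [name for name, _ in ordered[:excess]]


# Example: a year of daily snapshots of one subvolume, thinned to 100.
start = datetime(2016, 1, 1)
daily = []
for i in range(365):
    created = start + timedelta(days=i)
    daily.append((f"root.{created:%Y%m%d}", created))

doomed = snapshots_to_delete(daily, max_keep=100)
print(len(doomed), doomed[0], doomed[-1])
```

Keeping only the newest N is the simplest policy; a real setup might keep 
more recent snapshots densely and older ones sparsely, as long as the 
total per subvolume stays low.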

Note that the current patch doesn't really work around the geometric 
scaling issues or the extreme CPU usage bottlenecking send/receive.  
Rather, it addresses the soft-lockup problem caused by not scheduling 
often enough to give other threads time to run.  You didn't mention 
problems with soft lockups, so it's likely to be of limited help for the 
send/receive problem.

As for the longer term, yes, it should be fixed eventually.  But keep in 
mind that btrfs isn't yet considered fully stable and mature, so this 
sort of problem isn't unexpected; scaling issues like this are known to 
remain.  And while I haven't been tracking that red/black tree work, in 
general btrfs fixes for this sort of problem often take rather longer 
than might be expected, so a fix may be more like a year or two out than 
a kernel cycle or two.

Unless of course you see otherwise from someone working on this problem 
specifically, and even then, sometimes the first fix doesn't get it quite 
right, and the problem may remain for some time as more is learned about 
the ultimate issue via multiple attempts to fix it.  This has happened to 
the quota code a number of times, for instance, as it has turned out to be
a /really/ hard problem, with multiple rewrites necessary, such that even 
now, the practical recommendation is often to either just turn off quotas 
and not worry about them if you don't need them, or use a more mature 
filesystem where the quota code is known to be stable and mature, if your 
use-case depends on them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
