On 2018-12-12 23:22, Martin Raiber wrote:
On 12.12.2018 15:47 Chris Mason wrote:
On 28 May 2018, at 1:48, Ethan Lien wrote:
It took me a while to trigger, but this actually deadlocks ;) More
below.
[Problem description and how we fix it]
We should balance dirty metadata pages at the end of
btrfs_finish_ordered_io, since a small, unmergeable random write can
potentially produce dirty metadata which is multiple times larger than
the data itself. For example, a small, unmergeable 4KiB write may
produce:

    16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree
Although we already balance dirty pages on the write side, in the
buffered write path most metadata is dirtied only after we reach the
dirty background limit (which at that point counts mostly dirty data
pages) and wake up the flusher thread. If there are many small,
unmergeable random writes spread across a large btree, we'll see a
burst of dirty pages exceed the dirty_bytes limit after the flusher
thread wakes up, which is not what we expect. On our machine it caused
an out-of-memory problem, since a page cannot be dropped while it is
marked dirty.
One may worry that we could sleep in btrfs_btree_balance_dirty_nodelay,
but since we do btrfs_finish_ordered_io in a separate worker, it will
not stop the flusher from consuming dirty pages. Also, we use a
different worker for metadata writeback endio, so sleeping in
btrfs_finish_ordered_io helps us throttle the amount of dirty metadata
pages.
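[For reference, the change under discussion boils down to roughly the
following; this is a simplified sketch of the tail of the function, not
the exact diff:]

static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
{
	struct inode *inode = ordered_extent->inode;
	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
	int ret = 0;

	/*
	 * ... insert the file extent item, add checksums, update the
	 * inode, free the ordered extent, etc. -- all of which dirties
	 * btree pages ...
	 */

	/*
	 * New: balance dirty btree pages here so the metadata dirtied
	 * above is written back before it piles up unbounded.
	 */
	btrfs_btree_balance_dirty_nodelay(fs_info);
	return ret;
}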
In general, slowing down btrfs_finish_ordered_io isn't ideal because it
adds latency to places we need to finish quickly. Also,
btrfs_finish_ordered_io is used by the free space cache. Even though
this happens from its own workqueue, it means completing free space
cache writeback may end up waiting on balance_dirty_pages, something
like this stack trace:

[..]

Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.
I had lockups with this patch as well. If you put e.g. a loop device on
top of a btrfs file, loop sets PF_LESS_THROTTLE to avoid a feedback
loop causing delays. The task balancing dirty pages in
btrfs_finish_ordered_io doesn't have that flag and causes slow-downs.
In my case it managed to create a feedback loop where it queues further
btrfs_finish_ordered_io work and gets stuck completely.
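If I understand the loop case correctly, the asymmetry is roughly this
(a sketch of how I read loop.c, possibly not the exact current code):

static int loop_kthread_worker_fn(void *worker_ptr)
{
	/*
	 * loop marks its worker so balance_dirty_pages() gives it more
	 * headroom and it is not throttled into the feedback loop you
	 * describe.
	 */
	current->flags |= PF_LESS_THROTTLE;
	return kthread_worker_fn(worker_ptr);
}

The btrfs endio worker that ends up in btrfs_finish_ordered_io()
carries no such flag, so once it calls
balance_dirty_pages_ratelimited() it is throttled like any ordinary
writer.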
The data writepage endio will queue work for btrfs_finish_ordered_io()
on a separate workqueue and clear the page's writeback, so throttling
in btrfs_finish_ordered_io() should not slow down the flusher thread.
One suspicious point is that while a caller is waiting for a range of
ordered extents to complete, it will be blocked until
balance_dirty_pages_ratelimited() makes some progress, since we finish
ordered extents in btrfs_finish_ordered_io().
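In code terms the handoff looks roughly like this (a schematic sketch
with hypothetical struct and function names and a plain workqueue, not
the real btrfs worker helpers, assuming the btrfs internals are in
scope):

struct ordered_work {
	struct work_struct work;
	struct btrfs_ordered_extent *ordered;
};

static void finish_ordered_fn(struct work_struct *w)
{
	struct ordered_work *ow = container_of(w, struct ordered_work, work);

	/*
	 * Worker context: sleeping in balance_dirty_pages_ratelimited()
	 * here cannot stall the flusher, but it does delay anyone in
	 * btrfs_wait_ordered_range() waiting on this ordered extent.
	 */
	btrfs_finish_ordered_io(ow->ordered);
	kfree(ow);
}

static void data_write_endio(struct page *page, struct ordered_work *ow)
{
	/* The flusher sees the data page finished immediately ... */
	end_page_writeback(page);
	/* ... while the slow completion is deferred to a worker. */
	INIT_WORK(&ow->work, finish_ordered_fn);
	queue_work(system_unbound_wq, &ow->work);
}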
Do you have call stacks of the stuck processes, or are you using
fsync/sync frequently? If that is the case, maybe we should pull this
out and try to balance dirty metadata pages somewhere else.
Regards,
Martin Raiber