David Sterba wrote on 2018-11-02 02:02:
On Thu, Nov 01, 2018 at 02:49:03PM +0800, Ethan Lien wrote:
Snapshot is expected to be fast. But if there are writers steadily
creating dirty pages in our subvolume, the snapshot may take a very long
time to complete. To fix the problem, we use tagged writepage for the
snapshot flusher as we do in the generic write_cache_pages(), so we can
omit pages dirtied after the snapshot command.
We do a simple snapshot speed test on an Intel D-1531 box:
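To illustrate the idea only, here is a toy Python model of tagged versus
untagged flushing. It is a sketch under stated assumptions, not the btrfs
or mm code: ToyMapping, flush() and make_sequential_writer() are
hypothetical names, and a "page" is just an integer index. The tagged
path loosely mirrors the approach of tagging the dirty pages once at the
start of the flush and then writing only the tagged ones.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMapping:
    """Hypothetical stand-in for an inode's page cache (not a kernel API)."""
    dirty: set = field(default_factory=set)    # page indexes currently dirty
    towrite: set = field(default_factory=set)  # pages tagged for this flush

    def tag_pages_for_writeback(self):
        # Remember which pages were dirty when the flush started.
        self.towrite = set(self.dirty)


def make_sequential_writer(next_page, budget):
    """Return a callback that dirties one new page per call, `budget` times."""
    state = {"next": next_page, "left": budget}

    def writer(mapping):
        if state["left"] > 0:
            mapping.dirty.add(state["next"])
            state["next"] += 1
            state["left"] -= 1

    return writer


def flush(mapping, writer, tagged):
    """Write back pages; writer() runs after each page to model a
    concurrent writer. Returns the number of pages written."""
    if tagged:
        mapping.tag_pages_for_writeback()
    written = 0
    while True:
        # Tagged: only pages dirty at flush start. Untagged: any dirty page.
        pending = mapping.towrite if tagged else mapping.dirty
        if not pending:
            break
        page = min(pending)
        pending.discard(page)
        mapping.dirty.discard(page)
        written += 1
        writer(mapping)  # the writer keeps dirtying pages meanwhile
    return written
```

With 10 pages dirty at flush start and a writer that dirties 100 more
pages during the flush, the untagged flush in this model writes all 110
pages, while the tagged flush writes only the original 10 and leaves the
newer pages dirty for a later writeback.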
fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G \
--direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 \
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
original: 1m58sec
patched: 6.54sec
This is the best case for this patch, since in a sequential write case
we omit nearly all pages dirtied after the snapshot command.
For a multi-writer, random write test:
fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G \
--direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 \
--filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio
original: 15.83sec
patched: 10.35sec
The improvement is smaller than in the sequential write case, since
we omit only half of the pages dirtied after the snapshot command.
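For reference, the speedups implied by the timings quoted above can be
worked out as follows. This is plain arithmetic on the reported numbers;
the speedup() helper is a hypothetical name used only for this note.

```python
def speedup(original_s, patched_s):
    """Ratio of original to patched snapshot time (higher is better)."""
    return original_s / patched_s


# Sequential write test: 1m58sec original vs 6.54sec patched.
sequential = speedup(1 * 60 + 58, 6.54)
# Four-job random write test: 15.83sec original vs 10.35sec patched.
random_4_jobs = speedup(15.83, 10.35)

print(f"sequential: {sequential:.1f}x, random: {random_4_jobs:.2f}x")
```

That is roughly an 18x speedup for the sequential case and about 1.5x
for the random write case.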
Signed-off-by: Ethan Lien <ethanl...@synology.com>
This looks nice, thanks. I agree with the suggestions from Nikolay,
please update and resend.
I was a bit curious about the 'livelock'; what you describe does not
seem to be one. A system under heavy IO can make the snapshot dead slow
but can recover from that once the IO stops.
I'm not sure if this is indeed a case of 'livelock'. I learned the term
from commit f446daaea9d4a420d, "mm: implement writeback livelock
avoidance using page tagging".
If this is not the case, I can use another term.
Regarding the sync semantics, there's AFAIK no change to the current
state, where the sync is done before the snapshot but without any
further guarantees. From that point of view I think it's safe to select
only a subset of pages and make things faster.
As the requested changes are not functional I'll add the patch to
for-next for testing.