On Wed 19 Aug 2020 05:07:11 PM CEST, Kevin Wolf wrote:
>> I checked with xfs on my computer. I'm not very familiar with that
>> filesystem so I was using the default options and I didn't tune
>> anything.
>>
>> What I got with my tests (using fio):
>>
>> - Using extent_size_hint didn't make any difference in my test case
>>   (I do see a clear difference however with the test case described
>>   in commit ffa244c84a).
>
> Hm, interesting. What is your exact fio configuration? Specifically,
> which iodepth are you using? I guess with a low iodepth (and
> O_DIRECT), the effect of draining the queue might not be as visible.
fio --filename=/dev/vdb --direct=1 --randrepeat=1 --eta=always \
    --ioengine=libaio --iodepth=32 --numjobs=1 --name=test \
    --size=25G --io_limit=25G --ramp_time=5 --rw=randwrite \
    --bs=4k --runtime=60

>> - preallocation=off is still faster than preallocation=metadata.
>
> Brian, can you help us here with some input?
>
> Essentially what we're having here is a sparse image file on XFS that
> is opened with O_DIRECT (presumably - Berto, is this right?), and
> Berto is seeing cases where a random write benchmark is faster if
> we're doing the 64k ZERO_RANGE + 4k pwrite when touching a 64k
> cluster for the first time compared to always just doing the 4k
> pwrite. This is with a 1 MB extent size hint.

A couple of notes:

- Yes, it's O_DIRECT (the image is opened with cache=none and fio uses
  --direct=1).

- The extent size hint is the default one, I didn't change or set
  anything for this test (or should I have?).

> From the discussions we had the other day [1][2] I took away that
> your suggestion is that we should not try to optimise things with
> fallocate(), but just write the areas we really want to write and let
> the filesystem deal with the sparse parts. Especially with the extent
> size hint that we're now setting, I'm surprised to hear that doing a
> ZERO_RANGE first still seems to improve the performance.
>
> Do you have any idea why this is happening and what we should be
> doing with this?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1850660
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1666864
>
>> If I disable handle_alloc_space() (so there is no ZERO_RANGE used)
>> then it is much slower.
>
> This makes some sense because then we're falling back to writing
> explicit zero buffers (unless you disabled that, too).

Exactly, this happens on both ext4 and xfs.

>> - With preallocation=falloc I get the same results as with
>>   preallocation=metadata.
>
> Interesting, this means that the fallocate() call costs you basically
> no time.
> I would have expected preallocation=falloc to be a little
> faster.

I would expect preallocation=falloc to be at least as fast as
preallocation=off (and it is, on ext4). On xfs, however, it seems to
be slower (?). It doesn't make sense to me.

>> - preallocation=full is the fastest by far.
>
> I guess this saves the conversion of unwritten extents to fully
> allocated ones?

Maybe, but it is *much*, *much* faster, so I assume I must be missing
something about how the filesystem works.

I ran the test again on a newly created filesystem just to make sure;
here are the full results (numbers are IOPS):

|----------------------+-------+-------|
| preallocation        |  ext4 |   xfs |
|----------------------+-------+-------|
| off                  | 11688 |  6981 |
| off (w/o ZERO_RANGE) |  2780 |  3196 |
| metadata             |  9132 |  5764 |
| falloc               | 13108 |  5727 |
| full                 | 16351 | 40759 |
|----------------------+-------+-------|

Berto