[Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-12-10 Thread Stephane Chazelas
For the record, there's more on that bug at
http://thread.gmane.org/gmane.linux.ubuntu.bugs.server/36923

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/595117

Title:
  qemu-nbd slow and missing "writeback" cache option

Status in QEMU:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Expired

Bug description:
  Binary package hint: qemu-kvm

dpkg -l | grep qemu
ii  kvm              1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from kvm to qemu-
ii  qemu             0.12.3+noroms-0ubuntu9                      dummy transitional pacakge from qemu to qemu
ii  qemu-common      0.12.3+noroms-0ubuntu9                      qemu common functionality (bios, documentati
ii  qemu-kvm         0.12.3+noroms-0ubuntu9                      Full virtualization on i386 and amd64 hardwa
ii  qemu-kvm-extras  0.12.3+noroms-0ubuntu9                      fast processor emulator binaries for non-x86
ii  qemu-launcher    1.7.4-1ubuntu2                              GTK+ front-end to QEMU computer emulator
ii  qemuctl          0.2-2                                       controlling GUI for qemu

lucid amd64.

qemu-nbd is a lot slower when writing to disk than, say, nbd-server.

It appears to be because, by default, the disk image it serves is opened with
O_SYNC. The --nocache option, unintuitively, makes matters a bit better because
it causes the image to be opened with O_DIRECT instead of O_SYNC.

The qemu code allows an image to be opened without either of those flags, but
unfortunately qemu-nbd offers no option to do that (qemu doesn't allow the
image to be opened with both O_SYNC and O_DIRECT, though).

The qemu-nbd default (of using O_SYNC) is not very sensible because the client
(the kernel) uses write-back caching anyway (and "qemu-nbd -d" doesn't flush
those caches, by the way). So if, for instance, qemu-nbd is killed, the data in
the image will not be consistent regardless of whether qemu-nbd uses O_SYNC,
O_DIRECT or neither, unless "syncs" are done by the client (like fsync on the
nbd device or the sync mount option), and with qemu-nbd's O_SYNC mode those
"sync"s are extremely slow.

Attached is a patch that adds a --cache={off,none,writethrough,writeback} 
option to qemu-nbd.

--cache=off is the same as --nocache (that is, it uses O_DIRECT); writethrough
uses O_SYNC and remains the default, so this patch doesn't change the default
behaviour; writeback uses neither flag and is the mode this patch adds. The
patch also does an fsync upon "qemu-nbd -d" to make sure data is flushed to the
image before the nbd device is detached.
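
For illustration, here is a minimal C sketch of how such a --cache value could
map to open(2) flags, based purely on the description above; it is not the
attached patch, and the helper name is made up.

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <string.h>

/* Hypothetical sketch only -- not the attached patch. */
int cache_mode_to_open_flags(const char *mode)
{
    int flags = O_RDWR;

    if (!strcmp(mode, "off") || !strcmp(mode, "none"))
        flags |= O_DIRECT;      /* same as --nocache: bypass the page cache */
    else if (!strcmp(mode, "writethrough"))
        flags |= O_SYNC;        /* the current default: every write hits disk */
    else if (!strcmp(mode, "writeback"))
        ;                       /* neither flag: plain write-back via the page cache */
    else
        return -1;              /* unknown mode */

    /* on "qemu-nbd -d", the patch additionally fsync()s the image so that
     * buffered data reaches it before the nbd device is detached */
    return flags;
}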

Consider this test scenario:

dd bs=1M seek=100 of=a < /dev/null
qemu-nbd --cache=<mode> -c /dev/nbd0 a
cp /dev/zero /dev/nbd0
time perl -MIO::Handle -e 'STDOUT->sync or die$!' 1<> /dev/nbd0

With cache=writethrough (the default), it takes over 10 minutes to write those
100MB worth of zeroes. Running strace, we see the recvfrom()s and sendto()s
delayed by each 1kB write(2) to disk (10 to 30 ms per write).

With cache=off, it takes about 30 seconds.

With cache=writeback, it takes about 3 seconds, which is similar to the
performance you get with nbd-server.

Note that the cp command returns instantly, as the data is buffered by the
client (the kernel) and not sent to qemu-nbd until fsync(2) is called.





[Qemu-devel] Re: [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-06-17 Thread Stephane Chazelas
2010-06-16 20:36:00 -, Dustin Kirkland:
[...]
> Could you please send that patch to the qemu-devel@ mailing list?
> Thanks!
[...]

Hi Dustin, it looks like qemu-devel is subscribed to the bugs there, so the
bug report is already on the list.

Note that I still consider this a bug because:
  - slow performance for no good reason
  - the --nocache option is misleading
  - no fsync on "-d", which to my mind is a bug in itself.

Cheers,
Stephane





Re: [Qemu-devel] [Bug 595117] Re: qemu-nbd slow and missing "writeback" cache option

2010-07-07 Thread Stephane Chazelas
2010-06-24 00:16:03 -, Jamie Lokier:
> Serge Hallyn wrote:
> > The default of qemu-img (of using O_SYNC) is not very sensible
> > because anyway, the client (the kernel) uses caches (write-back),
> > (and "qemu-nbd -d" doesn't flush those by the way). So if for
> > instance qemu-nbd is killed, regardless of whether qemu-nbd uses
> > O_SYNC, O_DIRECT or not, the data in the image will not be
> > consistent anyway, unless "syncs" are done by the client (like fsync
> > on the nbd device or sync mount option), and with qemu-nbd's O_SYNC
> > mode, those "sync"s will be extremely slow.
> 
> Do the "client syncs" cause the nbd server to fsync or fdatasync the
> file?

The client's syncs cause the data to be sent to the server. The server then
writes it to disk, and with O_SYNC each write blocks until the data is
physically on disk.
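
To see what that per-write cost looks like outside qemu, here is a small
standalone C sketch (my own illustration, nothing from qemu) that times 1 KiB
writes through an O_SYNC file descriptor; the scratch file name is arbitrary.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char buf[1024];
    memset(buf, 0, sizeof buf);

    /* every write(2) on this descriptor only returns once the data
     * is on stable storage */
    int fd = open("osync-test", O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 10; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (write(fd, buf, sizeof buf) < 0) { perror("write"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("write %d: %.2f ms\n", i,
               (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
    }
    close(fd);
    return 0;
}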

> > It appears it is because by default the disk image it serves is open
> > with O_SYNC. The --nocache option, unintuitively, makes matters a
> > bit better because it causes the image to be open with O_DIRECT
> > instead of O_SYNC.
> [...]
> > --cache=off is the same as --nocache (that is use O_DIRECT),
> > writethrough is using O_SYNC and is still the default so this patch
> > doesn't change the functionality. writeback is none of those flags,
> > so is the addition of this patch. The patch also does an fsync upon
> > "qemu-nbd -d" to make sure data is flushed to the image before
> > removing the nbd.
> 
> I really wish qemu's options didn't give the false impression
> "nocache" does less caching than "writethrough".  O_DIRECT does
> caching in the disk controller/hardware, while O_SYNC hopefully does
> not, nowadays.
[...]

Note that I use the same "none", "writethrough" and "writeback" keywords as
another utility shipped with qemu, for consistency (see vl.c in the source). I
don't mind what the words are as long as the "writeback" functionality is
available.

Cheers,
Stephane


[Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?

2017-02-02 Thread Stephane Chazelas
Hello,

Since qemu 2.7.0, when doing synchronised I/O in a VM (tested with an Ubuntu
16.04 amd64 VM) whose disk is backed by a qcow2 file sitting on a ZFS
filesystem (ZFS on Linux, on Debian jessie (PVE)), performance is dreadful:

# time dd if=/dev/zero count=1000  of=b oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 21.9908 s, 23.3 kB/s
dd if=/dev/zero count=1000 of=b oflag=dsync  0.00s user 0.04s system 0% cpu 21.992 total

(22 seconds to write that half megabyte). The same happens with O_SYNC or
O_DIRECT, or when doing fsync() or sync_file_range() after each write().

I first noticed it for dpkg unpacking kernel headers where dpkg
does a sync_file_range() after each file is extracted.

Note that it doesn't happen when writing anything other than zeroes (like
tr '\0' x < /dev/zero | dd count=1000 of=b oflag=dsync). In the case of the
kernel headers, I suppose the zeroes come from the non-filled parts of the
ext4 blocks.

Running strace -fc on the qemu process shows that 98% of the time is spent in
the lseek() system call.

That's the lseek(SEEK_DATA) followed by lseek(SEEK_HOLE) done by
find_allocation(), which is called to find out whether sectors lie within a
hole in a sparse file.

#0  lseek64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x561287cf4ca8 in find_allocation (bs=0x7fd898d7, hole=, data=, start=)
at block/raw-posix.c:1702
#2  raw_co_get_block_status (bs=0x7fd898d7, sector_num=, 
nb_sectors=40, pnum=0x7fd80dd05aac, file=0x7fd80dd05ab0) at 
block/raw-posix.c:1765
#3  0x561287cfae92 in bdrv_co_get_block_status (bs=0x7fd898d7, 
sector_num=sector_num@entry=1303680, nb_sectors=40, 
pnum=pnum@entry=0x7fd80dd05aac,
file=file@entry=0x7fd80dd05ab0) at block/io.c:1709
#4  0x561287cfafaa in bdrv_co_get_block_status (bs=bs@entry=0x7fd898d66000, 
sector_num=sector_num@entry=33974144, nb_sectors=,
nb_sectors@entry=40, pnum=pnum@entry=0x7fd80dd05bbc, 
file=file@entry=0x7fd80dd05bc0) at block/io.c:1742
#5  0x561287cfb0bb in bdrv_co_get_block_status_above (file=0x7fd80dd05bc0, 
pnum=0x7fd80dd05bbc, nb_sectors=40, sector_num=33974144, base=0x0,
bs=) at block/io.c:1776
#6  bdrv_get_block_status_above_co_entry (opaque=opaque@entry=0x7fd80dd05b40) 
at block/io.c:1792
#7  0x561287cfae08 in bdrv_get_block_status_above (bs=0x7fd898d66000, 
base=base@entry=0x0, sector_num=, nb_sectors=nb_sectors@entry=40,
pnum=pnum@entry=0x7fd80dd05bbc, file=file@entry=0x7fd80dd05bc0) at 
block/io.c:1824
#8  0x561287cd372d in is_zero_sectors (bs=, start=, count=40) at block/qcow2.c:2428
#9  0x561287cd38ed in is_zero_sectors (count=, 
start=, bs=) at block/qcow2.c:2471
#10 qcow2_co_pwrite_zeroes (bs=0x7fd898d66000, offset=33974144, count=24576, 
flags=2724114573) at block/qcow2.c:2452
#11 0x561287cfcb7f in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x7fd898d66000, 
offset=offset@entry=17394782208, count=count@entry=4096,
flags=flags@entry=BDRV_REQ_ZERO_WRITE) at block/io.c:1218
#12 0x561287cfd0cb in bdrv_aligned_pwritev (bs=0x7fd898d66000, 
req=, offset=17394782208, bytes=4096, align=1, qiov=0x0,
flags=) at block/io.c:1320
#13 0x561287cfe450 in bdrv_co_do_zero_pwritev (req=, 
flags=, bytes=, offset=,
bs=) at block/io.c:1422
#14 bdrv_co_pwritev (child=0x15, offset=17394782208, bytes=4096, 
qiov=0x7fd8a25eb08d , qiov@entry=0x0, flags=231758512) at 
block/io.c:1492
#15 0x561287cefdc7 in blk_co_pwritev (blk=0x7fd898cad540, 
offset=17394782208, bytes=4096, qiov=0x0, flags=) at 
block/block-backend.c:788
#16 0x561287cefeeb in blk_aio_write_entry (opaque=0x7fd812941440) at 
block/block-backend.c:982
#17 0x561287d67c7a in coroutine_trampoline (i0=, 
i1=) at util/coroutine-ucontext.c:78
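
For reference, here is a simplified standalone sketch of the kind of probe
find_allocation() performs (my own reconstruction, not the qemu code):
lseek(SEEK_DATA) from a given offset tells you whether that offset currently
sits in a hole of the sparse file.

#define _GNU_SOURCE             /* for SEEK_DATA on Linux */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if `start` lies in a hole of fd's file, 0 otherwise
 * (illustration only, not the qemu code). */
int offset_in_hole(int fd, off_t start)
{
    off_t data = lseek(fd, start, SEEK_DATA);
    if (data == (off_t)-1)
        return errno == ENXIO;  /* no data after start: hole up to EOF;
                                   other errors: treat as data */
    return data > start;        /* next data starts later => start is in a hole */
}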

Now, performance is really bad on ZFS for those lseek()s.
I believe that's https://github.com/zfsonlinux/zfs/issues/4306

Until that's fixed in ZFS, I need to find a way to avoid those
lseek()s in the first place.

One way is to downgrade to 2.6.2 where those lseek()s are not
called. The change that introduced them seems to be:

https://github.com/qemu/qemu/commit/2928abce6d1d426d37c0a9bd5f85fb95cf33f709
(and there have been further changes to improve it later).

If I understand correctly, that change was about preventing data
from being allocated when the user is writing unaligned zeroes.

I suppose the idea is that if something tries to write zeroes in the middle of
an _allocated_ qcow2 cluster, but the corresponding sectors in the file
underneath are in a hole, we don't want to write those zeroes, as that would
allocate the data at the file level.
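
As a small standalone illustration of that trade-off (my own example, not qemu
code): writing literal zeroes into a hole of a sparse file makes the
filesystem allocate real blocks, which is exactly what the probe above tries
to avoid.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void report(const char *when, int fd)
{
    struct stat st;
    fstat(fd, &st);
    printf("%s: size=%lld blocks=%lld\n",
           when, (long long)st.st_size, (long long)st.st_blocks);
}

int main(void)
{
    char zeroes[4096];
    memset(zeroes, 0, sizeof zeroes);

    int fd = open("sparse-demo", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    ftruncate(fd, 1 << 20);                 /* 1 MiB file that is all hole */
    report("after ftruncate", fd);          /* blocks=0 */

    pwrite(fd, zeroes, sizeof zeroes, 0);   /* literal zeroes into the hole */
    fsync(fd);
    report("after writing zeroes", fd);     /* blocks now non-zero */

    close(fd);
    unlink("sparse-demo");
    return 0;
}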

I can see it makes sense, but in my case the small space saving it brings is
largely overshadowed by the sharp decrease in performance.

For now, I work around it by changing the "#ifdef SEEK_DATA"
to "#if 0" in find_allocation().

Note that passing detect-zeroes=off or detect-zeroes=unmap (with
discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
supported on ZFS on Linux).

Is there any other way I 

Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?

2017-02-02 Thread Stephane Chazelas
2017-02-02 16:23:53 +0100, Laszlo Ersek:
[...]
> You didn't mention what qcow2 features you use -- vmstate, snapshots,
> backing files (chains of them), compression?
> 
> Since commit 2928abce6d1d only modifies "block/qcow2.c", you could
> switch / convert the images to "raw". "raw" still benefits from sparse
> files (which ZFS-on-Linux apparently supports). Sparse files (i.e., the
> disk space savings) are the most important feature to me at least.
[...]

Thanks for the feedback.

Sorry for not mentioning it in the first place, but I do need vmstate and
snapshots (non-linear snapshots even, which means ZFS zvol snapshots as done
by Proxmox VE are not an option either, and neither is vmdk).


I hadn't tested this before now, but what I observe with raw devices and
discard=on,detect-zeroes=unmap (and the virtio-scsi interface) is that, upon
those "synced writes of zeroes" into allocated data, qemu does

[pid 10535] fallocate(14, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 136314880, 4096) = 0

on the disk image (and no lseek(SEEK_DATA/SEEK_HOLE)), which is not what
happens when using qcow2 images.

If the qcow2 driver were updated to do the same (punch holes regardless,
instead of checking whether the data is allocated beforehand), that would also
solve my problem (anything that avoids those lseek()s being called would).
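
For what it's worth, the call I'd expect such a change to boil down to looks
roughly like this (a generic Linux sketch under my own assumptions, not a qemu
patch); reads from the punched range return zeroes afterwards.

#define _GNU_SOURCE             /* for fallocate() and FALLOC_FL_* */
#include <fcntl.h>
#include <stdio.h>

/* Deallocate [offset, offset+len) without changing the file size;
 * the range then reads back as zeroes. */
int punch_zero_range(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) < 0) {
        perror("fallocate");
        return -1;
    }
    return 0;
}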

Another thing I've not mentioned clearly is the versions of qemu
I have been testing with: 2.7, 2.7.1 (those two on Proxmox VE
4.4 (based on Debian jessie)) and 2.8.0 (the latter for
verification on a Debian unstable system, not with zfs).

-- 
Stephane



Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?

2017-02-08 Thread Stephane Chazelas
2017-02-08 00:43:18 +0100, Max Reitz:
[...]
> OTOH, it may make sense to offer a way for the user to disable
> lseek(SEEK_{DATA,HOLE}) in our "file" block driver. That way your issue
> would be solved, too, I guess. I'll look into it.
[...]

Thanks Max,

Yes, that would work for me and other users of ZFS. What I do for now is
recompile with those lseek(SEEK_{DATA,HOLE}) calls disabled in the code, and
it's working fine.

As I already hinted, something that would also work for me and could benefit
everyone (well, at least Linux users on filesystems supporting hole punching)
is, instead of checking beforehand whether the range is allocated, to do a
fallocate(FALLOC_FL_PUNCH_HOLE), or in other words to tell the underlying
layer to deallocate the data.

That would be those two lseek()s replaced by a single fallocate(), with some
extra disk space saved as a bonus.

One may argue that is what one would expect to happen when using
detect-zeroes=unmap.

I suppose that would be quite significant work, as it would imply a framework
to pass those "deallocates" down, and you'd probably have to differentiate
"deallocates" that zero the range (like hole punching in a regular file) from
those that don't (like BLKDISCARD on an SSD).

I also suppose that could cause fragmentation that would be unwanted in some
contexts, so maybe it should be tunable as well.

-- 
Stephane




Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?

2017-02-08 Thread Stephane Chazelas
2017-02-08 00:43:18 +0100, Max Reitz:
[...]
> Therefore, the patch as it is makes sense. The fact that said lseek() is
> slow on ZFS is (in my humble opinion) the ZFS driver's problem that
> needs to be fixed there.
[...]

For the record, I've mentioned the qemu performance implication at
https://github.com/zfsonlinux/zfs/issues/4306#issuecomment-277000682

There is not much more I can do at this point.

That issue was raised a year ago and has not been assigned any
milestone yet.

-- 
Stephane






Re: [Qemu-devel] [qcow2] how to avoid qemu doing lseek(SEEK_DATA/SEEK_HOLE)?

2017-02-08 Thread Stephane Chazelas
2017-02-08 15:27:11 +0100, Max Reitz:
[...]
> A bit of a stupid question, but: How is your performance when using
> detect-zeroes=off?
[...]

I did try that; see this from my original message:

} Note that passing detect-zeroes=off or detect-zeroes=unmap (with
} discard) doesn't help (even though FALLOC_FL_PUNCH_HOLE is
} supported on ZFS on Linux).

It makes no difference: I still see those lseek()s being done.

-- 
Stephane