Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-08 Thread Paolo Bonzini
Il 08/01/2014 23:53, Richard W.M. Jones ha scritto:
> On Wed, Jan 08, 2014 at 11:45:39PM +0100, Paolo Bonzini wrote:
>> Il 08/01/2014 23:24, Richard W.M. Jones ha scritto:
>>> It's extremely difficult to know when it's safe to add this parameter.
>>> Qemu gives no indication of when using discard=.. is safe (ie. won't
>>> cause qemu to fail to start up or fail in some other way).  It's even
>>> worse when we have to go via libvirt which itself doesn't expose
>>> qemu's capabilities upwards.
>>
>> It is a bug that "-help" doesn't list discard=on, but QMP
>> query-command-line-options lists it correctly.
>>
>> libvirt could safely ignore discard if the underlying QEMU does not
>> support it, but that's not how it was implemented.  Currently,
>> explicitly specifying either discard='on' or discard='off' will cause
>> the VM to fail to start if QEMU does not support it.  There are
>> tradeoffs in both solutions...
> 
> That sucks .. for me ...
> 
> Can't we have an option like discard=ifpossible?  libguestfs would use
> this, since we'd always prefer to honour discard requests from our
> kernel.

That would be a libvirt option, not QEMU.

Paolo



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-08 Thread Richard W.M. Jones
On Wed, Jan 08, 2014 at 11:45:39PM +0100, Paolo Bonzini wrote:
> Il 08/01/2014 23:24, Richard W.M. Jones ha scritto:
> > It's extremely difficult to know when it's safe to add this parameter.
> > Qemu gives no indication of when using discard=.. is safe (ie. won't
> > cause qemu to fail to start up or fail in some other way).  It's even
> > worse when we have to go via libvirt which itself doesn't expose
> > qemu's capabilities upwards.
> 
> It is a bug that "-help" doesn't list discard=on, but QMP
> query-command-line-options lists it correctly.
> 
> libvirt could safely ignore discard if the underlying QEMU does not
> support it, but that's not how it was implemented.  Currently,
> explicitly specifying either discard='on' or discard='off' will cause
> the VM to fail to start if QEMU does not support it.  There are
> tradeoffs in both solutions...

That sucks .. for me ...

Can't we have an option like discard=ifpossible?  libguestfs would use
this, since we'd always prefer to honour discard requests from our
kernel.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-08 Thread Paolo Bonzini
Il 08/01/2014 23:24, Richard W.M. Jones ha scritto:
> On Wed, Jan 08, 2014 at 11:11:35PM +0100, Paolo Bonzini wrote:
>> Is guestfish using "discard=on"?
> 
> No.
> 
> Adding the discard=on parameter does indeed fix this:
> 
> 13M   /tmp/test1
> 17M   /tmp/test2
> 
> However why isn't this the default?  Is there a case where discard=on
> would be undesirable?

It is always safe if supported.  However, it may cause performance
problems, because discarding data from images may make them fragmented,
or cause files to have a lot of extents.  The same applies to block devices
backed by some kind of thin-provisioning NAS (for SSDs, by contrast, it
should always be okay).

Unfortunately neither Linux nor the block devices really provide the
information you need to know whether discard can cause these problems.

> It's extremely difficult to know when it's safe to add this parameter.
> Qemu gives no indication of when using discard=.. is safe (ie. won't
> cause qemu to fail to start up or fail in some other way).  It's even
> worse when we have to go via libvirt which itself doesn't expose
> qemu's capabilities upwards.

It is a bug that "-help" doesn't list discard=on, but QMP
query-command-line-options lists it correctly.
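
A minimal sketch of how a management tool could act on that QMP reply,
assuming the reply has the documented shape (a "return" list of option
groups, each with an "option" name and a "parameters" list).  The helper
name and the abbreviated sample reply are hypothetical and only serve
as an illustration; a real client would read the reply from the QMP
socket.

```python
import json


def drive_supports_discard(qmp_reply):
    """Return True if the 'drive' option group lists a 'discard' parameter."""
    for group in qmp_reply.get("return", []):
        if group.get("option") != "drive":
            continue
        names = {param.get("name") for param in group.get("parameters", [])}
        return "discard" in names
    # No 'drive' group at all: treat discard as unsupported.
    return False


# Hypothetical, heavily abbreviated reply to query-command-line-options.
SAMPLE_REPLY = json.loads("""
{"return": [
  {"option": "drive",
   "parameters": [
     {"name": "cache", "type": "string"},
     {"name": "discard", "type": "string"}
   ]},
  {"option": "machine",
   "parameters": [{"name": "accel", "type": "string"}]}
]}
""")

if __name__ == "__main__":
    print(drive_supports_discard(SAMPLE_REPLY))
```

A caller could then add discard=on to -drive only when the function
returns True, and fall back to omitting the parameter otherwise.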

libvirt could safely ignore discard if the underlying QEMU does not
support it, but that's not how it was implemented.  Currently,
explicitly specifying either discard='on' or discard='off' will cause
the VM to fail to start if QEMU does not support it.  There are
tradeoffs in both solutions...

Paolo



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-08 Thread Richard W.M. Jones
On Wed, Jan 08, 2014 at 11:11:35PM +0100, Paolo Bonzini wrote:
> Is guestfish using "discard=on"?

No.

Adding the discard=on parameter does indeed fix this:

13M   /tmp/test1
17M   /tmp/test2

However why isn't this the default?  Is there a case where discard=on
would be undesirable?

It's extremely difficult to know when it's safe to add this parameter.
Qemu gives no indication of when using discard=.. is safe (ie. won't
cause qemu to fail to start up or fail in some other way).  It's even
worse when we have to go via libvirt which itself doesn't expose
qemu's capabilities upwards.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-08 Thread Paolo Bonzini
Il 07/01/2014 22:22, Richard W.M. Jones ha scritto:
> On Tue, Jan 07, 2014 at 09:48:17PM +0100, Paolo Bonzini wrote:
>> Il 07/01/2014 21:27, Richard W.M. Jones ha scritto:
>>> Not much more than what I said in the original email (especially see the
>>> attached script which you can download from the bottom of this page:
>>> https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg00084.html )
>>>
>>> Basically it tries to dd /dev/zero into the virtio-scsi device exposed
>>> by qemu, then calls sg_unmap (there are two devices, it only unmaps
>>> the first so we can hopefully see the difference), but it doesn't seem
>>> to have any effect on the underlying file.  The underlying file is a
>>> regular raw-format file on ext4.
>>>
>>> I called sg_readcap/sg_vpd and we seem to have all the right
>>> capability bits exposed.
>>>
>>> This script won't work with regular libguestfs.  I compiled a special
>>> appliance that had the sg tools included.
>>
>> Try again with the pull request of
>> http://permalink.gmane.org/gmane.comp.emulators.qemu/248421
> 
> No difference from before, as far as I can see.
> 
> Here is the output of sparsetest.sh:

Is guestfish using "discard=on"?  Here is my test:

$ qemu-img create test.img 32M
Formatting 'test.img', fmt=raw size=33554432
$ qemu-img map --output=json test.img
[{ "start": 0, "length": 33554432, "depth": 0, "zero": true, "data":
false, "offset": 0}]
$ qemu-system-x86_64 ~/rhel6.img \
  -drive if=none,cache=none,discard=on,file=test.img,id=test \
  -device virtio-scsi-pci -device scsi-disk,drive=test \
  --enable-kvm -m 512

  
  In guest
  

  # sg_readcap /dev/sdb
  Device size: 33554432 bytes, 32.0 MiB, 0.03 GB
  # cat /sys/block/sdb/device/scsi_disk/*/provisioning_mode
  unmap
  # yes | mkfs.ext4 /dev/sdb
  # mount /dev/sdb test
  # dd if=/dev/zero of=test/test bs=1M

$ du -sh test.img
32M test.img

  
  In guest
  

  # rm xfs/test
  (sync here if it does not work)
  # fstrim -v xfs/
  xfs/: 27891712 bytes were trimmed

$ du -sh test.img
5.2M test.img
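
The du figures above are allocated sizes, not file lengths, which is why
the image can shrink from 32M to 5.2M while staying a 32 MiB disk.  A
minimal sketch of the same measurement, assuming a Linux host where
st_blocks counts 512-byte units; the helper name disk_usage is
hypothetical.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for path.

    apparent_size is what 'ls -l' shows; allocated_bytes is roughly what
    'du' shows (st_blocks is counted in 512-byte units on Linux).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Demo on a freshly created sparse file, similar to a new raw image.
    with tempfile.NamedTemporaryFile() as image:
        image.truncate(32 * 1024 * 1024)
        image.flush()
        apparent, allocated = disk_usage(image.name)
        print("apparent:", apparent, "allocated:", allocated)
```

Comparing allocated_bytes before and after fstrim in the guest gives
the same information as the two du runs.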





Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-07 Thread Richard W.M. Jones
On Tue, Jan 07, 2014 at 09:48:17PM +0100, Paolo Bonzini wrote:
> Il 07/01/2014 21:27, Richard W.M. Jones ha scritto:
> > Not much more than what I said in the original email (especially see the
> > attached script which you can download from the bottom of this page:
> > https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg00084.html )
> > 
> > Basically it tries to dd /dev/zero into the virtio-scsi device exposed
> > by qemu, then calls sg_unmap (there are two devices, it only unmaps
> > the first so we can hopefully see the difference), but it doesn't seem
> > to have any effect on the underlying file.  The underlying file is a
> > regular raw-format file on ext4.
> > 
> > I called sg_readcap/sg_vpd and we seem to have all the right
> > capability bits exposed.
> > 
> > This script won't work with regular libguestfs.  I compiled a special
> > appliance that had the sg tools included.
> 
> Try again with the pull request of
> http://permalink.gmane.org/gmane.comp.emulators.qemu/248421

No difference from before, as far as I can see.

Here is the output of sparsetest.sh:

0   /tmp/test1
0   /tmp/test2
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=1, lbprz=0
   Last logical block address=204799 (0x31fff), Number of logical blocks=204800
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned logical block address=0
Hence:
   Device size: 104857600 bytes, 100.0 MiB, 0.10 GB
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 2097152
  Maximum unmap block descriptor count: 255
  Optimal unmap granularity: 8
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x0 blocks

16M   /tmp/test1   <--- note both file disk
16M   /tmp/test2   <--- usages are the same

Those are raw files on ext4.  I'll try qcow2 and follow up.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-07 Thread Richard W.M. Jones
Using qcow2 format, it also doesn't appear to work:

$ /tmp/sparsetest.sh 
Formatting '/tmp/test1', fmt=qcow2 size=104857600 encryption=off 
cluster_size=65536 lazy_refcounts=off 
Formatting '/tmp/test2', fmt=qcow2 size=104857600 encryption=off 
cluster_size=65536 lazy_refcounts=off 
136K   /tmp/test1
136K   /tmp/test2
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=1, lbprz=0
   Last logical block address=204799 (0x31fff), Number of logical blocks=204800
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned logical block address=0
Hence:
   Device size: 104857600 bytes, 100.0 MiB, 0.10 GB
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 2097152
  Maximum unmap block descriptor count: 255
  Optimal unmap granularity: 8
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x0 blocks

17M   /tmp/test1
17M   /tmp/test2

$ ll -h /tmp/test{1,2}
-rw-r--r--. 1 rjones rjones 17M Jan  7 21:24 /tmp/test1
-rw-r--r--. 1 rjones rjones 17M Jan  7 21:24 /tmp/test2
$ qemu-img info /tmp/test1
image: /tmp/test1
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 16M
cluster_size: 65536
$ qemu-img info /tmp/test2
image: /tmp/test2
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 16M
cluster_size: 65536

--

A frustrating aspect of this is that there are no diagnostics and no way
to probe whether UNMAP is supported all the way through.

This will be critical for virt-sparsify, since we'd like to be able to
tell the user in advance whether or not in-place sparsification is
going to work, and even better, why not.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-07 Thread Paolo Bonzini
Il 07/01/2014 21:27, Richard W.M. Jones ha scritto:
> Not much more than what I said in the original email (especially see the
> attached script which you can download from the bottom of this page:
> https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg00084.html )
> 
> Basically it tries to dd /dev/zero into the virtio-scsi device exposed
> by qemu, then calls sg_unmap (there are two devices, it only unmaps
> the first so we can hopefully see the difference), but it doesn't seem
> to have any effect on the underlying file.  The underlying file is a
> regular raw-format file on ext4.
> 
> I called sg_readcap/sg_vpd and we seem to have all the right
> capability bits exposed.
> 
> This script won't work with regular libguestfs.  I compiled a special
> appliance that had the sg tools included.

Try again with the pull request of
http://permalink.gmane.org/gmane.comp.emulators.qemu/248421

Paolo



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-07 Thread Richard W.M. Jones
On Tue, Jan 07, 2014 at 03:48:54PM +0100, Paolo Bonzini wrote:
> Il 02/01/2014 17:15, Richard W.M. Jones ha scritto:
> > 
> > My (possibly weak) understanding of the upstream qemu code is that
> > unmap/discard/trim is not supported in qcow2.  It is only supported in
> > raw files when using a POSIX-like host OS which has any of:
> > 
> >  - block devices supporting BLKDISCARDZEROES
> >  - files on XFS
> >  - files on other filesystems that support FALLOC_FL_PUNCH_HOLE (eg ext4)
> 
> It doesn't have to support BLKDISCARDZEROES, only BLKDISCARD.  I test it
> with scsi_debug using both lbprz=0 and lbprz=1 (which becomes
> BLKDISCARDZEROES unset and set respectively).
> 
> Otherwise this is correct.
> 
> > Having said that, I did some tests using libguestfs and I could not
> > show that unmap was working, either using raw or qcow2 (both on ext4),
> > with virtio-scsi, and recent kernel & qemu.  I did not see any errors,
> > but also I don't see what I'm doing wrong.
> 
> Can you share more?

Not much more than what I said in the original email (especially see the
attached script which you can download from the bottom of this page:
https://lists.gnu.org/archive/html/qemu-devel/2014-01/msg00084.html )

Basically it tries to dd /dev/zero into the virtio-scsi device exposed
by qemu, then calls sg_unmap (there are two devices, it only unmaps
the first so we can hopefully see the difference), but it doesn't seem
to have any effect on the underlying file.  The underlying file is a
regular raw-format file on ext4.

I called sg_readcap/sg_vpd and we seem to have all the right
capability bits exposed.

This script won't work with regular libguestfs.  I compiled a special
appliance that had the sg tools included.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-07 Thread Paolo Bonzini
Il 02/01/2014 17:15, Richard W.M. Jones ha scritto:
> 
> My (possibly weak) understanding of the upstream qemu code is that
> unmap/discard/trim is not supported in qcow2.  It is only supported in
> raw files when using a POSIX-like host OS which has any of:
> 
>  - block devices supporting BLKDISCARDZEROES
>  - files on XFS
>  - files on other filesystems that support FALLOC_FL_PUNCH_HOLE (eg ext4)

It doesn't have to support BLKDISCARDZEROES, only BLKDISCARD.  I test it
with scsi_debug using both lbprz=0 and lbprz=1 (which becomes
BLKDISCARDZEROES unset and set respectively).

Otherwise this is correct.

> Having said that, I did some tests using libguestfs and I could not
> show that unmap was working, either using raw or qcow2 (both on ext4),
> with virtio-scsi, and recent kernel & qemu.  I did not see any errors,
> but also I don't see what I'm doing wrong.

Can you share more?

Paolo



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-05 Thread Stefan Hajnoczi
On Mon, Dec 30, 2013 at 07:58:29PM +0800, Teng-Feng Yang wrote:
> I have been studying the QCOW2 file format for a couple of days, and I am
> a little confused about whether QCOW2 supports UNMAP.

Discard is an area that has seen a lot of development activity over the
past year or two.  That means it's still relatively new, and you may find
outdated information online.

> While searching online, I found mailing list discussions saying
> that qemu-nbd and the nbd module both support the UNMAP command.

Yes:
 * qemu-nbd since QEMU 1.1 supports TRIM
 * nbd.ko since Linux 3.7 supports discard

> So I followed the steps below on my machine (Ubuntu 13.10 with Linux
> kernel 3.12) to test whether qemu-nbd and QCOW2 support UNMAP.
> 
> 1. Create a qcow2 file via qemu-img
> > sudo qemu-img create -f qcow2 -o cluster_size=524288 base.qcow2 1G
> 
> 2. Connect this qcow2 file with qemu-nbd
> > sudo qemu-nbd -c /dev/nbd0 base.qcow2 --discard=unmap
> 
> 3. Use sg_unmap command to issue UNMAP command to this NBD
> > sudo sg_unmap --lba=0 --num=1 /dev/nbd0
> 
> Every time, I get the following error message:
> 
> unmap cdb: 42 00 00 00 00 00 00 00 18 00
> unmap: pass through os error: Inappropriate ioctl for device
> UNMAP failed (use '-v' to get more information)
> 
> I also tried formatting this nbd device with ext4 and mounting it, but I
> still cannot run fstrim on the mount point.

NBD isn't a SCSI device so sending UNMAP commands doesn't work.  I think
you need to issue ioctl(BLKDISCARD) instead.  See blkdiscard(8).
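
A minimal sketch of what issuing that ioctl looks like, assuming a Linux
host.  The request number is _IO(0x12, 119) from <linux/fs.h> and the
argument is a pair of 64-bit values (byte offset, byte length); the
helper names pack_range and discard are hypothetical.  Calling discard()
destroys the data in that range, so it is only defined here, not run.

```python
import fcntl
import os
import struct

# _IO(0x12, 119) from <linux/fs.h>
BLKDISCARD = 0x1277


def pack_range(offset, length):
    """Pack the uint64_t range[2] = {offset, length} argument in bytes."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    if offset % 512 or length % 512:
        # The kernel rejects ranges that are not 512-byte aligned.
        raise ValueError("offset and length must be multiples of 512")
    return struct.pack("=QQ", offset, length)


def discard(device, offset, length):
    """Ask the kernel to discard [offset, offset + length) on device.

    WARNING: this throws away the data in that range.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, BLKDISCARD, pack_range(offset, length))
    finally:
        os.close(fd)


# Example (destructive, needs root and a connected NBD device):
#   discard("/dev/nbd0", 0, 1024 * 1024)
```

If the device or its backend does not support discard, the ioctl is
expected to fail with an OSError rather than silently succeed.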

Also, make sure to use qemu.git/master if you want qcow2 discard
support.  I didn't check the details but the unmap implementation for
qcow2 has recently been added/modified.

Stefan



Re: [Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2014-01-02 Thread Richard W.M. Jones
On Mon, Dec 30, 2013 at 07:58:29PM +0800, Teng-Feng Yang wrote:
> I have been studying the QCOW2 file format for a couple of days, and I am
> a little confused about whether QCOW2 supports UNMAP.
> While searching online, I found mailing list discussions saying
> that qemu-nbd and the nbd module both support the UNMAP command.
> So I followed the steps below on my machine (Ubuntu 13.10 with Linux
> kernel 3.12) to test whether qemu-nbd and QCOW2 support UNMAP.
> 
> 1. Create a qcow2 file via qemu-img
> > sudo qemu-img create -f qcow2 -o cluster_size=524288 base.qcow2 1G
> 
> 2. Connect this qcow2 file with qemu-nbd
> > sudo qemu-nbd -c /dev/nbd0 base.qcow2 --discard=unmap
> 
> 3. Use sg_unmap command to issue UNMAP command to this NBD
> > sudo sg_unmap --lba=0 --num=1 /dev/nbd0
> 
> Every time, I get the following error message:
> 
> unmap cdb: 42 00 00 00 00 00 00 00 18 00
> unmap: pass through os error: Inappropriate ioctl for device
> UNMAP failed (use '-v' to get more information)
> 
> I also tried formatting this nbd device with ext4 and mounting it, but I
> still cannot run fstrim on the mount point.
> 
> Have I done anything wrong?

There are a lot of factors for getting unmap/discard/trim to work,
including:

 - guest tools (sg_unmap) or guest filesystem must support it
 - guest kernel must support it
 - host qemu must support it
 - host filesystem/etc must support it
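
For the guest kernel layer in the list above, one way to check is to
read the block queue limits from sysfs.  This is a minimal sketch,
assuming a Linux guest that exposes queue/discard_max_bytes and
queue/discard_granularity; a discard_max_bytes of 0 means the kernel
will not send discards to that disk.  The helper names and the
sysfs_root parameter are hypothetical.

```python
import os


def discard_limits(disk, sysfs_root="/sys"):
    """Read the discard queue limits for a disk name such as 'sdb'."""
    queue = os.path.join(sysfs_root, "block", disk, "queue")

    def read_int(name):
        with open(os.path.join(queue, name)) as f:
            return int(f.read().strip())

    return {
        "discard_max_bytes": read_int("discard_max_bytes"),
        "discard_granularity": read_int("discard_granularity"),
    }


def kernel_allows_discard(disk, sysfs_root="/sys"):
    """True if the kernel advertises a non-zero discard limit for disk."""
    return discard_limits(disk, sysfs_root)["discard_max_bytes"] > 0


# Example, inside the guest:
#   print(kernel_allows_discard("sdb"))
```

This only covers one layer: a non-zero limit in the guest says nothing
about whether qemu or the host filesystem will honour the request.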

My (possibly weak) understanding of the upstream qemu code is that
unmap/discard/trim is not supported in qcow2.  It is only supported in
raw files when using a POSIX-like host OS which has any of:

 - block devices supporting BLKDISCARDZEROES
 - files on XFS
 - files on other filesystems that support FALLOC_FL_PUNCH_HOLE (eg ext4)
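
For the host filesystem case, hole punching can be probed directly on a
scratch file.  This is a minimal sketch, assuming 64-bit Linux with a
libc that exports fallocate(); the flag values come from
<linux/falloc.h>, and the helper names punch_hole and probe_punch_hole
are hypothetical.

```python
import ctypes
import errno
import os
import tempfile

# Values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumes 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate a byte range; return False if the fs cannot do it."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))


def probe_punch_hole(directory, size=1 << 20):
    """Write a scratch file in directory, punch it out, report the result.

    Returns (supported, allocated_before, allocated_after, file_size).
    """
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        scratch.write(b"\xff" * size)
        scratch.flush()
        os.fsync(scratch.fileno())
        before = os.fstat(scratch.fileno()).st_blocks * 512
        supported = punch_hole(scratch.fileno(), 0, size)
        st = os.fstat(scratch.fileno())
        return supported, before, st.st_blocks * 512, st.st_size


if __name__ == "__main__":
    print(probe_punch_hole(tempfile.gettempdir()))
```

Running the probe in the directory that holds the disk image tells you
whether that particular filesystem can release space; the file size
stays the same because of FALLOC_FL_KEEP_SIZE.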

Having said that, I did some tests using libguestfs and I could not
show that unmap was working, either using raw or qcow2 (both on ext4),
with virtio-scsi, and recent kernel & qemu.  I did not see any errors,
but also I don't see what I'm doing wrong.

Attached is my test script.

You will need to compile libguestfs with:

  ./configure --with-extra-packages="sg3_utils"

The results on my machine are:

$ /tmp/sparsetest.sh 
0 /tmp/test1
0 /tmp/test2
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=1, lbprz=0
   Last logical block address=204799 (0x31fff), Number of logical blocks=204800
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned logical block address=0
Hence:
   Device size: 104857600 bytes, 100.0 MiB, 0.10 GB
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 8
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x0 blocks

16M   /tmp/test1   <--- note no sparseness is created
16M   /tmp/test2

Please let us know if you get this working, because I'd really like to
fix virt-sparsify so it can work in-place!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


sparsetest.sh
Description: Bourne shell script


[Qemu-devel] TRIM/DISCARD/UNMAP support on qemu-nbd

2013-12-30 Thread Teng-Feng Yang
Hi folks,

I have been studying the QCOW2 file format for a couple of days, and I am
a little confused about whether QCOW2 supports UNMAP.
While searching online, I found mailing list discussions saying
that qemu-nbd and the nbd module both support the UNMAP command.
So I followed the steps below on my machine (Ubuntu 13.10 with Linux
kernel 3.12) to test whether qemu-nbd and QCOW2 support UNMAP.

1. Create a qcow2 file via qemu-img
> sudo qemu-img create -f qcow2 -o cluster_size=524288 base.qcow2 1G

2. Connect this qcow2 file with qemu-nbd
> sudo qemu-nbd -c /dev/nbd0 base.qcow2 --discard=unmap

3. Use sg_unmap command to issue UNMAP command to this NBD
> sudo sg_unmap --lba=0 --num=1 /dev/nbd0

Every time, I get the following error message:

unmap cdb: 42 00 00 00 00 00 00 00 18 00
unmap: pass through os error: Inappropriate ioctl for device
UNMAP failed (use '-v' to get more information)

I also tried formatting this nbd device with ext4 and mounting it, but I
still cannot run fstrim on the mount point.

Have I done anything wrong?

Any help would be appreciated.
Thanks.

Best Regards,
Dennis